symerio / pgeocode

Postal code geocoding and distance calculation
https://pgeocode.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

pgeocode: query_postal_code function #10

Closed BryanChuinkam closed 5 years ago

BryanChuinkam commented 5 years ago

Regarding the query_postal_code function: as seen in the code below, the actual postal code I'm searching for is 'K2C', but in order to search for it I had to insert '5CA' in front of it. My understanding is that the 5 represents the accuracy and CA is the country code. Why do these first three characters need to be added? Am I right about what they represent?

nomi = pgeocode.Nominatim('CA')
nomi.query_postal_code("5CA K2C")

thanks

rth commented 5 years ago

@BryanChuinkam thanks for the report.

If one looks at the GeoNames postal code data for Canada (see CA.zip), it contains only 3-character postal codes. However, in the pgeocode implementation https://github.com/symerio/pgeocode/blob/5ae2fcfc8a2a796a9854e01aba512d5f7b60a78f/pgeocode.py#L125

the postal code for Canada must be provided in the form "XXX YYY", and the first part is discarded. I don't remember why this was added, but I imagine it aimed to match the Wikipedia definition https://en.wikipedia.org/wiki/Postal_codes_in_Canada#Components_of_a_postal_code .

In your experience Canadian postal codes are mostly 3 letters/digits then? We could change the way this is handled.
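
To illustrate why "5CA K2C" matched while a bare "K2C" did not, here is a minimal plain-Python sketch of the splitting behaviour discussed in this thread. The helper `nth_part` is hypothetical and not part of pgeocode; it only mimics keeping one whitespace-separated part of the code.

```python
def nth_part(postal_code, index):
    """Return the whitespace-separated part at `index`, or None if it is missing.

    Hypothetical helper for illustration only; not a pgeocode function.
    """
    parts = postal_code.split()
    return parts[index] if index < len(parts) else None


queries = ["5CA K2C", "K2C", "M4B 1B3"]

# Keeping the part AFTER the space (index 1), as at the linked line:
# a bare three-character code has no second part to keep.
second_parts = {q: nth_part(q, 1) for q in queries}

# Keeping the part BEFORE the space (index 0) instead.
first_parts = {q: nth_part(q, 0) for q in queries}

print(second_parts)
print(first_parts)
```

With index 1, any three characters in front of "K2C" would do, since they are thrown away; with index 0, both "K2C" and a full code starting with "K2C" reduce to the three-character form found in the GeoNames data.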

admiralmaggie commented 5 years ago

@rth I think the line above needs to change so that only the first part of the postal code is considered.

https://github.com/symerio/pgeocode/blob/5ae2fcfc8a2a796a9854e01aba512d5f7b60a78f/pgeocode.py#L125

For example, M4B 1B3 is a postal code in Toronto, Ontario. 1B3 is not a valid code but M4B is.

musicpiano commented 5 years ago

You are right. We should change

codes['postal_code'] = codes.postal_code.str.split().str.get(1)

to

codes['postal_code'] = codes.postal_code.str.split().str.get(0)
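
As a rough check of what that one-line change does, the two expressions can be compared on a small pandas DataFrame. This is only a sketch: the frame below is a stand-in built from codes mentioned in this thread, not the real pgeocode data, and the column names `second_part` and `first_part` are made up for the comparison.

```python
import pandas as pd

# Stand-in data for illustration; the real frame is built inside pgeocode.
codes = pd.DataFrame({"postal_code": ["M4B 1B3", "5CA K2C", "K2C"]})

# Split each code on whitespace once, then pick a part by position.
parts = codes.postal_code.str.split()

# .str.get(1): keeps the part after the space; missing for a bare "K2C".
codes["second_part"] = parts.str.get(1)

# .str.get(0): keeps the part before the space.
codes["first_part"] = parts.str.get(0)

print(codes)
```

Note that `.str.get` yields a missing value rather than raising when the requested position does not exist, which is why a three-character code silently fails to match under `.str.get(1)`.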

rth commented 5 years ago

Thanks @musicpiano ! Would you like to make a pull request?

rth commented 5 years ago

Version 0.1.2, with a fix for this issue, should now be available on PyPI.