symerio / pgeocode

Postal code geocoding and distance calculation
https://pgeocode.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

incorrect centroid in query_postal_code for duplicate postal code entries #32

Closed sz-dar closed 1 year ago

sz-dar commented 4 years ago

query_postal_code sums the longitude.

nomi.query_postal_code("41-800")

That's from the GeoNames file. 41-800 returns 2 locations:

PL, 41-800, 50.2817, 18.6745
PL, 41-800, 50.3055, 18.778

After running nomi.query_postal_code("41-800"):

postal_code 41-800
place_name Gliwice, Zabrze
latitude 50.2817
longitude 18.7263

The returned longitude equals the sum of the longitudes in the file divided by the number of results, while the latitude is just the value from the first record.
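The numbers can be checked with a short calculation. This is only an arithmetic check on the two records quoted above, not pgeocode code; the variable names are illustrative.

```python
# The two GeoNames records quoted above for PL 41-800, as (latitude, longitude).
records = [(50.2817, 18.6745), (50.3055, 18.778)]

# Plain arithmetic mean of each coordinate.
mean_lat = sum(lat for lat, _ in records) / len(records)
mean_lon = sum(lon for _, lon in records) / len(records)

# mean_lon is about 18.7263, which matches the longitude that was returned.
# mean_lat is about 50.2936, which does NOT match the returned 50.2817;
# 50.2817 is the latitude of the first record.
print(mean_lat, mean_lon)
```

So only one of the two coordinates looks averaged in the result.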

rth commented 4 years ago

It doesn't sum; it computes the centroid (i.e. the mean position) for queries that have multiple matching postal code entries.

What would you like to happen in this case instead? If we don't group by postal code and average, a query of, say, 5 postal codes could return anywhere from 5 to a few dozen entries, which would hurt usability.

What is your use-case where the current behavior is an issue?

sz-dar commented 4 years ago

I get that, but it only computes the centroid for the longitude; the latitude stays as it was in the first record. If you compute the center, shouldn't you use both latitude and longitude?

Does it make sense?

rth commented 4 years ago

Yes, it should have been done for both latitude and longitude. I'll investigate, thanks.
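For reference, the intended behavior (one row per postal code, with both coordinates averaged) could look roughly like the sketch below. It uses only the Python standard library and is not pgeocode's actual implementation; `centroid_by_postal_code` is a hypothetical helper, and the "XX-000" row is made-up data added only to show that single-entry codes pass through unchanged.

```python
from collections import defaultdict
from statistics import fmean


def centroid_by_postal_code(rows):
    """Group (postal_code, latitude, longitude) rows by postal code and
    return one (mean latitude, mean longitude) pair per code."""
    groups = defaultdict(list)
    for code, lat, lon in rows:
        groups[code].append((lat, lon))
    return {
        code: (
            fmean([lat for lat, _ in points]),  # average the latitudes
            fmean([lon for _, lon in points]),  # average the longitudes too
        )
        for code, points in groups.items()
    }


rows = [
    ("41-800", 50.2817, 18.6745),  # from the issue report
    ("41-800", 50.3055, 18.778),   # from the issue report
    ("XX-000", 10.0, 20.0),        # hypothetical single-entry postal code
]

result = centroid_by_postal_code(rows)
for code, (lat, lon) in result.items():
    print(code, lat, lon)
```

Three input rows collapse to two output entries, so the number of results still equals the number of distinct postal codes queried, which is the usability property mentioned earlier in the thread.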