symerio / pgeocode

Postal code geocoding and distance calculation
https://pgeocode.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Slow performance for Portugal (PT) #53

Open volkangumuskaya opened 2 years ago

volkangumuskaya commented 2 years ago

Hi, I am querying latitude and longitude from the postcodes of 26 countries. In a test run with 10 rows per country, Portugal (PT) takes much longer to process than the rest: all other countries finish in 1-3 seconds, while Portugal takes 30-40 seconds.

I don't have much experience with Python, so my code is probably not the most efficient, but it is odd that one country stands out. (Suggestions to improve the code below are most welcome.)

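import time

import numpy as np
import pandas as pd
import pgeocode

def get_city(code):
    # Look up one postcode with the current country's geocoder (global `nomi`)
    try:
        x = nomi.query_postal_code(code)
        return pd.Series({'latitude': x.latitude, 'longitude': x.longitude})
    except Exception:
        return pd.Series({'latitude': np.nan, 'longitude': np.nan})

# df is an existing DataFrame with COUNTRY_VALUE and POSTCODE columns
for country_code in np.unique(df.COUNTRY_VALUE):
    mask = df.COUNTRY_VALUE == country_code
    print(country_code, mask.sum())
    start_time = time.time()
    nomi = pgeocode.Nominatim(country_code)
    # Query only this country's rows, once, and reuse the result for both columns
    coords = df.loc[mask, 'POSTCODE'].apply(get_city)
    df.loc[mask, 'LAT'] = coords.latitude
    df.loc[mask, 'LON'] = coords.longitude
    print("--- %s seconds ---" % (time.time() - start_time))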
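
The loop above calls query_postal_code once per row. Since query_postal_code also accepts a list of codes and then returns a DataFrame with one row per input code, one option is to send each country's postcodes in a single call. The following is a minimal sketch of that idea, not a confirmed fix for the Portugal slowdown. The helper name geocode_by_country and its make_geocoder parameter are hypothetical, introduced here so the function can be tried with a stub instead of the real pgeocode.Nominatim; the column names COUNTRY_VALUE, POSTCODE, LAT and LON are taken from the code above, and df is assumed to have a unique index.

```python
import pandas as pd


def geocode_by_country(df, make_geocoder=None):
    """Return a copy of df with LAT/LON filled by one batched lookup per country.

    `make_geocoder` is a hypothetical hook: any callable that takes a country
    code and returns an object with a `query_postal_code(list_of_codes)` method.
    """
    if make_geocoder is None:
        import pgeocode  # imported lazily so a stub can be used without pgeocode installed
        make_geocoder = pgeocode.Nominatim

    out = df.copy()
    out["LAT"] = float("nan")
    out["LON"] = float("nan")

    # Group on the unmodified input; `out` shares the same index labels
    for country_code, group in df.groupby("COUNTRY_VALUE"):
        nomi = make_geocoder(country_code)
        # A list input yields a DataFrame with one row per code, in input order
        result = nomi.query_postal_code(group["POSTCODE"].astype(str).tolist())
        out.loc[group.index, "LAT"] = result["latitude"].to_numpy()
        out.loc[group.index, "LON"] = result["longitude"].to_numpy()
    return out
```

With pgeocode installed, calling geocode_by_country(df) would build one Nominatim object per country and issue one query per country rather than one per row.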
volkangumuskaya commented 2 years ago

As a workaround, I downloaded the txt files from http://download.geonames.org/export/zip/ and read them directly. This is much faster, but the data is static.
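
For readers who want to try the same workaround, here is a rough sketch of reading one of those files with pandas and joining it onto a table of postcodes. It assumes the files are tab-separated with no header row and use the column order documented in the GeoNames readme; the function names load_geonames and add_coordinates are hypothetical, and the COUNTRY_VALUE and POSTCODE column names are borrowed from the code earlier in this thread. Postcodes that map to several places are averaged, which is one possible choice rather than the only reasonable one.

```python
import pandas as pd

# Column order assumed from the GeoNames postal code readme
GEONAMES_COLUMNS = [
    "country_code", "postal_code", "place_name",
    "admin_name1", "admin_code1", "admin_name2", "admin_code2",
    "admin_name3", "admin_code3", "latitude", "longitude", "accuracy",
]


def load_geonames(path_or_buffer):
    """Read one tab-separated GeoNames postal code file into a DataFrame."""
    return pd.read_csv(
        path_or_buffer,
        sep="\t",
        header=None,
        names=GEONAMES_COLUMNS,
        dtype={"postal_code": str},  # keep leading zeros and hyphens intact
        keep_default_na=False,       # do not treat strings such as "NA" as missing
        na_values=[""],              # only empty fields count as missing
    )


def add_coordinates(df, geo):
    """Left-join LAT/LON onto df by country and postcode."""
    # Several places can share a postcode, so average their coordinates first
    centroids = (
        geo.groupby(["country_code", "postal_code"], as_index=False)[
            ["latitude", "longitude"]
        ]
        .mean()
        .rename(columns={
            "country_code": "COUNTRY_VALUE",
            "postal_code": "POSTCODE",
            "latitude": "LAT",
            "longitude": "LON",
        })
    )
    # Rows without a match keep NaN in LAT and LON
    return df.merge(centroids, on=["COUNTRY_VALUE", "POSTCODE"], how="left")
```

Assuming an unzipped file such as PT.txt from that site, usage would look like add_coordinates(df, load_geonames("PT.txt")), with POSTCODE stored as strings in the same format the file uses.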