symerio / pgeocode

Postal code geocoding and distance calculation
https://pgeocode.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Throwing HTTP 404 error #40

Closed spawn07 closed 4 years ago

spawn07 commented 4 years ago

This link is returning a 404 error: https://download.geonames.org/export/zip/{country}.zip

rth commented 4 years ago

Thanks @spawn07, yes, I can confirm. Maybe it has something to do with how this data is updated by GeoNames. We had such issues before (https://github.com/symerio/pgeocode/pull/34#issuecomment-615844144), and they resolved themselves after some time.

We should probably also cache the data in some other location so that we do not rely on the availability of the GeoNames service.

richunger commented 4 years ago

That dir only has allCountries.zip right now, not the individual countries.

richunger commented 4 years ago

http://download.geonames.org/export/dump/ has individual country zips.

richunger commented 4 years ago

diff --git a/pgeocode.py b/pgeocode.py
index 1895437..15f00cd 100644
--- a/pgeocode.py
+++ b/pgeocode.py
@@ -16,7 +16,7 @@ STORAGE_DIR = os.environ.get(
     "PGEOCODE_DATA_DIR", os.path.join(os.path.expanduser("~"), "pgeocode_data")
 )

-DOWNLOAD_URL = "https://download.geonames.org/export/zip/{country}.zip"
+DOWNLOAD_URL = "https://download.geonames.org/export/dump/{country}.zip"

 DATA_FIELDS = [
     "country_code",
@@ -174,7 +174,7 @@ class Nominatim:
         if os.path.exists(data_path):
             data = pd.read_csv(data_path, dtype={"postal_code": str})
         else:
-            url = DOWNLOAD_URL.format(country=country)
+            url = DOWNLOAD_URL.format(country=country.upper())

spawn07 commented 4 years ago

Thanks for the quick response, @richunger.

purusmahe commented 4 years ago

Was the change merged?

purusmahe commented 4 years ago

Never mind: https://download.geonames.org/export/zip/{country}.zip seems to be working again. That makes me wonder whether this is expected behavior, caused by GeoNames being unavailable during an update or scheduled maintenance, as @rth mentions. Perhaps we should fall back to /dump whenever /zip returns a 404?
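
The fallback suggested above could look roughly like the sketch below. It is not code from pgeocode: `CANDIDATE_URLS`, `http_get`, and `fetch_first_available` are hypothetical names introduced here for illustration, and the sketch assumes the fallback file has the format the parser expects, which this thread does not confirm for /dump.

```python
import urllib.error
import urllib.request

# Hypothetical candidate list (an assumption, not pgeocode's configuration):
# the primary URL from this issue, then the alternative mentioned above.
CANDIDATE_URLS = [
    "https://download.geonames.org/export/zip/{country}.zip",
    "https://download.geonames.org/export/dump/{country}.zip",
]


def http_get(url):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_first_available(country, url_templates=CANDIDATE_URLS, fetch=http_get):
    """Return (url, payload) from the first template that does not answer 404.

    Only HTTP 404 triggers the fallback; any other error is re-raised
    immediately so that real outages are not silently hidden.
    """
    last_error = None
    for template in url_templates:
        url = template.format(country=country.upper())
        try:
            return url, fetch(url)
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise
            last_error = err
    if last_error is None:
        raise ValueError("no URL templates given")
    # Every candidate answered 404: surface the last error to the caller.
    raise last_error
```

Passing the `fetch` function as a parameter keeps the fallback logic testable without network access; a caller would normally rely on the default and write the returned bytes to the local data directory.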

rth commented 4 years ago

Yes, it really sounds like a batch job at GeoNames that first removes these files and then adds the updated postal code data. https://github.com/symerio/pgeocode/issues/41#issuecomment-625963646 describes a way to address this in the long term. Would anyone be interested in looking into it?