symerio / pgeocode

Postal code geocoding and distance calculation
https://pgeocode.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Pgeocode throws HTTP Error 404: Not Found #43

Closed mobuchowski closed 3 years ago

mobuchowski commented 4 years ago

AFAIK it's the same issue as #40

cnpryer commented 4 years ago

Is it? I'm able to run this on a Windows build but not in my Linux environment.

Successful run:

Failed runs:

Related Traceback Snippet:

app/geocode.py:17: in geocode_zipcodes
    nomi = pgeocode.Nominatim(country)
/usr/local/lib/python3.8/site-packages/pgeocode.py:71: in __init__
    self._data_path, self._data = self._get_data(country)
/usr/local/lib/python3.8/site-packages/pgeocode.py:89: in _get_data
    reader, headers = _get_url(url)
/usr/local/lib/python3.8/site-packages/pgeocode.py:40: in _get_url
    res = urllib.request.urlopen(url)
/usr/local/lib/python3.8/urllib/request.py:222: in urlopen
    return opener.open(url, data, timeout)
/usr/local/lib/python3.8/urllib/request.py:531: in open
    response = meth(req, response)
/usr/local/lib/python3.8/urllib/request.py:640: in http_response
    response = self.parent.error(
/usr/local/lib/python3.8/urllib/request.py:569: in error
    return self._call_chain(*args)
/usr/local/lib/python3.8/urllib/request.py:502: in _call_chain
    result = func(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <urllib.request.HTTPDefaultErrorHandler object at 0x7f6634eb0b20>
req = <urllib.request.Request object at 0x7f6634eb0b80>
fp = <http.client.HTTPResponse object at 0x7f6634eb01f0>, code = 404
msg = 'Not Found', hdrs = <http.client.HTTPMessage object at 0x7f6634eb0130>

    def http_error_default(self, req, fp, code, msg, hdrs):
>       raise HTTPError(req.full_url, code, msg, hdrs, fp)
E       urllib.error.HTTPError: HTTP Error 404: Not Found

/usr/local/lib/python3.8/urllib/request.py:649: HTTPError

A quick skim of #40 makes me think that issue was related to the Geonames service availability, but maybe I'm misunderstanding.

cnpryer commented 4 years ago

Maybe I did misunderstand. Reviewing pgeocode, I see that it appears to cache the dataset as a .txt dump on the local machine, which would explain why some runs succeed and others fail.

pgeocode.py

@staticmethod
def _get_data(country):
    """Load the data from disk; otherwise download and save it"""
    from zipfile import ZipFile

    data_path = os.path.join(STORAGE_DIR, country.upper() + ".txt")
    if os.path.exists(data_path):
        data = pd.read_csv(data_path, dtype={"postal_code": str})
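
To see whether a given environment will hit the network at all, one can check for the cached dump before constructing `pgeocode.Nominatim`. The following is a minimal sketch that mirrors the path logic in the snippet above. The helper names `cached_dataset_path` and `is_cached` are hypothetical, and the storage directory is passed in explicitly because its default location is an assumption here (check `STORAGE_DIR` in the installed pgeocode).

```python
import os


def cached_dataset_path(country, storage_dir):
    """Return the path where the country's .txt dump would be cached.

    Mirrors the os.path.join(STORAGE_DIR, country.upper() + ".txt")
    expression from the snippet above; storage_dir stands in for STORAGE_DIR.
    """
    return os.path.join(storage_dir, country.upper() + ".txt")


def is_cached(country, storage_dir):
    """Return True if a local dump exists, so no download would be needed."""
    return os.path.exists(cached_dataset_path(country, storage_dir))
```

If `is_cached("us", some_dir)` is true on the Windows machine and false in the Linux environment, that would be consistent with only the Linux runs attempting a download and failing with the 404.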

rth commented 4 years ago

Yes, it's about Geonames service availability, and it would only happen on the first run, before the dataset is cached locally.

We really need to provide a fallback location for the Geonames data, see https://github.com/symerio/pgeocode/issues/41
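
As a rough illustration of what a fallback location could look like, here is a minimal sketch that tries a list of download URLs in order and returns the first payload that succeeds. This is not pgeocode's implementation: the function name `fetch_first_available` and the `opener` parameter are hypothetical, and any mirror URL passed to it would have to be one that actually hosts the Geonames dumps.

```python
import urllib.error
import urllib.request


def fetch_first_available(urls, opener=urllib.request.urlopen):
    """Return (url, payload) for the first URL that downloads successfully.

    `opener` defaults to urllib.request.urlopen and is injectable so the
    function can be exercised without network access.
    """
    errors = []
    for url in urls:
        try:
            with opener(url) as response:
                return url, response.read()
        except urllib.error.URLError as exc:
            # HTTPError (e.g. the 404 in this issue) is a subclass of URLError.
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all download locations failed: " + "; ".join(errors))
```

Returning the URL along with the payload makes it possible to log which location was actually used, which helps when diagnosing intermittent failures like the one reported here.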

samuhepp commented 4 years ago

This seems to be happening fairly regularly to me as well. Is it because the datasets are being updated?

rth commented 3 years ago

> Is it because the datasets are being updated?

It is. They seem to remove the old files before generating the new ones, which leads to 404 errors.

See https://github.com/symerio/pgeocode/issues/44#issuecomment-715350761 for a solution that should happen in the near future. I'll close this issue in favor of https://github.com/symerio/pgeocode/issues/44 to avoid duplicates.
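
Since the 404s described above appear to be transient (the files are missing only while they are being regenerated), a caller could work around them by waiting and retrying. The sketch below is one hedged way to do that. `retry_on_http_404` is a hypothetical helper, not part of pgeocode, and the attempt count and delay are arbitrary assumptions because the thread does not say how long the files stay unavailable.

```python
import time
import urllib.error


def retry_on_http_404(action, attempts=3, delay_seconds=60.0, sleep=time.sleep):
    """Call action(); on an HTTP 404, wait and retry up to `attempts` times.

    `sleep` is injectable so the helper can be tested without real delays.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except urllib.error.HTTPError as exc:
            # Only a 404 is treated as transient. Any other HTTP error, or a
            # 404 on the final attempt, is re-raised to the caller.
            if exc.code != 404 or attempt == attempts:
                raise
            sleep(delay_seconds)
```

Usage would look like `nomi = retry_on_http_404(lambda: pgeocode.Nominatim("us"))`, which relies on the `HTTPError` propagating out of the constructor as it does in the traceback earlier in this thread.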