Closed mobuchowski closed 3 years ago
Is it? I'm able to run this on a Windows build but not in my Linux environment.
Successful run:
Failed runs:
Related Traceback Snippet:
app/geocode.py:17: in geocode_zipcodes
nomi = pgeocode.Nominatim(country)
/usr/local/lib/python3.8/site-packages/pgeocode.py:71: in __init__
self._data_path, self._data = self._get_data(country)
/usr/local/lib/python3.8/site-packages/pgeocode.py:89: in _get_data
reader, headers = _get_url(url)
/usr/local/lib/python3.8/site-packages/pgeocode.py:40: in _get_url
res = urllib.request.urlopen(url)
/usr/local/lib/python3.8/urllib/request.py:222: in urlopen
return opener.open(url, data, timeout)
/usr/local/lib/python3.8/urllib/request.py:531: in open
response = meth(req, response)
/usr/local/lib/python3.8/urllib/request.py:640: in http_response
response = self.parent.error(
/usr/local/lib/python3.8/urllib/request.py:569: in error
return self._call_chain(*args)
/usr/local/lib/python3.8/urllib/request.py:502: in _call_chain
result = func(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <urllib.request.HTTPDefaultErrorHandler object at 0x7f6634eb0b20>
req = <urllib.request.Request object at 0x7f6634eb0b80>
fp = <http.client.HTTPResponse object at 0x7f6634eb01f0>, code = 404
msg = 'Not Found', hdrs = <http.client.HTTPMessage object at 0x7f6634eb0130>
def http_error_default(self, req, fp, code, msg, hdrs):
> raise HTTPError(req.full_url, code, msg, hdrs, fp)
E urllib.error.HTTPError: HTTP Error 404: Not Found
/usr/local/lib/python3.8/urllib/request.py:649: HTTPError
A quick skim of #40 makes me think that issue was related to the Geonames service availability, but maybe I'm misunderstanding.
Maybe I did misunderstand. Reviewing pgeocode, I see that it appears to cache the dataset as a .txt dump on my local machine, which would explain the successful run versus the failures.
pgeocode.py
@staticmethod
def _get_data(country):
    """Load the data from disk; otherwise download and save it"""
    from zipfile import ZipFile

    data_path = os.path.join(STORAGE_DIR, country.upper() + ".txt")
    if os.path.exists(data_path):
        data = pd.read_csv(data_path, dtype={"postal_code": str})
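To make the load-from-disk-else-download behaviour concrete, here is a minimal sketch of the pattern. It is not pgeocode's actual code: `load_cached` and its `download` callable are hypothetical names used only for illustration, and the real library reads a CSV with pandas rather than returning raw text.

```python
import os


def load_cached(path, download):
    """Return the text stored at `path`; call `download()` only on a cache miss.

    `download` is a hypothetical zero-argument callable that returns the
    dataset as a string (and may raise, e.g. on an HTTP 404).
    """
    if os.path.exists(path):
        # Cache hit: no network access, so a remote outage goes unnoticed.
        with open(path, encoding="utf-8") as fh:
            return fh.read()

    # Cache miss: this is the only code path that can fail with a 404.
    text = download()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return text
```

Under this pattern, a machine that already holds the cached file keeps working while a fresh environment (a new container or CI runner, for example) fails whenever the remote file is missing.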
Yes, it's about Geonames service availability, and it would only happen on the first run, before the dataset is cached locally.
We really need to provide a fallback location for the Geonames data, see https://github.com/symerio/pgeocode/issues/41
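One way a fallback location could work is to try a list of download URLs in order and use the first that responds. This is only a sketch of the idea, not what pgeocode implements: `fetch_first_available` is a hypothetical helper, and any mirror URLs passed to it would have to be supplied by the caller.

```python
import urllib.error
import urllib.request


def fetch_first_available(urls, opener=urllib.request.urlopen):
    """Return the response body of the first URL in `urls` that can be opened.

    `opener` defaults to urllib.request.urlopen; it is a parameter only so
    the function can be exercised without network access.
    """
    last_error = None
    for url in urls:
        try:
            with opener(url) as res:
                return res.read()
        except urllib.error.URLError as exc:  # HTTPError is a subclass
            last_error = exc  # remember the failure and try the next URL
    raise RuntimeError(f"no download location responded; last error: {last_error}")
```

Catching `URLError` covers both HTTP errors such as the 404 above and connection-level failures, so a mirror is tried in either case.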
This seems to be happening fairly regularly to me as well. Is it because the datasets are being updated?
> Is it because the datasets are being updated?
It is. They seem to remove old files before generating new ones, leading to 404 errors.
See https://github.com/symerio/pgeocode/issues/44#issuecomment-715350761 for a solution that should happen in the near future. I'll close this issue in favor of https://github.com/symerio/pgeocode/issues/44 to avoid duplicates.
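In the meantime, since the 404 appears to be transient while the files are regenerated, a caller could retry with a backoff. The sketch below is a workaround idea rather than anything provided by pgeocode; `retry_on_404` and its parameters are hypothetical, and the default delays are arbitrary.

```python
import time
import urllib.error


def retry_on_404(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Call `fetch()` and retry with exponential backoff while it raises HTTP 404.

    Any other error, or a 404 on the final attempt, is re-raised unchanged.
    `sleep` is a parameter only so the waits can be observed in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except urllib.error.HTTPError as exc:
            if exc.code != 404 or attempt == attempts - 1:
                raise
            sleep(delay * 2 ** attempt)  # wait 1x, 2x, 4x, ... the base delay
```

With the call from the traceback above, usage might look like `nomi = retry_on_404(lambda: pgeocode.Nominatim(country), attempts=5, delay=30)`. Whether a few retries are enough depends on how long the files stay unavailable, which this thread does not establish.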
AFAIK it's the same issue as #40