pelias / docker

Run the Pelias geocoder in docker containers, including example projects.
MIT License

502 error not caught downloading open addresses files #267

Closed: cromulus closed this issue 2 years ago

cromulus commented 2 years ago

Describe the bug

[/data/openaddresses/us-ms-copiah2021815-1-1qzpf70.t2re.zip]
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /data/openaddresses/us-ms-copiah2021815-1-1qzpf70.t2re.zip or
        /data/openaddresses/us-ms-copiah2021815-1-1qzpf70.t2re.zip.zip, and cannot find /data/openaddresses/us-ms-copiah2021815-1-1qzpf70.t2re.zip.ZIP, period.

The downloaded file turns out to be HTML, a 502 error page:

<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/latest/run/us/ms/copiah.zip">GET&nbsp;/latest/run/us/ms/copiah.zip</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
<hr>
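One way to catch this before unzip ever runs is to sniff the first bytes of the response body. Zip archives begin with the signature PK\x03\x04 (or PK\x05\x06 for an empty archive), whereas a proxy error page like the one above begins with an HTML tag. This is a minimal Python sketch; classify_download is a hypothetical helper, not part of pelias/openaddresses.

```python
def classify_download(data: bytes) -> str:
    """Guess what a downloaded payload is from its leading bytes (hypothetical helper)."""
    # Local file header signature, or end-of-central-directory record for an empty zip.
    if data.startswith((b"PK\x03\x04", b"PK\x05\x06")):
        return "zip"
    # Proxy error pages usually start with <html> or a doctype, possibly after whitespace.
    head = data.lstrip()[:64].lower()
    if head.startswith((b"<html", b"<!doctype html")):
        return "html"
    return "unknown"


if __name__ == "__main__":
    page = b"<html><head>\n<title>502 Proxy Error</title>"
    print(classify_download(page))  # html
```

A downloader could refuse to save anything classified as "html" and treat it like a failed request instead.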

Steps to Reproduce

Run the North America example.

Expected behavior

I'd expect upstream HTTP errors to be handled, e.g. by retrying after a delay or by queueing the download for later.
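Retrying after a delay could look like the minimal Python sketch below, which uses exponential backoff. It is an illustration, not the fix that pelias/openaddresses adopted; fetch_with_retry, the retryable status set and the 2-second base delay are all assumptions.

```python
import time
import urllib.error
import urllib.request

# Gateway/proxy style failures that are usually transient (assumption: treat all as retryable).
RETRYABLE_STATUSES = {500, 502, 503, 504}


def _default_fetch(url: str) -> bytes:
    # urlopen raises urllib.error.HTTPError for non-2xx responses such as 502.
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def fetch_with_retry(url, fetch=_default_fetch, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Fetch url, retrying transient 5xx errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # Give up on non-transient errors (e.g. 404) or when out of attempts.
            if err.code not in RETRYABLE_STATUSES or attempt == attempts - 1:
                raise
            # Wait 2s, 4s, 8s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Passing fetch and sleep in as parameters keeps the retry logic testable without network access or real waiting.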

Environment (please complete the following information):

Fresh Ubuntu 20.04 (Focal Fossa), with Docker 20.10.8 and docker-compose 1.25.5.

Pastebin/Screenshots

Additional context

References

missinglink commented 2 years ago

Thanks for the bug report. These errors are intermittent and unfortunately hard to test. I've authored https://github.com/pelias/openaddresses/pull/488/files as a potential solution.

Can you please test it and let us know how you get on?

To simplify testing, our bot generated a docker image for that commit which you can use by editing docker-compose.yml to change services.openaddresses.image from pelias/openaddresses:master to pelias/openaddresses:curl_status_code-2021-09-16-9fae030ebafb2ca8dd9a716c53058e9c8700840e.

cromulus commented 2 years ago

Thank you! I'll try it out!

How do you suggest dealing with the downloaded html files? Perhaps deleting the whole data/openaddresses folder?

missinglink commented 2 years ago

The code has now been merged. The 🤖 will create a new copy of pelias/openaddresses:master in a minute or so; you should switch back to that image in order to track the latest changes.

missinglink commented 2 years ago

Yeah, you can blow away the openaddresses dir under your $DATA_DIR and rerun. I suspect it would 'just work' even if you didn't, but better safe than sorry.
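If you'd rather not delete the whole directory, a narrower cleanup is to remove only the .zip files that fail validation. This minimal Python sketch assumes the bad downloads are left on disk with a .zip extension, as in the logs in this thread; remove_invalid_archives is a hypothetical helper, not a pelias command. zipfile.is_zipfile looks for the same end-of-central-directory record that unzip complained about.

```python
import zipfile
from pathlib import Path


def remove_invalid_archives(directory) -> list:
    """Delete *.zip files under directory that are not readable zip archives."""
    removed = []
    # sorted() materialises the listing before we start deleting files.
    for path in sorted(Path(directory).rglob("*.zip")):
        # is_zipfile checks for an end-of-central-directory record, so an
        # HTML error page saved with a .zip name fails this test.
        if path.is_file() and not zipfile.is_zipfile(path):
            path.unlink()
            removed.append(path)
    return removed
```

This leaves files already extracted from good archives untouched, so it is only a partial substitute for starting from an empty directory.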

cromulus commented 2 years ago

Hmm... it looks like just running pelias download all overwrites pre-existing files.

It would be nice to have a no-clobber option, i.e. "if it exists and is valid, don't download it again". I'd hate to download tens of GB unnecessarily.
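The "if it exists and is valid, don't download it again" check could look something like this minimal Python sketch. should_download and its force flag are hypothetical, not existing pelias options; force stands in for deliberately updating to a newer version of the same file. The check only helps if the downloaded archive is still on disk when it runs.

```python
import zipfile
from pathlib import Path


def should_download(dest, force: bool = False) -> bool:
    """Return True unless dest already holds a valid zip archive (no-clobber)."""
    if force:
        # e.g. deliberately refreshing to a newer release of the same file
        return True
    dest = Path(dest)
    # A missing file or an HTML error page saved as .zip both trigger a download.
    return not (dest.is_file() and zipfile.is_zipfile(dest))
```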

cromulus commented 2 years ago

I'm also running what I think is the latest version of the open addresses docker image, and still getting the error.

[/data/openaddresses/ca-ns-province2021816-1-j0f60m.z54m.zip]
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /data/openaddresses/ca-ns-province2021816-1-j0f60m.z54m.zip or
        /data/openaddresses/ca-ns-province2021816-1-j0f60m.z54m.zip.zip, and cannot find /data/openaddresses/ca-ns-province2021816-1-j0f60m.z54m.zip.ZIP, period.

missinglink commented 2 years ago

It's worth checking that you're on the latest version.

It will show progress bars as it updates the images, so if you see the openaddresses one updating, you weren't on the latest copy.

missinglink commented 2 years ago

The no-clobber option is an interesting idea but would need to be disabled when trying to update to a more recent version of the same file.

A 'proper solution' would be to store shasum hashes (or last-modified dates or similar) and ask the server whether the file needs to be updated.

Unfortunately it's very difficult, because it's not clear which files came from which zip file during the last extraction: they all unzip over each other into a shared directory structure.

We also don't keep the original compressed file around after extraction.
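The "ask the server" approach maps onto HTTP conditional requests: save the ETag and Last-Modified response headers, send them back as If-None-Match and If-Modified-Since, and treat a 304 Not Modified reply as "already up to date". This is a minimal Python sketch assuming the download server returns those headers (not verified here) and that the archive is kept on disk next to a small metadata file; download_if_changed and the .meta.json sidecar are hypothetical. It does not address the harder problem of knowing which extracted files came from which archive.

```python
import json
import urllib.error
import urllib.request
from pathlib import Path


def conditional_headers(meta: dict) -> dict:
    """Build conditional-request headers from previously saved response metadata."""
    headers = {}
    if meta.get("etag"):
        headers["If-None-Match"] = meta["etag"]
    if meta.get("last_modified"):
        headers["If-Modified-Since"] = meta["last_modified"]
    return headers


def download_if_changed(url: str, dest: Path) -> bool:
    """Download url to dest unless the server says our copy is current.

    Returns True if a new file was written, False on 304 Not Modified.
    """
    meta_path = dest.with_name(dest.name + ".meta.json")  # hypothetical sidecar file
    # Only send validators if we still have the archive they describe.
    meta = {}
    if dest.is_file() and meta_path.is_file():
        meta = json.loads(meta_path.read_text())
    request = urllib.request.Request(url, headers=conditional_headers(meta))
    try:
        with urllib.request.urlopen(request, timeout=60) as resp:
            dest.write_bytes(resp.read())
            new_meta = {
                "etag": resp.headers.get("ETag"),
                "last_modified": resp.headers.get("Last-Modified"),
            }
    except urllib.error.HTTPError as err:
        # urllib surfaces 304 as an HTTPError; anything else is a real failure.
        if err.code == 304:
            return False
        raise
    meta_path.write_text(json.dumps(new_meta))
    return True
```

In practice this would be combined with a status and content check so that an error page never gets written to dest.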

cromulus commented 2 years ago

Yeah, the servers are overloaded and constantly returning 503 errors. I guess I'll wait a bit and try trimming pelias.json down from all of North America to the US only.

Is there any way to make this less resource-intensive? For instance, the Australia project has a countrywide.csv; does something like that exist for other countries?