pelias / openaddresses

Pelias import pipeline for OpenAddresses.
MIT License

Openaddresses Download sources no longer available #491

Closed awdng closed 2 years ago

awdng commented 2 years ago

The npm download command tries to fetch data from https://results.openaddresses.io/latest/run, which no longer appears to be available. The site itself returns a 503 and the importer fails the download:

2021-11-11T14:06:49.327Z - debug: [openaddresses-download] downloading https://results.openaddresses.io/latest/run/us/ny/city_of_new_york.zip
2021-11-11T14:06:49.329Z - debug: [openaddresses-download] curl --request GET --silent --location --fail --write-out "%{http_code}" --referer https://pelias-results.openaddresses.io --output /Users/a.wieding/workspace/pelias/openaddresses/data/us-ny-city_of_new_york20211011-52189-5tneb.kkgzyw.zip --retry 5 --retry-connrefused --retry-delay 5 https://results.openaddresses.io/latest/run/us/ny/city_of_new_york.zip
2021-11-11T14:07:22.000Z - warn: [openaddresses-download] failed to download https://results.openaddresses.io/latest/run/us/ny/city_of_new_york.zip: Error: cURL request failed, HTTP status: 500, exit code: 22
2021-11-11T14:07:22.003Z - info: [openaddresses-download] All done!
missinglink commented 2 years ago

I'm afraid we don't control those servers, I hope it's an intermittent issue.

missinglink commented 2 years ago
curl --head https://results.openaddresses.io/latest/run/us/ny/city_of_new_york.zip
HTTP/1.1 503 Service Unavailable: Back-end server is at capacity
Connection: keep-alive
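
For anyone scripting this check, here is a minimal JavaScript sketch of reading the status line that `curl --head` prints and deciding whether the failure looks temporary. The helper names and the set of statuses treated as transient are assumptions for illustration; none of this is part of the Pelias importer.

```javascript
// Hypothetical helpers, not part of pelias/openaddresses.

// Extract the numeric status code from the first line of `curl --head` output,
// e.g. "HTTP/1.1 503 Service Unavailable: Back-end server is at capacity".
// Returns null when the first line is not an HTTP status line.
function parseStatusCode(headOutput) {
  const firstLine = headOutput.split(/\r?\n/)[0] || '';
  const match = /^HTTP\/\d+(?:\.\d+)?\s+(\d{3})\b/.exec(firstLine);
  return match ? Number(match[1]) : null;
}

// Assumption: these statuses usually indicate a temporary server-side problem
// that is worth retrying later rather than treating as a removed resource.
const TRANSIENT_STATUSES = new Set([429, 500, 502, 503, 504]);

function isLikelyTransient(statusCode) {
  return TRANSIENT_STATUSES.has(statusCode);
}
```

With the response above, `parseStatusCode` yields 503 and `isLikelyTransient(503)` is true, which matches the "at capacity" wording of the status line.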
missinglink commented 2 years ago

I tried downloading the files manually and got this message:

Screenshot 2021-11-11 at 20 02 11

It might be that the OA team has made the CSV downloads unavailable without registration. This would mean that Pelias users won't be able to download in that format any longer, and we would need to do some work to fix it.

There has already been some work to support the new GeoJSON format in https://github.com/pelias/openaddresses/pull/476 but we would need to test that and ensure we have code to detect the latest run, then migrate from CSV downloads to GeoJSON downloads.

I've reached out to the OA team for clarification.

missinglink commented 2 years ago

The OA team replied that this is an intermittent issue returning 503 errors and should resolve itself.

However, the results.openaddresses.io server is old and was replaced by batch.openaddresses.io some time ago. We should prioritise a migration so that the OA team can decommission the old results.openaddresses.io server.

awdng commented 2 years ago

Thanks for getting back on this! The site is now back, but with a visible disclaimer that the data is no longer updated and that the site itself will probably also disappear at some point. As for the OA import pipeline, imo something like this would be needed:

missinglink commented 2 years ago

Yeah, ok cool, Ian basically said that he wants that results. server to go away since it's a pain to maintain and has been superseded by batch.

I still need to get a little more clarification from OA on our workflow.

To your first point, my ideal situation would be to distribute Pelias code with default shared credentials. These credentials would come with no guarantees, would possibly be rate-limited, and would be prone to abuse.

We would then make it clear in the README that everyone should get their own credentials and put them in their own pelias.json which would give them individual limits.

I like this idea because it would 'just work' out of the box but anyone serious about their installation would register their own key. I need to talk to Ingalls about that.
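
A minimal JavaScript sketch of that fallback, assuming a hypothetical `imports.openaddresses.token` key in pelias.json and a placeholder shared token; neither name is confirmed in this thread, and the real key and credential format would depend on what the OA team provides.

```javascript
// Hypothetical sketch: the config key and the default token are made up
// for illustration and are not taken from the Pelias codebase.
const SHARED_DEFAULT_TOKEN = 'shared-default-token'; // placeholder, not a real credential

// Pick the token to use for downloads: the user's own token from their
// pelias.json if one is set, otherwise the shared default shipped with the code.
function resolveToken(peliasConfig) {
  const own = peliasConfig?.imports?.openaddresses?.token;
  if (typeof own === 'string' && own.trim() !== '') {
    return { token: own.trim(), shared: false };
  }
  return { token: SHARED_DEFAULT_TOKEN, shared: true };
}
```

Returning a `shared` flag would let the importer log a warning that the default credentials are in use, which nudges serious installations towards registering their own key.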

To your second point, I believe https://github.com/pelias/openaddresses/pull/476 can do this already?

missinglink commented 2 years ago

@awdng I believe we can close this issue now as the intermittent failure you reported has now been resolved:

curl --head https://results.openaddresses.io/latest/run/us/ny/city_of_new_york.zip
HTTP/2 301

I had a discussion with the OA team and we have a path forward. I'll look at opening some PRs to migrate to the new domain, the first of which is https://github.com/pelias/config/pull/137

awdng commented 2 years ago

Agreed, thanks for the quick response and follow-ups on this! I will keep an eye on the PR you mentioned 👍🏻