openaddresses / batch

OpenAddresses/Machine based AWS Batch based ETL Processing
https://batch.openaddresses.io/
MIT License

Share-alike collection(s)? #239

Open missinglink opened 2 years ago

missinglink commented 2 years ago

Is your feature request related to a problem? Please describe.

I wasn't able to find a 'collection' in batch equivalent to the openaddr-collected-global-sa.zip (share-alike) file from the results.openaddresses.io domain.

https://batch.openaddresses.io/api/collections

Describe the solution you'd like

Ideally I'd like to have a regularly built 'collection' which mirrors the old share-alike (-sa) behaviour. This would make migrating easier since we wouldn't have to make any breaking changes.

Describe alternatives you've considered

  1. Downloading this one file from the old results.openaddresses.io server (not ideal, as I'm trying to get Pelias off that).
  2. Opening this issue to see what can be done about creating a collection for it 😝

Additional context

Are these 'share-alike' files still a thing or did they go away at some point?

missinglink commented 2 years ago

Picking this up again today, we noticed that collection-global.zip is significantly larger than the previous version.

Our archived copy from results.openaddresses.io -> 12.7 GiB openaddr-collected-global.zip

The latest copy I downloaded from batch.openaddresses.io -> 23G collection-global.zip

The new file is roughly twice the size 😿. Looking inside, I believe it's because it contains non-address datasets.

For Geocode Earth specifically, we'd prefer to download a collection containing only the validated address datasets, although it's not really a big deal to download more and filter it, as long as there's a way of determining what's validated and what's not.
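
For illustration, here is a minimal sketch of the "download more and filter it" approach. It assumes, without confirmation from the maintainers, that entry names inside the archive embed the layer name, as in the made-up path `us/ca/alameda-addresses-county.geojson`. The marker string, the function names and the demo archive are all hypothetical. It only separates entries by layer and says nothing about whether a dataset is validated, which is the open question above.

```python
import io
import zipfile

# Assumption: address layers carry this marker in their entry name.
# The real naming scheme of collection-global.zip may differ.
ADDRESS_MARKER = "-addresses-"


def address_entries(zip_file):
    """Return the names of archive entries that look like address layers.

    `zip_file` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_file) as archive:
        return [name for name in archive.namelist() if ADDRESS_MARKER in name]


def build_demo_archive():
    """Build a tiny in-memory archive with made-up entries for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("us/ca/alameda-addresses-county.geojson", "{}")
        archive.writestr("us/ca/alameda-parcels-county.geojson", "{}")
        archive.writestr("README.txt", "demo")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Only the entry containing the assumed address marker is printed.
    print(address_entries(build_demo_archive()))
```

Against a real download, the same function could be pointed at the path of the local zip file. Filtering on validation status would still need some signal from batch that is not visible from the file names alone.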