openaddresses / batch

OpenAddresses/Machine based AWS Batch based ETL Processing
https://batch.openaddresses.io/
MIT License

provide HTTP URL in data list response #249

Closed missinglink closed 2 years ago

missinglink commented 2 years ago

Hi @ingalls would you consider making this change to the data listings?

The major benefit for the consumer is that we wouldn't need to generate the HTTP URLs manually from the job param. It would also help on the server side, since the download URL format could later change from the https://batch.openaddresses.io/api/job/${job}/output/source.geojson.gz pattern without breaking integrations.

Just floating the idea for now, I could clean this up and edit the docs etc. if it's something you'd consider merging?

Am I right in saying this URL is currently just a convention rather than being something the API provides explicitly?

missinglink commented 2 years ago

Agh, I see it's a documented API here: https://batch.openaddresses.io/docs/#api-Job-SingleOutputData. I guess in that case I could just hard-code the pattern into our codebase based on that definition.
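A minimal sketch of what hard-coding that pattern could look like on the consumer side. The helper name `sourceDownloadUrl` and the `BASE_URL` constant are hypothetical, and the check that job ids are positive integers is an assumption rather than something the docs state here.

```javascript
// Hypothetical consumer-side helper; not part of the batch codebase.
// Base taken from the URL pattern quoted earlier in this thread.
const BASE_URL = 'https://batch.openaddresses.io/api';

function sourceDownloadUrl(job) {
    // Assumption: job ids are positive integers. Rejecting anything else
    // keeps a bad id from silently producing a malformed URL.
    if (!Number.isInteger(job) || job <= 0) {
        throw new TypeError(`invalid job id: ${job}`);
    }
    return `${BASE_URL}/job/${job}/output/source.geojson.gz`;
}
```

Keeping the pattern in one function means that if the documented path ever did change, only this helper would need updating.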

missinglink commented 2 years ago

couple more questions sorry, the download endpoint seems to require authentication here: https://github.com/openaddresses/batch/blob/master/api/routes/job.js#L244

but the docs say it doesn't require authentication:

[Screenshot of the docs, 2021-12-01 at 13:46:03]

Looking through that code I also see this other path /job/:job/output/validated.geojson.gz which is for sponsors. I guess I should check for output.validated == true and in that case use the different download URL?

This validated stuff is probably still wet paint, I think I'm possibly jumping the gun with some of this integration?
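The check floated above could be sketched roughly as follows. The shape of the listing entry (a `job` id plus an `output.validated` flag) and the helper name `downloadUrl` are assumptions for illustration. Whether the validated path is stable, or usable by non-sponsors, is not confirmed in this thread.

```javascript
// Hypothetical helper; the entry shape { job, output: { validated } } is assumed.
const BASE_URL = 'https://batch.openaddresses.io/api';

function downloadUrl(entry) {
    // Only treat the output as validated when the flag is strictly true;
    // a missing `output` object or flag falls back to the source file.
    const validated = Boolean(entry.output && entry.output.validated === true);
    const file = validated ? 'validated.geojson.gz' : 'source.geojson.gz';
    return `${BASE_URL}/job/${entry.job}/output/${file}`;
}
```

Falling back to `source.geojson.gz` by default keeps the integration on the documented path unless the flag is explicitly set.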

ingalls commented 2 years ago

Hey @missinglink, sorry for the late response, super busy month, just getting back to OA stuff.

Yup, this is a documented API, so I've omitted it from the response simply because this URL won't change. The S3 URLs given to sponsors, however, are not necessarily static, and as such any patterns/conventions are not guaranteed. These URLs should be obtained from their corresponding API.

Since this is a documented API I'm going to close this for now. Give me a shout if you run into any more implementation issues :)