Docs on how to get data out of S3

maelle commented 4 years ago

Once one has the URL, what does one do with it? A bit more detail would be nice, for the API docs, and maybe in a tech note.

sckott commented 4 years ago

where should I add that - since you're working on docs and old docs will be out of date, just put it here for you to add in your docs site?

maelle commented 4 years ago

Yes that makes sense

sckott commented 4 years ago

will do

sckott commented 4 years ago

some notes towards good docs for /history/:date route @maelle

/history/:date route

The /history/:date route allows GET requests only. The route is intended for fetching compressed new-line delimited JSON for an individual date, where all CRAN checks data across all packages is combined.

:date should be of the form YYYY-MM-DD

A request to /history/:date leads to a redirect (HTTP status 302) and a returned JSON body with a message telling the user to follow the link in the Location response header, in case they aren't familiar with redirects. The link to follow is a temporary Amazon S3 link to the JSON file for the given date.

One can automatically get the link to the JSON file by following the redirect. You can do this in curl with the -L flag, or in R by using the followlocation curl option like followlocation=1.
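To make the redirect step concrete, here is a rough sketch of how one might look at the temporary S3 link without downloading the file. The function names history_url and history_location are made up for illustration and are not part of the API or of cchecks. It relies on curl's redirect_url write-out variable, which reports where a redirect would go when -L is not used. The date check only tests the YYYY-MM-DD shape, not whether the date is real.

```shell
# Hypothetical helper: build the route URL for a date of the form YYYY-MM-DD.
# Only the shape is checked, so 2020-99-99 would still pass.
history_url() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])
      printf 'https://cranchecks.info/history/%s\n' "$1" ;;
    *)
      echo "date must look like YYYY-MM-DD" >&2
      return 1 ;;
  esac
}

# Hypothetical helper: print the link from the redirect instead of following it.
# No -L here, the response body is discarded, and curl prints the redirect target.
history_location() {
  url=$(history_url "$1") || return 1
  curl -s -o /dev/null -w '%{redirect_url}\n' "$url"
}
```

Calling history_location 2020-04-01 should print the S3 link if the route behaves as described above. Since the link is temporary, it is probably better to fetch it right away than to store it.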

An important note: the file as a whole is NOT valid JSON. Each line of the file IS valid JSON; this format is called newline-delimited JSON (NDJSON; see http://ndjson.org/). You don't have to worry about these details if you use cchecks::cch_history(), which takes care of downloading the file and reading in the compressed NDJSON. On the command line, you can e.g. download the file, saving it with a .json.gz (gzip-compressed) file extension, then on the next line decompress it with gzip, pipe to jq, and use head to get the first 10 lines of the output:

curl -vL https://cranchecks.info/history/2020-04-01 > 2020-04-01.json.gz
gzip -dc 2020-04-01.json.gz | jq . | head -n 10
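Because each line is its own JSON document, line-oriented tools work on the decompressed stream directly. The sketch below is a small illustration of that, using a tiny stand-in file rather than a real download. The function names ndjson_count and ndjson_head and the field names in the sample are invented for the example, not taken from the real data.

```shell
# Count records: every line of the decompressed file is one JSON document.
ndjson_count() {
  gzip -dc "$1" | wc -l | tr -d ' '
}

# Print the first N records (default 1) as raw, one-per-line JSON.
ndjson_head() {
  gzip -dc "$1" | head -n "${2:-1}"
}

# Tiny stand-in file; the field names here are made up for illustration.
sample=$(mktemp)
printf '%s\n' '{"package":"a","n":1}' '{"package":"b","n":2}' | gzip -c > "$sample"

ndjson_count "$sample"     # prints 2
ndjson_head "$sample" 1    # prints {"package":"a","n":1}
```

With a real file, piping the output of ndjson_head into jq . would pretty-print whole records, whereas taking head after jq . cuts the pretty-printed output at a fixed number of lines.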

maelle commented 4 years ago

see https://cran-checks-docs.netlify.app/#history

I think the issue can be closed :-)

sckott commented 4 years ago

thanks!

sckott / cchecksapi

Docs on how to get data out of S3 #62
