nextstrain / ncov-ingest

A pipeline that ingests SARS-CoV-2 (i.e. nCoV) data from GISAID and Genbank, transforms it, stores it on S3, and triggers Nextstrain nCoV rebuilds.
MIT License

Update to NCBI Dataset v2 API #477

Closed: joverlee521 closed this 1 month ago

joverlee521 commented 1 month ago

NCBI Datasets announced that the v1 API has been deprecated and will no longer be available in December 2024.

We are still using the v1 API in fetch-from-biosample. This should get updated to use the v2 API:

curl -o biosample.zip -X GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/virus/taxon/1335626/genome/download?aux_report=BIOSAMPLE_REPORT" -H "accept: application/zip"

Thanks to @olearyna for the heads up 🙏

joverlee521 commented 1 month ago

Oops, I copied the wrong taxon ID into the URL. It should be:

https://api.ncbi.nlm.nih.gov/datasets/v2alpha/virus/taxon/2697049/genome/download?aux_report=BIOSAMPLE_REPORT

Note that the v1 ZIP archive included ncbi_dataset/data/biosample.jsonl, while the v2 ZIP archive includes ncbi_dataset/data/biosample_report.jsonl instead.
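For illustration, here is a minimal Python sketch of how a reader might cope with the renamed file when parsing a downloaded archive. It is not the actual fetch-from-biosample script. The function name `iter_biosample_records` and the two constants are hypothetical, and the sketch assumes that both files are JSON Lines (one JSON object per line), as the `.jsonl` extension suggests. The two member paths are the ones given in this thread.

```python
import io
import json
import zipfile

# Member paths as stated in this thread (v2 first, then the older v1 name).
V2_MEMBER = "ncbi_dataset/data/biosample_report.jsonl"
V1_MEMBER = "ncbi_dataset/data/biosample.jsonl"


def iter_biosample_records(archive):
    """Yield one dict per non-empty JSON line of the BioSample file in the ZIP.

    `archive` is a filesystem path or a binary file-like object.
    Hypothetical helper; assumes the member is UTF-8 encoded JSON Lines.
    """
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
        # Prefer the v2 member name and fall back to the v1 name.
        member = next((m for m in (V2_MEMBER, V1_MEMBER) if m in names), None)
        if member is None:
            raise FileNotFoundError("no BioSample JSON Lines file in archive")
        with zf.open(member) as fh:
            for line in io.TextIOWrapper(fh, encoding="utf-8"):
                if line.strip():
                    yield json.loads(line)
```

After saving the archive as biosample.zip (for example with the curl command above, using the corrected taxon ID), `for record in iter_biosample_records("biosample.zip"): ...` would iterate over the records. The fallback to the v1 name only matters for archives downloaded before the switch.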

joverlee521 commented 1 month ago

Actually, since I'm updating fetch-from-biosample anyway, I might as well revisit https://github.com/nextstrain/ncov-ingest/issues/420.