nextstrain / ncov-ingest

A pipeline that ingests SARS-CoV-2 (i.e. nCoV) data from GISAID and GenBank, transforms it, stores it on S3, and triggers Nextstrain nCoV rebuilds.
MIT License

Add GenBase sequences to open data #402

Open corneliusroemer opened 1 year ago

corneliusroemer commented 1 year ago

Most Chinese sequences are now open, so this should be a useful addition to open data.

UShER now pulls these, and it seems to work. We can probably copy the approach; see https://github.com/yatisht/usher/issues/337#issuecomment-1506285284

AngieHinrichs commented 1 year ago

In case it's of any use, here's my script that fetches all metadata and uses GenBase's API to download (one at a time, unfortunately) the sequences that I don't already have: https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/utils/otto/sarscov2phylo/getCncb.sh
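
For anyone porting this to ncov-ingest, here is a minimal Python sketch of the "download only what is missing, one at a time" pattern described above. It is not a translation of getCncb.sh. The function names `missing_accessions` and `fetch_missing` are hypothetical, and the actual GenBase API call is deliberately left out: the caller supplies it as `fetch_one`, since the endpoint and response format are not given in this thread.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def missing_accessions(metadata_accessions: Iterable[str],
                       have: Iterable[str]) -> List[str]:
    """Return accessions present in the metadata but not stored locally.

    Order follows the metadata; repeated accessions are reported once.
    """
    have_set = set(have)
    seen = set()
    missing = []
    for acc in metadata_accessions:
        if acc not in have_set and acc not in seen:
            seen.add(acc)
            missing.append(acc)
    return missing


def fetch_missing(metadata_accessions: Iterable[str],
                  have: Iterable[str],
                  fetch_one: Callable[[str], str]) -> Iterator[Tuple[str, str]]:
    """Yield (accession, sequence) for each accession not already held.

    fetch_one is an assumption: a caller-supplied function that downloads
    a single record (for example, one request per accession) and returns
    its sequence as text.
    """
    for acc in missing_accessions(metadata_accessions, have):
        yield acc, fetch_one(acc)
```

Because `fetch_missing` is a generator, records can be written to disk as they arrive, so an interrupted run only needs to re-fetch what is still missing on the next pass.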