nextstrain / ncov-ingest

A pipeline that ingests SARS-CoV-2 (i.e. nCoV) data from GISAID and Genbank, transforms it, stores it on S3, and triggers Nextstrain nCoV rebuilds.
MIT License

Make use of genbank's `purpose_of_sampling` field #415

Open corneliusroemer opened 11 months ago

corneliusroemer commented 11 months ago

Context

The purpose of sampling can be retrieved from GenBank via `datasets summary virus genome taxon sars-cov-2`; see https://github.com/GenSpectrum/LAPIS/issues/328#issuecomment-1673639689

It would be nice if we parsed that field into our `metadata.tsv` so that one can filter for baseline vs. airport surveillance, for example.
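
A rough sketch of what the parsing step might look like, assuming the summary is emitted as JSON Lines (one record per line). The key names `accession` and `purposeOfSampling`, the output column names, the `?` placeholder for missing values, and the sample records are all assumptions for illustration and should be checked against the real `datasets` output; this is not how ncov-ingest currently does it.

```python
import csv
import io
import json

# Assumed key names in each JSON record; verify against real `datasets` output.
ACCESSION_KEY = "accession"
PURPOSE_KEY = "purposeOfSampling"


def extract_purpose_of_sampling(lines, missing="?"):
    """Yield (accession, purpose_of_sampling) for each non-empty JSON line.

    Records lacking the purpose field get the `missing` placeholder.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        accession = record.get(ACCESSION_KEY, missing)
        purpose = record.get(PURPOSE_KEY) or missing
        yield accession, purpose


def to_tsv(rows):
    """Render rows as a two-column TSV string (hypothetical column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["genbank_accession", "purpose_of_sampling"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample records, not real GenBank data.
SAMPLE_LINES = [
    '{"accession": "XX000001.1", "purposeOfSampling": "BASELINE_SURVEILLANCE"}',
    '{"accession": "XX000002.1"}',
    "",
]

if __name__ == "__main__":
    print(to_tsv(extract_purpose_of_sampling(SAMPLE_LINES)), end="")
```

The resulting two-column table could then be joined onto `metadata.tsv` by accession, which would allow filtering on the new column downstream.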