[ingest] Add strain name via entrez

nextstrain / oropouche

Oropouche Nextstrain build

https://nextstrain.org/staging/oropouche/L


[ingest] Add strain name via entrez #3

Closed: jameshadfield closed this 2 months ago

jameshadfield commented 2 months ago

Queries Entrez to find the strain name for each record via accession lookup. Grouping records by parsed strain name and counting how many records share each name (n) gave the following group sizes:

        n=1 149 times
        n=2 17 times
        n=3 467 times
        n=4 13 times
        n=6 1 time

This suggests that strain names can be used to group segments together in most cases: the most common group size, n=3, matches the three segments (L, M, S) of the Oropouche genome. I haven't linked these data to the phylo side of the workflow, but "strain" is now available in the ingest/results/**/metadata.tsv files.
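For reference, a tally like the one above could be reproduced from one of the metadata TSVs roughly as follows. This is a minimal sketch, not the code in this PR: the function name `strain_group_sizes` and the `accession` column are assumptions for illustration, and the example rows are made up. Only the `strain` column name comes from the description above.

```python
import csv
import io
from collections import Counter


def strain_group_sizes(tsv_text, strain_column="strain"):
    """Map group size n -> number of strain names shared by exactly n records.

    Hypothetical helper: records with an empty strain field are skipped.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # First count how many records carry each strain name ...
    records_per_strain = Counter(
        row[strain_column] for row in reader if row.get(strain_column)
    )
    # ... then count how many strain names have each record count.
    return Counter(records_per_strain.values())


# Made-up rows for illustration only (not real accessions or strains).
example_tsv = (
    "accession\tstrain\n"
    "ACC001\tstrainX\n"
    "ACC002\tstrainX\n"
    "ACC003\tstrainX\n"
    "ACC004\tstrainY\n"
    "ACC005\t\n"
)

sizes = strain_group_sizes(example_tsv)
for n, count in sorted(sizes.items()):
    print(f"n={n}: {count} strain name(s)")
```

With real data, the text of a metadata.tsv file would be passed in place of `example_tsv`. Groups of size 3 are the ones most likely to represent a complete set of segments for one strain.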

miparedes commented 2 months ago

Just tested it out and it works great! Thanks so much @jameshadfield! I'll merge this in and work on adding it to the phylo side.