nextstrain / nextclade_data

Datasets for https://github.com/nextstrain/nextclade
https://clades.nextstrain.org

ENH: G_clade GB1 for RSV-B? #61

Closed AngieHinrichs closed 10 months ago

AngieHinrichs commented 1 year ago

Thank you for adding datasets for RSV-A and RSV-B! I am updating my scripts to use nextclade instead of scraping clade assignments from nextstrain.org. There is a GB1 branch on nextstrain.org: https://nextstrain.org/rsv/b/genome?label=clade:GB1. However, nextclade did not assign G_clade GB1 to any of the expected sequences (e.g. KU316116, MG642053), and it looks like the rsv_b tree.json has no samples with G_clade = GB1.

Any chance of adding GB1 samples and labels to rsv_b in a future release?
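
For anyone wanting to repeat the tree.json check described above, here is a minimal sketch. It assumes the dataset's tree.json follows the Auspice v2 JSON layout (a top-level "tree" object whose nodes carry "children" and "node_attrs", with each attribute stored as {"value": ...}); the helper names `load_tree` and `count_tip_labels` and the tiny example tree are hypothetical and only there for illustration.

```python
import json
from collections import Counter


def load_tree(path):
    """Read an Auspice-style JSON file and return its root node.

    Assumption: the root node is stored under the top-level "tree" key.
    """
    with open(path) as fh:
        return json.load(fh)["tree"]


def count_tip_labels(root, attr="G_clade"):
    """Count the values of `attr` over all tips (nodes without children).

    Tips lacking the attribute are counted under None.
    Uses an explicit stack so deep trees do not hit the recursion limit.
    """
    counts = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        if children:
            stack.extend(children)
        else:
            value = node.get("node_attrs", {}).get(attr, {}).get("value")
            counts[value] += 1
    return counts


# Made-up example tree (tip names and labels are placeholders, not real data).
example = {
    "tree": {
        "name": "root",
        "children": [
            {
                "name": "internal_1",
                "children": [
                    {"name": "tip_a", "node_attrs": {"G_clade": {"value": "GB1"}}},
                    {"name": "tip_b", "node_attrs": {"G_clade": {"value": "GB1"}}},
                ],
            },
            {"name": "tip_c", "node_attrs": {"G_clade": {"value": "clade_x"}}},
            {"name": "tip_d", "node_attrs": {}},
        ],
    }
}

counts = count_tip_labels(example["tree"])
print("tips labeled GB1:", counts.get("GB1", 0))
```

On a real dataset one would call `count_tip_labels(load_tree("tree.json"))`; a count of zero for "GB1" would match the observation that no samples in the tree carry that label.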

corneliusroemer commented 1 year ago

Good question how this came about. I wasn't closely involved with the RSV datasets, so I'm not sure how sequences were chosen for inclusion. GB1 is the oldest clade, so maybe it didn't make the cut. GB1 doesn't seem to have a G clade name.

Let's see what @LauraU123 or @rneher say

G1 is definitely not in the Nextclade reference tree, so cannot be assigned:

image

Here are G and whole genome clades:

image image

Note that G1 is grey in the whole genome clade coloring.

AngieHinrichs commented 1 year ago

Thanks @corneliusroemer!

To compare current nextclade assignments with usher-tree-based assignments (made after labeling the tree using a combination of nextclade assignments and other sources), here is a tree of INSDC RSV-B sequences. It was built by usher using NC_001781.1 (= AF013254) as reference/root, because it's a RefSeq and that is more convenient in the UCSC Genome Browser framework. Five types of clade assignment can be used for coloring:

rneher commented 1 year ago

Great to see you are also working on RSV, Angie. We should include more of the older sequences in the nextclade tree. We were hoping that there would soon be a more consolidated and continuously updated nomenclature -- but I haven't heard much in a while.

AngieHinrichs commented 10 months ago

I see GB1 is included now, thanks!