nextstrain / nextclade

Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
https://clades.nextstrain.org
MIT License

List of mutational changes per clade #1515

Open huzuner opened 3 months ago

huzuner commented 3 months ago

Hi,

Do you provide any resource for the list of mutational changes per clade?

In the dataset nextstrain/sars-cov-2/wuhan-hu-1, downloaded with Nextclade CLI, README.md contains this statement:

We define each clade by a combination of signature mutations. You can find the exact clade definition on Github in this file.

However, that file does not list all mutations; it contains only the subset used by augur clades. See: https://github.com/nextstrain/ncov/issues/1107

Is this also the case for the tree.json that's provided in the nextstrain dataset?

I compared the mutations in clades.tsv with those in tree.json, and they are different.

If tree.json contains all mutations for all clades in that dataset, how is it organized? Unfortunately, this is not clearly explained anywhere in the documentation.

Thank you

rneher commented 3 months ago

CoVariants provides complete lists of mutations:

https://covariants.org/variants/24B.Omicron

Another resource is the set of pango-lineage consensus sequences maintained by @corneliusroemer:

https://github.com/corneliusroemer/pango-sequences

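The tree.json contains all mutations, annotated on the branches where they occur. You can accumulate them along the path from the root to any node in the tree to get the full set for that node.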
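
As a rough illustration of accumulating mutations along a path, here is a minimal Python sketch. It assumes tree.json follows the Auspice v2 layout, with the root node under a top-level "tree" key, per-branch mutations under branch_attrs.mutations (keyed by "nuc" or a gene name), and child nodes under "children". The function names and the toy tree are hypothetical and not part of Nextclade; the mutations in the toy tree are made up for the example.

```python
import re

# Matches mutation strings such as "C241T" (ref, 1-based position, alt).
MUT_RE = re.compile(r"^([A-Z*-])(\d+)([A-Z*-])$")


def apply_mutations(state, mutations):
    """Return a copy of state (position -> (ref, alt)) with mutations applied."""
    new_state = dict(state)
    for mut in mutations:
        m = MUT_RE.match(mut)
        if m is None:
            continue  # skip anything that is not in the assumed format
        pos, alt = int(m.group(2)), m.group(3)
        # Keep the original reference state if this position changed before.
        ref = new_state[pos][0] if pos in new_state else m.group(1)
        if alt == ref:
            new_state.pop(pos, None)  # reversion back to the root state
        else:
            new_state[pos] = (ref, alt)
    return new_state


def accumulate(root, gene="nuc"):
    """Map each node name to its mutations accumulated from the root."""
    out = {}
    stack = [(root, {})]  # iterative traversal, real trees can be deep
    while stack:
        node, parent_state = stack.pop()
        muts = node.get("branch_attrs", {}).get("mutations", {}).get(gene, [])
        state = apply_mutations(parent_state, muts)
        out[node["name"]] = [
            f"{ref}{pos}{alt}" for pos, (ref, alt) in sorted(state.items())
        ]
        stack.extend((child, state) for child in node.get("children", []))
    return out


# Toy tree in the assumed layout; all values are invented for illustration.
toy = {
    "tree": {
        "name": "root",
        "branch_attrs": {"mutations": {}},
        "children": [
            {
                "name": "A",
                "branch_attrs": {"mutations": {"nuc": ["C241T", "A23403G"]}},
                "children": [
                    {
                        "name": "A1",
                        "branch_attrs": {"mutations": {"nuc": ["T241C", "G100A"]}},
                    }
                ],
            }
        ],
    }
}

result = accumulate(toy["tree"])
print(result["A1"])
```

For a real dataset, the toy dictionary would be replaced by the parsed file, for example json.load(open("tree.json")). The sketch cancels a mutation when a later branch reverts it, so the result for each node is expressed relative to the root of the tree. Whether the root corresponds exactly to the reference sequence is something to verify for the dataset in question.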

hope this helps, richard

huzuner commented 3 months ago

This has been very helpful for me, thank you very much for your response! :)