Open jeromekelleher opened 1 year ago
Ah! I haven't got around to looking at the mutations yet: currently focussing on the trees, but yes, a good thing to know about and fix.
Note that this is actually getting the NextClade tree, which is in JSON format, not the NextStrain tree, which is in Nexus format and is downloadable from the link at the bottom of the nextstrain pages rather than via a URL. The NextClade tree doesn't have branch lengths, so we decided not to use it for the time being. But the NextStrain tree doesn't have mutations of any sort on it. This is probably disallowed by GISAID anyway. Here are some details on how to actually get that data, if we want it:
https://discussion.nextstrain.org/t/sars-cov-2-mutation-data/78/3
I think we don't need it for the preprint, though.
The current nextstrain conversion script converts the "nuc" mutations, and ignores the rest. I had thought that the file was helpfully giving each mutation in both gene and nucleotide format, but I now think the "nuc" entries are just for the intergenic mutations.
@hyanwong - this would make the comparison in terms of mutations pretty meaningless, unless we also do the same thing (i.e., get the gene mutations as well - I'm working on this)
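For reference, a rough sketch of how one might check what the conversion is dropping, by tallying mutations per key ("nuc" versus gene names) over the whole tree. This assumes the JSON follows the Auspice v2 layout, where each node carries `branch_attrs["mutations"]` as a dict of lists and nests its descendants under `children`. The function names and the toy tree are made up for illustration and are not taken from the actual conversion script.

```python
from collections import Counter


def iter_nodes(node):
    """Yield every node of a nested tree dict, using an explicit stack."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        stack.extend(current.get("children", []))


def count_mutations(tree):
    """Count mutations per key ("nuc" or a gene name) across all branches.

    Assumes each node may have branch_attrs["mutations"], a dict that maps
    a key to a list of mutation strings (Auspice v2 style layout).
    """
    counts = Counter()
    for node in iter_nodes(tree):
        mutations = node.get("branch_attrs", {}).get("mutations", {})
        for key, muts in mutations.items():
            counts[key] += len(muts)
    return counts


# Hand-made toy tree for illustration only, not real data.
example_tree = {
    "name": "root",
    "branch_attrs": {"mutations": {}},
    "children": [
        {
            "name": "A",
            "branch_attrs": {
                "mutations": {"nuc": ["C241T", "A23403G"], "S": ["D614G"]}
            },
        },
        {
            "name": "B",
            "branch_attrs": {"mutations": {"nuc": ["G28881A"]}},
            "children": [{"name": "C", "branch_attrs": {}}],
        },
    ],
}

counts = count_mutations(example_tree)
print(dict(counts))
```

Running this on the real file would show how many mutations sit under "nuc" compared with the gene keys, which should settle whether converting only "nuc" loses anything.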