jeromekelleher / sc2ts

Infer a succinct tree sequence from SARS-COV-2 variation data
MIT License

Different sources of Pango lineage status #129

Open szhan opened 1 year ago

szhan commented 1 year ago

In our preprint ARGs, the Pango lineage assignments of the sample sequences were produced by Nextclade. However, the consensus-based mutation definitions from the COVIDCG website, which were used for imputing the non-sample nodes, are based on Pango lineage assignments produced by Pangolin. The Nextclade team has reported that the two methods can assign different Pango lineages to the same sequence (see this page). Mixing the two sources may affect how we filter out putative recombinants using the Pango parent lineage consistency criterion, so next time we will probably want to take all Pango lineage assignments from the same source/method.
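To get a rough sense of how often the two sources disagree, one could cross-tabulate the assignments per sample. The snippet below is only a minimal sketch: the column names `strain` and `pango`, the function `find_discrepancies`, and the toy records are all made up for illustration and do not reflect the actual output formats of Nextclade or Pangolin, nor anything in sc2ts.

```python
import csv
import io


def find_discrepancies(nextclade_rows, pangolin_rows):
    """Return {strain: (nextclade_lineage, pangolin_lineage)} for strains
    that appear in both inputs with different Pango lineage assignments.

    The "strain" and "pango" keys are hypothetical column names.
    """
    pangolin = {row["strain"]: row["pango"] for row in pangolin_rows}
    discrepancies = {}
    for row in nextclade_rows:
        strain = row["strain"]
        other = pangolin.get(strain)
        # Skip strains missing from the Pangolin table; report only real conflicts.
        if other is not None and other != row["pango"]:
            discrepancies[strain] = (row["pango"], other)
    return discrepancies


# Toy data, invented for this example.
NEXTCLADE_CSV = """strain,pango
s1,B.1.1.7
s2,BA.2
s3,AY.4
"""

PANGOLIN_CSV = """strain,pango
s1,B.1.1.7
s2,BA.2.9
s3,AY.4
s4,B.1
"""

nextclade_rows = list(csv.DictReader(io.StringIO(NEXTCLADE_CSV)))
pangolin_rows = list(csv.DictReader(io.StringIO(PANGOLIN_CSV)))

discrepancies = find_discrepancies(nextclade_rows, pangolin_rows)
for strain, (nc, pg) in sorted(discrepancies.items()):
    print(f"{strain}: Nextclade={nc} Pangolin={pg}")
```

With real metadata, the same comparison restricted to the samples flagged as putative recombinants would show whether any of them were affected by the source mismatch.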

szhan commented 1 year ago

It seems that if we want accurate and stable Pango lineage assignments, then getting them from Pangolin (UShER mode) is the way to go (again, see the page mentioned above), which states: "Due to accuracy limitations, Nextclade’s pango classifier should not be used as a replacement of UShER or pangoLEARN, but rather as a convenient and transparent add on for current users of Nextclade."