nextstrain / nextclade

Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
https://clades.nextstrain.org
MIT License

Del S:Y144 reported BUT wrong clade assignation #311

Closed: dbrandtner closed this issue 2 years ago

dbrandtner commented 3 years ago

Dear Nextclade team, the S:Y144 deletion is now reported in v0.12.0, but the sequence is not assigned to the right clade.

We are working on a short spike protein sequence fragment that carries the deletions 69-70 and 144 but no SNPs. It is assigned to 19A instead of 20B/501.v1.

If we use a larger segment that also includes two nearby mutations, it is assigned correctly to 20B/501.v1.

I think there is still a bug in the way a deletion alone is used for clade assignment. To make testing and fixing easier, we provide below the misassigned sequence, which we verified on GISAID as belonging to 20B/501.v1.

Thank you

pcr_973 TTTCTTTTCCAATGTTACTTGGTTCCATGCTATCTCTGGGACCAATGGTACTAAGAGGTTTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGCTTCCACTGAGAAGTCTAACATAATAAGAGGCTGGATTTTTGGTACTACTTTAGATTCGAAGACCCAGTCCCTACTTATTGTTAATAACGCTACTAATGTTGTTATTAAAGTCTGTGAATTTCAATTTTGTAATGATCCATTTTTGGGTGTTTACCACAAAAACAACAAAAGTTGGATGGAAAGTGAGTTCA

rneher commented 3 years ago

Yes, but this is a deeper problem. We only use mutations for phylogenetic placement and clade assignment, not gaps. So if the sequence includes the gaps but doesn't include the clade-defining mutations, it will not be placed correctly. We'll add a note, but I don't think we will be able to fix this in short order.
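
To illustrate the behaviour described above, here is a minimal toy sketch of clade assignment that looks only at substitutions and ignores gaps. It is not Nextclade's actual implementation: the reference, the clade names `root` and `variant`, the defining substitution and the helper functions are all hypothetical and chosen only to show why a fragment carrying a characteristic deletion, but none of the defining substitutions, falls back to the root clade.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy data, NOT real SARS-CoV-2 coordinates or Nextclade definitions.
REFERENCE = "ACGTACGTACGT"
ROOT_CLADE = "root"
CLADE_DEFINING_SUBSTITUTIONS: Dict[str, Set[Tuple[int, str]]] = {
    "root": set(),
    # "variant" also carries a characteristic deletion at positions 4-6,
    # but the assignment below never looks at deletions.
    "variant": {(10, "A")},
}


def call_differences(ref: str, query: str) -> Tuple[Set[Tuple[int, str]], Set[int]]:
    """Compare an aligned query to the reference (1-based positions).

    Returns (substitutions, deleted_positions). 'N' marks uncovered positions
    and is skipped; '-' marks a deletion.
    """
    substitutions: Set[Tuple[int, str]] = set()
    deletions: Set[int] = set()
    for pos, (r, q) in enumerate(zip(ref, query), start=1):
        if q == "-":
            deletions.add(pos)
        elif q != "N" and q != r:
            substitutions.add((pos, q))
    return substitutions, deletions


def assign_clade(substitutions: Set[Tuple[int, str]]) -> str:
    """Assign a clade from substitutions only; deletions are never consulted."""
    assigned = ROOT_CLADE
    for clade, defining in CLADE_DEFINING_SUBSTITUTIONS.items():
        if defining and defining <= substitutions:
            assigned = clade
    return assigned


# Short fragment: has the deletion at 4-6 but does not cover position 10.
fragment = "ACG---GTNNNN"
# Longer segment: same deletion, and it covers the defining substitution.
longer = "ACG---GTAAGT"

frag_subs, frag_dels = call_differences(REFERENCE, fragment)
long_subs, long_dels = call_differences(REFERENCE, longer)

print(assign_clade(frag_subs), sorted(frag_dels))
print(assign_clade(long_subs), sorted(long_dels))
```

In this sketch both queries carry the same deletion, yet only the longer one is assigned to `variant`, because the deletion never enters `assign_clade`. This mirrors the report: the short fragment is placed at the root clade, while the larger segment that includes nearby substitutions is placed correctly.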