sars-cov-2-variants / lineage-proposals

Repository to propose and discuss lineages

Large usher artefactual branch attracting JN.1/2 seqs with no coverage around S:20 #1347

Closed. aviczhl2 closed this issue 6 months ago

aviczhl2 commented 7 months ago

There seems to be a large artefactual branch on BA.2.86.1 whose sequences are actually JN.1/JN.2. When I investigated, I found that it attracts sequences with no coverage around S:20.

@AngieHinrichs @yatisht

[image: usher]
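For readers who want to check which sequences lack coverage around S:20, here is a minimal sketch. It assumes coordinates follow the Wuhan-Hu-1 reference (NC_045512.2), where the spike coding sequence starts at genome position 21563, and that each sequence is already aligned to that reference with no insertions. The function names are hypothetical and are not part of UShER or any other tool.

```python
# Assumption: Wuhan-Hu-1 (NC_045512.2) coordinates; the spike (S) coding
# sequence starts at 1-based genome position 21563.
S_START = 21563


def codon_to_genome_range(codon, gene_start=S_START):
    """Return the 1-based inclusive genome positions (start, end) of a codon."""
    start = gene_start + (codon - 1) * 3
    return start, start + 2


def lacks_coverage(aligned_seq, start, end, flank=0):
    """Return True if every base in [start - flank, end + flank] is N or a gap.

    aligned_seq is assumed to be aligned to the reference, so string
    index i corresponds to genome position i + 1.
    """
    lo = max(start - flank, 1)
    hi = min(end + flank, len(aligned_seq))
    window = aligned_seq[lo - 1:hi].upper()
    return len(window) > 0 and all(base in "N-" for base in window)


if __name__ == "__main__":
    start, end = codon_to_genome_range(20)
    print(start, end)  # 21620 21622
```

Under these assumptions S:20 maps to genome positions 21620-21622. The `flank` parameter widens the window when the dropout is expected to extend beyond the codon itself.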

FedeGueli commented 7 months ago

Thank you! It also messes up the counts of sublineages with 346/572.

AngieHinrichs commented 7 months ago

I will take a look, thanks. By the way, please don't tag Yatish in reports of bad branches -- he is a busy professor at UC San Diego now and while he and his group are the developers of UShER and matOptimize, they are not directly involved in the daily update of the UC Santa Cruz SARS-CoV-2 tree. These kinds of problems are caused by sequencing issues and the remedy is to prune sequences and/or mask sites that are error-prone. In general they are not bugs in the UShER software. It is applying maximum parsimony to real-world imperfect data. My job would be so easy if sequence data were perfect. 🙂

AngieHinrichs commented 6 months ago

Some of those sequences have Ns from ~21610-21640 while others have a series of substitutions within the range 21610-21624 which I think are likely errors. 21610 was already masked in BA.2.86 by previous request. So I'm going to mask all sites in 21610-21624 on the BA.2.86 branch as of today's build (2024-01-30). Hopefully when it's done later today those sequences will be placed without the dubious mutations in that range.
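The effect of masking a range can be illustrated per sequence with a short sketch. This is not how the UCSC build applies branch-specific masking to the tree; it only shows, on a single reference-aligned sequence, what it means to ignore positions 21610-21624 and how one might list the substitutions that fall inside that range. All function names here are hypothetical.

```python
MASK_START, MASK_END = 21610, 21624  # 1-based inclusive range from the comment above


def mask_range(aligned_seq, start=MASK_START, end=MASK_END, mask_char="N"):
    """Return a copy of aligned_seq with positions start..end replaced by mask_char.

    aligned_seq is assumed to be aligned to the reference, so string
    index i corresponds to genome position i + 1.
    """
    if not 1 <= start <= end <= len(aligned_seq):
        raise ValueError("mask range falls outside the sequence")
    return aligned_seq[:start - 1] + mask_char * (end - start + 1) + aligned_seq[end:]


def substitutions_in_range(aligned_seq, reference, start=MASK_START, end=MASK_END):
    """List (position, ref_base, alt_base) for called bases that differ from the reference.

    Ns and gaps are treated as missing data, not as substitutions.
    """
    found = []
    for pos in range(start, end + 1):
        ref_base = reference[pos - 1].upper()
        alt_base = aligned_seq[pos - 1].upper()
        if alt_base in "ACGT" and alt_base != ref_base:
            found.append((pos, ref_base, alt_base))
    return found
```

Running `substitutions_in_range` before masking gives a quick way to see which sequences carry the dubious substitutions; after `mask_range` the same call returns an empty list for that range.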

FedeGueli commented 6 months ago

Thank you, @AngieHinrichs!