yatisht / usher

Ultrafast Sample Placement on Existing Trees
MIT License

several issues #348

Open shay671 opened 11 months ago

shay671 commented 11 months ago

FL.4.4: All samples lack 22995A, yet in the phylogeny it seems to be present. https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_6dbf_51d490.json?c=gt-nuc_22995&label=id:node_6727233

FU.2.1: Position 241 is T in the tree, while 94% of the samples have A (which I think is odd, as this is a B.1-defining mutation). https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_17bcf_5265d0.json?c=gt-nuc_241&label=id:node_6634300

XBB.1.16.8: There seems to be a reversion that is not recognized in the tree. The tree shows 19326 with G, while only 5.5% of the samples have this mutation (1.7% have N at this position). https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_2135b_58cdf0.json?c=gt-nuc_19326&label=id:node_6623401

FL.10.1: The path to this variant includes 21998A, although one step after the defining branch point this mutation is reverted (and that’s the main branch of the variant). Indeed, only 2.9% of the samples have that mutation in their VCF (less than 1% have N). https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_1b9c2_7995a0.json?label=id:node_6713858
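The percentages quoted above compare per-sample base calls at one position against the allele shown in the tree. A minimal sketch of that kind of check follows. It is not part of UShER; the function name `allele_fractions` and the sample calls are hypothetical, and the numbers are made up for illustration.

```python
from collections import Counter

def allele_fractions(calls):
    """Return {allele: fraction of samples} for per-sample base calls at one site."""
    if not calls:
        return {}
    counts = Counter(c.upper() for c in calls)
    total = len(calls)
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical calls at a single position for 200 samples (illustrative only).
calls = ["A"] * 188 + ["T"] * 10 + ["N"] * 2
fractions = allele_fractions(calls)

tree_allele = "T"  # assumed: the allele the tree displays for this lineage
support = fractions.get(tree_allele, 0.0)
print(f"tree allele {tree_allele}: {support:.1%} of samples; N: {fractions.get('N', 0.0):.1%}")
```

A low fraction for the tree allele, with few N calls, is the pattern reported for each lineage in this comment.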

AngieHinrichs commented 11 months ago

Thanks Shay -- the first 3 are consequences of branch-specific masking. The reversion A22995C is masked in BA.2.75, BA.5 and XBB because reversions on that base were too noisy. 241 (any change) is masked in everything from BA.2 onward -- it's interesting that it has mutated again (C > T > A). In retrospect it probably would have been better to mask only the reversion T241C instead of all 241 changes, but... maybe too late now. Masking G19326A in all of XBB based on early observations was probably also a bad idea.

I'm finding that 21998 is super-messy all across XBB, so I might have to mask that too.
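To illustrate how branch-specific masking can produce the mismatches reported above, here is a rough sketch. It is not UShER's implementation: `MaskRule`, `is_masked`, `apply_masking`, and the `ancestry` list are hypothetical names, and only the rules themselves are taken from this comment. The idea is that a mutation is dropped before placement when the sample falls under a clade that has a rule for that position.

```python
from typing import NamedTuple, Optional

class MaskRule(NamedTuple):
    clade: str           # rule applies to this clade and everything below it
    position: int
    ref: Optional[str]   # None means "any base"
    alt: Optional[str]   # None means "any base"

# Rules as described in the comment above; the data structure is an assumption.
RULES = [
    MaskRule("BA.2.75", 22995, "A", "C"),
    MaskRule("BA.5", 22995, "A", "C"),
    MaskRule("XBB", 22995, "A", "C"),
    MaskRule("BA.2", 241, None, None),   # any change at 241
    MaskRule("XBB", 19326, "G", "A"),
]

def parse_mutation(mutation):
    """Split a string such as 'A22995C' into (ref, position, alt)."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]

def is_masked(mutation, ancestry, rules=RULES):
    """True if any rule for a clade in the sample's ancestry matches the mutation."""
    ref, pos, alt = parse_mutation(mutation)
    return any(
        r.clade in ancestry
        and r.position == pos
        and r.ref in (None, ref)
        and r.alt in (None, alt)
        for r in rules
    )

def apply_masking(mutations, ancestry, rules=RULES):
    """Return only the mutations that survive masking."""
    return [m for m in mutations if not is_masked(m, ancestry, rules)]

# Illustrative ancestry (root to tip) for a sample under XBB; C12345T is a made-up mutation.
ancestry = ["BA.2", "XBB", "XBB.1.16", "XBB.1.16.8"]
kept = apply_masking(["G19326A", "T241A", "C12345T"], ancestry)
print(kept)
```

Because masked mutations never reach the tree, the tree keeps the ancestral allele at that position even when most samples in the clade carry a different base.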