yatisht / usher

Ultrafast Sample Placement on Existing Trees
MIT License

2 lineage issues #344

Open shay671 opened 1 year ago

shay671 commented 1 year ago

Hi

  1. In XBC.1.6 and its descendant lineages, position 28271 is T in UShER, while all the samples I checked for those variants had a deletion at that position in their VCF.

For instance: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_2446f_35d1b0.json?c=gt-nuc_28271&label=id:node_6220110 (@RYHISNER had some insights regarding this).

  2. In XBC.1.6.1 there is a T at 14960, although only 12.9% of samples have T there, while the rest have no mutation (or N) at that position.

AngieHinrichs commented 1 year ago

Thanks for the reports.

  1. ... XBC.1.6 ... position 28271... deletion

UShER ignores deletions and only uses substitutions. Before adding a sequence to the tree, I mask deleted positions to 'N'. In retrospect this was a bad idea, and I should have been masking to the reference allele all along: masking to N allows a falsely reported substitution to be imputed for every sequence that was masked to N because it had a correctly reported deletion.
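
To illustrate the difference between the two masking choices, here is a minimal Python sketch. It is not UShER's code: the function names, the `-` deletion marker, and the toy clade of ten samples are all assumptions made for illustration, and the "imputation" is a simplified majority vote over a star-shaped clade rather than a real parsimony placement.

```python
from collections import Counter

def mask_deletions(calls, mask_to):
    """Replace deletion calls ('-', a hypothetical marker) with mask_to."""
    return [mask_to if c == "-" else c for c in calls]

def impute_star_clade(calls, ref):
    """Simplified stand-in for parsimony on a star-shaped clade:
    'N' carries no information, so only unambiguous calls vote,
    and every 'N' leaf inherits the winning base."""
    informative = [c for c in calls if c != "N"]
    root = Counter(informative).most_common(1)[0][0] if informative else ref
    return [root if c == "N" else c for c in calls]

REF = "A"  # reference base at 28271, per the A28271T notation

# Toy clade (invented): nine samples with a true deletion,
# one sample with a falsely reported T.
calls = ["-"] * 9 + ["T"]

masked_n = impute_star_clade(mask_deletions(calls, "N"), ref=REF)
masked_ref = impute_star_clade(mask_deletions(calls, REF), ref=REF)

print(masked_n)    # every sample ends up as T
print(masked_ref)  # nine A, and the single T stays private to one sample
```

With masking to N, the single T is the only informative call, so it is imputed for the whole clade. With masking to the reference allele, the T remains confined to the one sample that reported it.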

Omicron has A28271T, and position 28271 was looking flaky in XBC, so I am masking out reversions at 28271 in XBC in branchSpecificMask.yml -- which causes the false appearance of A28271T in XBC, sorry about that. I will add that to the list of things to try to fix. (The tree will never show that it has a deletion, but at least we should be able to remove the A28271T.)

  2. In XBC.1.6.1 there is T in 14960 although only 12.9% of samples are with T

Thanks, I will try to fix that too.
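
A check like the one in the second report (what fraction of a lineage's samples actually carry T at 14960) can be sketched as below. This is a hedged illustration only: the function name, the `REF` placeholder for "no mutation", and the sample counts are invented, and the counts were chosen simply so that the fraction over all samples comes out near 12.9%.

```python
from collections import Counter

def allele_fraction(calls, allele, include_n=True):
    """Fraction of samples carrying `allele` at one position.
    With include_n=False, 'N' (no call) samples are dropped
    from the denominator."""
    counts = Counter(calls)
    total = len(calls) if include_n else len(calls) - counts["N"]
    return counts[allele] / total if total else 0.0

# Invented toy calls at one position for one lineage:
# 4 samples with T, 20 matching the reference, 7 with no call.
calls = ["T"] * 4 + ["REF"] * 20 + ["N"] * 7

frac_all = allele_fraction(calls, "T")                      # 4 / 31
frac_called = allele_fraction(calls, "T", include_n=False)  # 4 / 24

print(f"T among all samples:    {frac_all:.1%}")
print(f"T among called samples: {frac_called:.1%}")
```

Whether N samples belong in the denominator changes the number noticeably, so it is worth stating which convention a reported percentage uses.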