yatisht / usher

Ultrafast Sample Placement on Existing Trees
MIT License

masking of C21575T? #314

Closed jbloom closed 1 year ago

jbloom commented 1 year ago

@AngieHinrichs, I was wondering whether your pipeline for generating the pre-built UShER tree applies any all-clade site masking in addition to the clade-specific masking you mention in #312?

In particular, I could find no counts of mutations at site C21575 in the translated MAT (C21575T causes spike L5F). But Nextstrain indicates that this mutation should be present at some frequency (see here).

If it's masked, I was just wondering if you could point me to the script in the pipeline that catalogues these additional masked sites.

Thanks again for all your help.

jbloom commented 1 year ago

Actually, maybe I found it? Does it use this mask which was also part of the Lanfear pipeline? https://github.com/W-L/ProblematicSites_SARS-CoV2/blob/master/subset_vcf/problematic_sites_sarsCov2.mask.vcf

Thanks again for helping me sort this out.

AngieHinrichs commented 1 year ago


Yes, we mask the positions whose recommendation is "mask" (not "caution") before placing new sequences in the tree. Sorry about the delayed reply.
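
For anyone who wants to check whether a given site falls under that mask, here is a minimal sketch rather than the pipeline's actual code. It assumes the mask file follows the standard tab-separated VCF column order and that the mask/caution recommendation is stored in the FILTER column (the seventh column). The function name `masked_positions` and the toy records are hypothetical and are not copied from the real file.

```python
import io


def masked_positions(vcf_lines, recommendation="mask"):
    """Return the set of 1-based positions whose FILTER column equals `recommendation`.

    Assumes standard VCF column order: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
    """
    positions = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and VCF header/meta lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip malformed records that lack a FILTER column.
        if len(fields) < 7:
            continue
        if fields[6] == recommendation:
            positions.add(int(fields[1]))
    return positions


# Toy records for illustration only (made-up positions, not the real mask file).
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "MN908947.3\t100\t.\tC\t.\t.\tmask\t.\n"
    "MN908947.3\t200\t.\tG\t.\t.\tcaution\t.\n"
    "MN908947.3\t300\t.\tT\t.\t.\tmask\t.\n"
)

masked = masked_positions(io.StringIO(EXAMPLE_VCF))
print(sorted(masked))
```

To check a real site, download the VCF linked above, open it with `open(path)`, pass the file object to `masked_positions`, and test `21575 in result`.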