Closed jbloom closed 1 year ago
Actually, maybe I found it? Does it use this mask, which was also part of the Lanfear pipeline? https://github.com/W-L/ProblematicSites_SARS-CoV2/blob/master/subset_vcf/problematic_sites_sarsCov2.mask.vcf
Thanks again for helping me sort this out.
Yes, we mask the positions with recommendation mask (not caution) before placing new sequences in the tree. Sorry about the delayed reply.
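In case it helps anyone else checking a site against that list, here is a rough sketch of pulling the masked positions out of a Problematic Sites style VCF. It assumes the recommendation (mask or caution) sits in the standard FILTER column (the seventh tab-separated field); please check that against the actual file. The function name `masked_positions` and the three records are made up for illustration and are not taken from the real VCF or from the UShER pipeline.

```python
from typing import Iterable, List


def masked_positions(vcf_lines: Iterable[str], keep: str = "mask") -> List[int]:
    """Return 1-based positions whose FILTER column equals `keep`.

    Assumption: the recommendation ("mask" or "caution") is stored in the
    FILTER column, i.e. the 7th tab-separated field of each record.
    """
    positions = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and VCF header lines ("##..." and "#CHROM...").
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # not a complete VCF record
        if fields[6] == keep:
            positions.append(int(fields[1]))
    return positions


# Made-up records for illustration only; these are NOT real problematic sites.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "REF\t100\t.\tA\t.\t.\tmask\t.",
    "REF\t200\t.\tC\t.\t.\tcaution\t.",
    "REF\t300\t.\tG\t.\t.\tmask\t.",
]

if __name__ == "__main__":
    print(masked_positions(EXAMPLE_VCF))
```

To check a specific site, read the downloaded VCF with `open(path)` and test whether the position (for example 21575) is in the returned list.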
@AngieHinrichs, I was wondering if your pipeline to generate the pre-built UShER tree has additional all-clade site masking in addition to the clade-specific masking you mention in #312? In particular, I could find no counts of mutations at site C21575 in the translated MAT (C21575T causes spike L5F). But Nextstrain indicates that mutation should be present at some frequency (see here).
If it's masked, I was just wondering if you could point me to the script in the pipeline that catalogues these additional masked sites.
Thanks again for all your help.