jeromekelleher / sc2ts

Infer a succinct tree sequence from SARS-COV-2 variation data
MIT License

Should we include site 241 #400

Open hyanwong opened 3 weeks ago

hyanwong commented 3 weeks ago

Site 241 is currently excluded, but 241T is a "defining" mutation for B.1 lineages, and therefore for a large number of samples high up in the tree. If it is a borderline site for exclusion, I reckon we should probably include it. I opened this issue just to track our rationale for including or excluding that particular site.

jeromekelleher commented 3 weeks ago

ORF1ab starts at 266, so this is in the "extra-genic" flanks that we're excluding unconditionally.
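
To make the rule concrete, here is a minimal sketch of the kind of check being described. It is not the sc2ts code: `is_in_flank` and both constants are hypothetical names, positions are taken to be 1-based, and the 3' boundary (end of ORF10 at 29674 in the Wuhan-Hu-1 reference) is an assumption that is not stated in this thread.

```python
# Illustrative only: these names are hypothetical, not part of sc2ts.
FIRST_GENE_START = 266  # ORF1ab start, as stated above (1-based)
LAST_GENE_END = 29674   # ASSUMPTION: end of ORF10 in the Wuhan-Hu-1 reference


def is_in_flank(position, first_gene_start=FIRST_GENE_START,
                last_gene_end=LAST_GENE_END):
    """Return True if a 1-based position lies outside the genic span."""
    return position < first_gene_start or position > last_gene_end


# Site 241 comes before the start of ORF1ab, so it falls in the 5' flank.
print(is_in_flank(241))
```

Under this rule, including site 241 would mean either moving the 5' boundary or allowing specific flank sites through as exceptions.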

@szhan - this is your call, what do you think?

jeromekelleher commented 2 weeks ago

I've no problem including stuff outside the genes, but we need to make a decision on this pretty quickly. @szhan - what are your thoughts? If these are phylogenetically useful, then there's not much justification for excluding them?

szhan commented 2 weeks ago

Hmm, we are excluding the 5' and 3' UTRs.

szhan commented 2 weeks ago

As discussed earlier with @jeromekelleher, we are going to redo the run including the UTR sites, and use the resulting ARG to identify problematic sites.