jbloomlab / SARS2-mut-fitness

Observed substitution counts of SARS-CoV-2 compared to those expected under the mutation rates
MIT License

check T95I mutation in Delta clades #4

Closed jbloom closed 1 year ago

jbloom commented 1 year ago

@kbfeldmann notes the following. Figure out what is going on:

I have a question about not observing any T95I mutation counts for 21J. I looked through the results folder and could not find any indication that 21J sequences after the clade founder had the T95I mutation. However, when I color the Nextstrain clade by genotype for site 95 in the spike protein, many sequences have the T95I mutation. It looks like both the Nextstrain tree and the observed/expected dataset have no T95I mutations for 21I. Do you know what may be causing this discrepancy in the data? My understanding is that samples are placed into a phylogenetic tree and the tree is then annotated by mutation, so even if all T95I mutations revert to the clade founder (T), the dataset would still identify these mutations.

jbloom commented 1 year ago

It is correct that Nextstrain shows quite a few mutations to 95I in clade 21J: https://nextstrain.org/ncov/gisaid/global/6m?c=gt-S_95

jbloom commented 1 year ago

Fixed in commit e718c82 by properly accounting for sites that are masked in UShER.

See also https://github.com/yatisht/usher/issues/312#issuecomment-1357962052
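
For readers following along, here is a minimal sketch of the general idea behind the fix. It is not the code from commit e718c82. It assumes that a site masked in the tree can never show an observed mutation, so its count should be treated as missing rather than zero. The function names, the mutation strings, and the masked site number are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def parse_mutation(mut):
    """Split a nucleotide mutation string such as 'C21846T' into (ref, site, alt)."""
    return mut[0], int(mut[1:-1]), mut[-1]


def count_unmasked(mutations, masked_sites):
    """Count each mutation, skipping any that falls on a masked site."""
    counts = Counter()
    for mut in mutations:
        _, site, _ = parse_mutation(mut)
        if site not in masked_sites:
            counts[mut] += 1
    return counts


def observed_count(mut, counts, masked_sites):
    """Return the observed count, or None if the site is masked (no data)."""
    _, site, _ = parse_mutation(mut)
    if site in masked_sites:
        return None  # masked: absence of counts is not evidence of zero
    return counts.get(mut, 0)


# Illustrative inputs only; these are assumptions, not values from the repo or UShER.
mutations = ["C21846T", "C21846T", "G22917T", "A23403G"]
masked_sites = {21846}

counts = count_unmasked(mutations, masked_sites)
```

With this approach, a downstream observed-versus-expected comparison can drop entries where `observed_count` returns `None` instead of reporting a misleading count of zero.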