ishida-md closed this issue 2 years ago
Hello,
Thank you for your comment. You are correct that double base substitutions (DBS) are currently treated together with indels and other complex substitutions in the non-SNV dNdScv model (sometimes misleadingly called the "indel" model in the package). There are several reasons for this. Developing a DBS dNdScv model would require a new substitution model (accounting for all possible DBS events) and a large rewrite of the buildref and annotation functions. Also, this model would still not account for more complex substitution events (e.g. 3bp events). Although a dedicated DBS model is possible (and it is on my list of things to consider), the benefits of complicating the dNdScv model in this way are likely very small, because most datasets have only small numbers of DBS and because the vast majority of DBS lead by chance to non-synonymous changes (as do most indels). dNdScv should be robust to the inclusion of synonymous DBS events in the non-SNV model, in the same way that it is robust to the fact that not all missense mutations are impactful.
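To check how many DBS a given dataset actually contains, one could tally variant types directly from the mutation table. The following is only a minimal Python sketch and is not part of dndscv: the function name `classify_variant`, the label strings, and all rows other than the two DBS quoted in this thread are hypothetical, and it assumes adjacent substitutions have already been merged into a single row with equal-length ref and alt alleles.

```python
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Label a variant by comparing its ref and alt alleles (hypothetical helper)."""
    ref, alt = ref.upper(), alt.upper()
    # Unequal lengths, or a "-" placeholder allele, indicate an insertion/deletion.
    if "-" in (ref, alt) or len(ref) != len(alt):
        return "indel"
    if len(ref) == 1:
        return "SNV"
    # A double base substitution changes both of two adjacent bases.
    if len(ref) == 2 and ref[0] != alt[0] and ref[1] != alt[1]:
        return "DBS"
    # Anything else (e.g. 3bp events) is a more complex substitution.
    return "other_substitution"


# (chromosome, position, ref, alt); only the first two rows come from this thread,
# the remaining rows are made-up illustrations.
mutations = [
    ("1", 145537543, "CC", "TT"),
    ("1", 145534222, "GG", "AA"),
    ("1", 100, "A", "G"),
    ("1", 200, "AT", "A"),
    ("1", 300, "ACG", "TTA"),
]

counts = Counter(classify_variant(ref, alt) for _, _, ref, alt in mutations)
print(counts)
```

If the DBS count is small relative to the number of indels, the argument above suggests that their effect on the non-SNV model should be limited.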
I will certainly keep this suggestion in mind.
Thank you! Inigo
Correct me if I am wrong, but dndscv seems to treat DBSs as indels regardless of their consequences on the amino acid (AA) sequence.
For example, these are two peculiar ones that I found in skin tumors (GRCh37; columns are chromosome, start, end, ref, alt):
1 145537543 145537544 CC TT
1 145534222 145534223 GG AA
The first one does not alter the AA sequence, so it is effectively a "synonymous DBS". The second one is a "non-synonymous DBS", generating a new AA sequence (ITGA10 p.G576E).
dndscv seems to count both of them as indels, implying they are consequential, and this behavior seems to generate spurious calls in samples with lots of DBSs. I feel that these two types of DBSs need to be distinguished.
Could you address this issue?
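To illustrate the distinction between a "synonymous DBS" and a "non-synonymous DBS", here is a minimal Python sketch that applies a DBS to a coding sequence and compares the translations. It is not how dndscv annotates variants. The helper names `translate` and `dbs_consequence` are hypothetical, the toy sequences are made up (they are not the real ITGA10 sequence), and the sketch assumes the sequence and the alt allele are given on the coding strand, in frame, using the standard genetic code.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence, ignoring any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def dbs_consequence(cds: str, offset: int, alt: str) -> str:
    """Replace the two bases starting at 0-based `offset` with `alt` and compare proteins."""
    if len(alt) != 2 or not 0 <= offset <= len(cds) - 2:
        raise ValueError("expected a 2-base alt allele lying within the sequence")
    mutated = cds[:offset] + alt + cds[offset + 2:]
    # Any change in the protein (including a new stop codon) counts as non-synonymous here.
    return "synonymous" if translate(cds) == translate(mutated) else "non-synonymous"


# CC>TT across a codon boundary: TTC CTG (F L) becomes TTT TTG (F L).
print(dbs_consequence("TTCCTG", 2, "TT"))
# GG>AA within one codon: ATG GGG TTT (M G F) becomes ATG GAA TTT (M E F).
print(dbs_consequence("ATGGGGTTT", 4, "AA"))
```

The first toy case shows how a DBS spanning two codons can leave both amino acids unchanged, while the second mirrors a glycine-to-glutamate change like the p.G576E example above.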