mbhall88 / drprg

Drug Resistance Prediction with Reference Graphs
https://mbh.sh/drprg/
MIT License

Disruptive in-frame indels #27

Closed mbhall88 closed 1 year ago

mbhall88 commented 1 year ago

I have now been through all of the FNs where one of the other callers has a TP.

There are 5 (ethionamide), 1 (isoniazid), 1 (pyrazinamide), and 1 (streptomycin) FNs where tbprofiler calls a disruptive in-frame indel. The genes associated with these four drugs have the rule "any frameshift causes resistance", but these aren't frameshifts. The nomenclature used by tbprofiler is from snpEff, I believe. By "disruptive" it means the indel deletes a multiple of 3 bases but doesn't fall evenly on codon boundaries, so the codon left spanning the junction changes and can create a new amino acid.
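To make the distinction concrete, here is a minimal Python sketch of how a deletion could be sorted into the three classes discussed here. The function name `classify_deletion` and its 0-based coding-sequence offset are hypothetical, and the rule is a simplification of the definition above, not snpEff's or tbprofiler's actual implementation.

```python
def classify_deletion(cds_offset: int, length: int) -> str:
    """Classify a deletion inside a coding sequence.

    cds_offset: 0-based offset of the first deleted base, counted from the
                first base of the start codon (hypothetical convention).
    length:     number of deleted bases.
    """
    if cds_offset < 0 or length <= 0:
        raise ValueError("cds_offset must be >= 0 and length must be > 0")
    if length % 3 != 0:
        # The reading frame downstream of the deletion is shifted.
        return "frameshift"
    if cds_offset % 3 == 0:
        # Whole codons are removed; flanking codons are untouched.
        return "conservative_inframe_deletion"
    # A multiple of 3 bases is removed, but the deletion starts mid-codon,
    # so the two partial codons at the junction fuse into a new codon.
    return "disruptive_inframe_deletion"


if __name__ == "__main__":
    for offset, length in [(3, 3), (4, 3), (3, 2)]:
        print(offset, length, classify_deletion(offset, length))
```

The returned strings follow the Sequence Ontology term names that snpEff reports; insertions would need an analogous check.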

Is this something we also want to call? I can't find much info in the literature about the impact of disruptive in-frame indels...

iqbal-lab commented 1 year ago

I would have thought the thing to do would be to translate the whole gene and see if the translation is one of a list of known acceptable translations (make a list of translations of susceptible samples). I'm guessing this does not need to be very heavyweight. In most cases like that I would expect to end up with horribly different translations.
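A rough Python sketch of that idea, assuming the sample's coding sequence has already been reconstructed on the coding strand. The names `translate` and `is_known_susceptible` are hypothetical and nothing here reflects how drprg is implemented. The codon table is the standard one, whose amino acid assignments are the same as the bacterial table (NCBI table 11).

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G); '*' is a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence up to (not including) the first stop codon."""
    cds = cds.upper()
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def is_known_susceptible(cds: str, susceptible_translations: set) -> bool:
    """True if the sample's translation exactly matches a known susceptible one."""
    return translate(cds) in susceptible_translations


if __name__ == "__main__":
    known = {translate("ATGGCTAAATGA")}  # toy reference gene
    print(is_known_susceptible("ATGGCTAAATGA", known))  # unchanged gene
    print(is_known_susceptible("ATGAAATGA", known))     # one codon deleted
```

An exact-match lookup like this is cheap, but it flags every unseen translation, including harmless missense changes, so it would need to sit behind whatever catalogue rules are applied first.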

I talked to Phil and he said

"From a structural perspective, you can often (but not always) get away with deleting residues from a long loop, but not from the “core” of the protein. And conversely putting residues into a loop (as long as it isn’t doing something) is often ok as well. If you want an extreme example, Brian Kobilka and others won the Nobel Prize in 2012 for getting the structure of GPCRs (membrane proteins involved in signalling and ≥50% of all drug targets). They did it by splicing a whole other protein (T4 lysozyme) into one of the intracellular loops which then provided enough surface area for the protein to crystallise. But the GPCR still, I think, works! https://www.nobelprize.org/uploads/2018/06/kobilka-lecture.pdf. Likewise, there will be multi-domain proteins where not having some C-terminal domain will reduce functionality but is tolerated; Oliver Adams, who I co-supervised, chopped off the CTD of mmpL3 to get its structure. Hence stop codons can be tolerated in some places, but not others. At a higher level, if a gene is essential, then less will be tolerated (rpoB / gyrA) whereas if it isn’t essential “anything goes” (pncA / Rv0678)"

iqbal-lab commented 1 year ago

The hacky solution is to get the Hamming distance of the translation from the list of susceptible translations and, if the distance is > 1, call it disruptive. My guess is things like that will produce very different proteins.
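A sketch of that threshold check in Python. One caveat: Hamming distance is only defined for equal-length strings, and an in-frame indel changes the protein length, so this sketch swaps in Levenshtein edit distance instead. That substitution, the function names, and the default threshold of 1 are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def is_disruptive(protein: str, susceptible: list, max_distance: int = 1) -> bool:
    """True if the protein is further than max_distance from every susceptible translation."""
    if not susceptible:
        raise ValueError("need at least one susceptible translation")
    return min(edit_distance(protein, s) for s in susceptible) > max_distance


if __name__ == "__main__":
    reference = ["MAKLVT"]
    print(is_disruptive("MAKVT", reference))  # one residue deleted
    print(is_disruptive("MAQT", reference))   # junction residue changed and residues lost
```

With a threshold of 1, a conservative deletion of a single codon passes, while a disruptive deletion that removes a residue and changes its neighbour does not.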

Leah started asking about what happens if you have 3 different indels in different places which together make things in-frame, which did my head in.
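For the compound case, one way to reason about it is to track the running frame offset across the gene. The sketch below assumes each indel is given as a hypothetical `(cds_offset, length_change)` pair in reference coordinates, with insertions positive and deletions negative. It reports the net frame and the stretch of the gene that is read out of frame.

```python
def net_frame(indels: list) -> int:
    """Net reading-frame offset (0, 1 or 2) after applying all indels."""
    return sum(delta for _, delta in indels) % 3


def out_of_frame_spans(indels: list) -> list:
    """Return (start, end) offsets between which the frame is shifted.

    end is None if the frame is never restored before the end of the gene.
    """
    spans = []
    frame = 0
    start = None
    for pos, delta in sorted(indels):
        new_frame = (frame + delta) % 3
        if frame == 0 and new_frame != 0:
            start = pos                 # frame is lost here
        elif frame != 0 and new_frame == 0:
            spans.append((start, pos))  # frame is restored here
        frame = new_frame
    if frame != 0:
        spans.append((start, None))
    return spans


if __name__ == "__main__":
    three_deletions = [(10, -1), (40, -1), (70, -1)]
    print(net_frame(three_deletions), out_of_frame_spans(three_deletions))
    print(net_frame([(10, -1)]), out_of_frame_spans([(10, -1)]))
```

Three 1 bp deletions give a net frame of 0, but every codon between the first and last deletion is still read out of frame, so the gene as a whole can be "in-frame" while a long stretch of the protein is scrambled or truncated by a premature stop.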

mbhall88 commented 1 year ago

I can easily test whether an indel is in-frame or not; my question was whether we want to call these as resistant? The expert rule in the catalogue is for frameshifts and these aren't frameshifts. And I can't find any TB literature mentioning disruptive in-frame deletions...

"Leah started asking about what happens if you have 3 different indels in different places which together make things in-frame, which did my head in."

Haha yes, I've also had this thought many times when staring at all these FPs and FNs. Remember this: https://doi.org/10.1038/s41467-021-25055-y