mbhall88 / drprg

Drug Resistance Prediction with Reference Graphs
https://mbh.sh/drprg/
MIT License

Deal with gene absence #21

Closed mbhall88 closed 1 year ago

mbhall88 commented 1 year ago

Arnold provided me with an MTB sample that has a 32kbp deletion which includes katG. This seems to be quite rare in clinical samples? I see https://doi.org/10.1016/j.ijmm.2021.151506 (2021) claims to be the first report of this...

When I run drprg on this sample, we produce an S call for INH, but katG is indeed missing from the VCF - i.e. pandora has noted its absence. Both mykrobe and tb-profiler call S too. From looking at the wiki, I think mykrobe's panel can be altered to detect gene absence, but I'm not clear on how you link this to a drug...
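
One way to spot this kind of absence is to compare the panel's gene list against the names that actually have records in the VCF. The sketch below assumes that the VCF CHROM column holds the gene (locus) name, as appears to be the case for pandora's output; the function name `absent_genes` and the toy panel are hypothetical and are not drprg's actual implementation.

```python
import io
from typing import Iterable, Set, TextIO


def absent_genes(vcf: TextIO, panel: Iterable[str]) -> Set[str]:
    """Return the panel genes that have no record in the VCF.

    Assumption: the CHROM column (first field) holds the gene name.
    """
    seen = set()
    for line in vcf:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        seen.add(line.split("\t", 1)[0])
    return set(panel) - seen


# Toy VCF with records for rpoB and pncA only (made-up data)
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.3\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "rpoB\t1349\t.\tC\tT\t.\tPASS\t.\n"
    "pncA\t100\t.\tA\tG\t.\tPASS\t.\n"
)
missing = absent_genes(toy_vcf, ["rpoB", "katG", "pncA"])
print(sorted(missing))
```

A real check would probably also need to distinguish a gene that is truly absent from one that is present but simply has no records, which this sketch does not attempt.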

So, this means we should be able to quite easily detect gene deletions. The next question becomes what we should do about this. I did a literature search for gene deletions of some other resistance-associated genes but I'm struggling to find any papers that have assessed gene deletion impact on resistance, except for katG/INH. Based on the intro to that paper above, maybe people assume the fitness cost is too high for the bug to survive?

A last question is whether this goes in the paper with a case study (or studies) of samples with gene deletions?

iqbal-lab commented 1 year ago

Basically gene deletion detection would be a huge win. Literature reports are limited, but partially because people don't look. Florencia in our group is now looking for them in cryptic. But I would expect gene deletions to cause resistance for all 4 genes where we say any frameshift causes resistance. Also, it would be great to detect it for mmpR5, mmpL5 and mmpS5.

Mykrobe can do it (it does do it for staph) and it is essentially an error of judgement on my part that it doesn't for TB (basically, something urgent always pushes it down the priority list).

mbhall88 commented 1 year ago

Arnold also has a case of a pncA deletion.

I can easily look through our samplesheet of 44K samples for these gene deletions. Only thing is, if we add it, I would also need to add it for mykrobe I guess. I would just need Martin to clarify how I link gene absence probes to a drug.

iqbal-lab commented 1 year ago

Yeah I guess so. Martin on vacation from Thursday fyi

mbhall88 commented 1 year ago

So I had a look to see if there are any gene deletions in the ~8500 samples, not expecting too many given the scarcity in the literature:

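| Gene | Samples |
|---|---|
| rpoB | 1 |
| ahpC | 2 |
| embA | 2 |
| embB | 2 |
| fabG1 | 2 |
| gid | 2 |
| gyrA | 2 |
| inhA | 2 |
| rplC | 2 |
| eis | 3 |
| ethA | 14 |
| katG | 15 |
| rrs | 27 |
| pncA | 31 |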

Wowza. This shows the number of samples a gene deletion was detected in.
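
A per-gene tally like this can be produced with a simple counter over the per-sample deletion calls. The sketch below is only illustrative: the input shape (a dict from sample ID to the set of deleted genes), the function name `deletion_counts` and the sample IDs are all assumptions, not how the numbers above were actually generated.

```python
from collections import Counter
from typing import Dict, Set


def deletion_counts(deleted_by_sample: Dict[str, Set[str]]) -> Counter:
    """Count, per gene, the number of samples in which it is deleted."""
    counts = Counter()
    for genes in deleted_by_sample.values():
        counts.update(genes)  # each gene counted once per sample
    return counts


# Hypothetical sample IDs and deletions, for illustration only
example = {
    "sample1": {"katG"},
    "sample2": {"pncA", "rrs"},
    "sample3": {"pncA"},
}
counts = deletion_counts(example)

# Print in ascending order of count, then by gene name
for gene, n in sorted(counts.items(), key=lambda kv: (kv[1], kv[0])):
    print(gene, n)
```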

I had a quick look at some of the katG ones and all of those are FN INH for all callers. So adding gene deletion detection is going to boost our sensitivity nicely (at least for INH)!
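
To make the idea concrete, here is a minimal sketch of how a detected deletion could flip a susceptible call to resistant. The gene-to-drug pairs are an assumption for illustration (the four genes discussed in this thread, paired with the drugs they are commonly associated with), and `apply_deletion_rules` is a hypothetical helper, not drprg's code.

```python
from typing import Dict, Iterable

# Assumed gene -> drug pairs, for illustration only
DELETION_RULES = {
    "katG": "Isoniazid",
    "pncA": "Pyrazinamide",
    "ethA": "Ethionamide",
    "gid": "Streptomycin",
}


def apply_deletion_rules(
    calls: Dict[str, str], deleted: Iterable[str]
) -> Dict[str, str]:
    """Return a copy of the drug calls with R set for deleted rule genes."""
    updated = dict(calls)  # leave the caller's dict untouched
    for gene in deleted:
        drug = DELETION_RULES.get(gene)
        if drug is not None:  # genes without a rule are ignored
            updated[drug] = "R"
    return updated


# A sample called S for INH, but with katG (and rrs) missing
before = {"Isoniazid": "S", "Rifampicin": "S"}
after = apply_deletion_rules(before, ["katG", "rrs"])
print(after)
```

Genes without a rule (rrs here) leave the calls unchanged, which keeps the behaviour conservative for genes where the effect of a deletion is not established.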

iqbal-lab commented 1 year ago

NICE

iqbal-lab commented 1 year ago

Re rareness in the literature, I really think it is under-measured. This is why we're finally doing it on the cryptic data.

mbhall88 commented 1 year ago

Okay, so we detect gene deletions in the four genes where any nonsense mutation (or frameshift) causes resistance: pncA, katG, ethA, gid. The diff between before/after this feature is:

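| Tool | Drug | ΔFN | ΔFP |
|---|---|---|---|
| drprg | Amikacin | 0 | 0 |
| drprg | Capreomycin | 0 | 0 |
| drprg | Delamanid | 0 | 0 |
| drprg | Ethambutol | 0 | 0 |
| drprg | Ethionamide | -4 | 0 |
| drprg | Isoniazid | -10 | 0 |
| drprg | Kanamycin | 0 | 0 |
| drprg | Levofloxacin | 0 | 0 |
| drprg | Linezolid | 0 | 0 |
| drprg | Moxifloxacin | 0 | 0 |
| drprg | Ofloxacin | 0 | 0 |
| drprg | Pyrazinamide | -1 | 3 |
| drprg | Rifampicin | 0 | 0 |
| drprg | Streptomycin | 0 | 0 |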
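
For reference, ΔFN and ΔFP here are the change in false-negative and false-positive counts once deletion calls are added, so a negative ΔFN means fewer missed resistant samples. A minimal sketch of that arithmetic follows; the data layout (dicts keyed by sample and drug), the function names and the toy values are assumptions for illustration, not the actual evaluation pipeline.

```python
from typing import Dict, Tuple

# (sample, drug) -> "R" or "S"
Calls = Dict[Tuple[str, str], str]


def fn_fp(pred: Calls, truth: Calls, drug: str) -> Tuple[int, int]:
    """Count false negatives and false positives for one drug."""
    fn = fp = 0
    for (sample, d), phenotype in truth.items():
        if d != drug:
            continue
        call = pred.get((sample, d))
        if call == "S" and phenotype == "R":
            fn += 1  # resistant sample called susceptible
        elif call == "R" and phenotype == "S":
            fp += 1  # susceptible sample called resistant
    return fn, fp


def delta(before: Calls, after: Calls, truth: Calls, drug: str) -> Tuple[int, int]:
    """Return (ΔFN, ΔFP) = counts after minus counts before."""
    fn_b, fp_b = fn_fp(before, truth, drug)
    fn_a, fp_a = fn_fp(after, truth, drug)
    return fn_a - fn_b, fp_a - fp_b


# Toy phenotypes and predictions (made-up samples)
truth = {("s1", "Isoniazid"): "R", ("s2", "Isoniazid"): "S",
         ("s3", "Pyrazinamide"): "S"}
before = {key: "S" for key in truth}
after = dict(before)
after[("s1", "Isoniazid")] = "R"      # deletion call fixes a FN
after[("s3", "Pyrazinamide")] = "R"   # deletion call introduces a FP

inh_delta = delta(before, after, truth, "Isoniazid")
pza_delta = delta(before, after, truth, "Pyrazinamide")
print(inh_delta, pza_delta)
```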

I'll describe the three FPs:

  1. ERR046857 - is missing a lot of genes: 'ahpC', 'embA', 'embB', 'fabG1', 'gid', 'gyrA', 'inhA', 'pncA', 'rplC', 'rpoB', 'rrs'. I'm pretty tempted to remove this isolate from the spreadsheet as this seems insane given it is supposedly susceptible to all four first-line drugs (and is missing pncA, rpoB, fabG1, and embA/B).
  2. SRR6824292 - missing pncA. Is an MDR strain. Seems like the phenotyping must be wrong, right? From the literature, all the examples I could find of pncA deletions indicate PZA resistance.
  3. SRR6824651 - missing pncA and has other weird phenotype discrepancies. For example, phenotyping says S for Moxi, but R for Oflox and R for Amikacin and S for Capreomycin.

iqbal-lab commented 1 year ago

These do look weird. Is ERR046857 even wgs?

mbhall88 commented 1 year ago

> These do look weird. Is ERR046857 even wgs?

From what I can tell, yes. It is also from 2011 though...