mbhall88 / drprg

Drug Resistance Prediction with Reference Graphs
https://mbh.sh/drprg/
MIT License

Notice partial gene deletion that spans start codon #24

Closed: mbhall88 closed this issue 1 year ago

mbhall88 commented 1 year ago
I've been through all of the drprg PZA FNs that are called by at least one other tool.

drprg has two overarching problems:

  1. A lot of the missed calls are minor-allele calls for variants not covered by anything in the PRG. Because they're minor alleles, they don't get discovered as novel. The pncA PRG is quite sparse, so it might be worth adding some more PZA-resistant isolates to the reference PRG to try to capture more of the population variation. And where the minor alleles are covered by the PRG, they seem to fail the GAPS threshold of 0.3.
  2. There are some big deletions that knock out the start codon and some of the gene. We (surprisingly) discover the deletion, but get no coverage on it (or on the ref):
    pncA    1       .       GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATTCCT        G,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCGGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATATCTT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCCGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACCTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATATCTT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACGACTTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACTACTTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCCGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATTCCT .       .       VC=INDEL;GRAPHTYPE=SIMPLE       GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      .:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:1,1,1,1,1,1,1,1,1,1:-488,-488,-488,-488,-488,-488,-488,-488,-488,-488:0
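Every allele in this record (the REF plus nine ALTs, the first of which, `G`, is the large deletion) has a missing genotype (`.`), zero forward and reverse coverage, and `GAPS` of 1. A minimal Python sketch of recognising such a null call from the FORMAT and sample columns, assuming the colon-separated keys and comma-separated per-allele values shown above; `parse_sample` and `is_null_call` are hypothetical helpers, not drprg's implementation, and the sample string is abbreviated to three alleles.

```python
from __future__ import annotations


def parse_sample(format_field: str, sample_field: str) -> dict[str, list[str]]:
    """Map each FORMAT key to its list of comma-separated per-allele values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return {key: value.split(",") for key, value in zip(keys, values)}


def is_null_call(format_field: str, sample_field: str) -> bool:
    """True when the genotype is missing ('.') and no allele has any coverage."""
    fields = parse_sample(format_field, sample_field)
    gt_missing = fields["GT"] == ["."]
    # Every *_COVG key (mean/median/sum, forward/reverse) must be zero for all alleles.
    coverage_keys = [key for key in fields if key.endswith("_COVG")]
    no_coverage = all(
        float(value) == 0 for key in coverage_keys for value in fields[key]
    )
    return gt_missing and no_coverage


FORMAT = (
    "GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:"
    "SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF"
)
# Abbreviated to three alleles (the record above has ten).
NULL_SAMPLE = ".:0,0,0:0,0,0:0,0,0:0,0,0:0,0,0:0,0,0:1,1,1:-488,-488,-488:0"

print(is_null_call(FORMAT, NULL_SAMPLE))  # True
```

A call like this is invisible to the normal variant-based prediction, which is why the deletion goes undetected even though it was discovered.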

One way around this could be to notice when we have more than n consecutive VCF entries with a failed/null call and just call resistant? Or, to be more precise, notice when one or more failed positions span the start codon, and then call resistant if it is one of the genes where gene deletion causes resistance.
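The first, cruder option amounts to a run-length check over a gene's VCF entries. A minimal Python sketch, assuming each gene's entries have already been reduced to one boolean per entry in position order (`True` for a null call); the function names and the threshold `n` are illustrative, not part of drprg.

```python
from __future__ import annotations


def longest_null_run(calls: list[bool]) -> int:
    """Length of the longest run of consecutive null/failed calls."""
    longest = current = 0
    for is_null in calls:
        # Extend the current run on a null call, otherwise reset it.
        current = current + 1 if is_null else 0
        longest = max(longest, current)
    return longest


def call_resistant_by_null_run(calls: list[bool], n: int) -> bool:
    """Crude heuristic: call resistant when more than n consecutive entries are null."""
    return longest_null_run(calls) > n


# One gene's entries in position order: True = null call.
gene_calls = [False, True, True, True, False, True]
print(longest_null_run(gene_calls))  # 3
print(call_resistant_by_null_run(gene_calls, n=2))  # True
```

Here `n` is an arbitrary threshold and the check ignores where the run sits relative to the start codon, which is what the more precise option addresses.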

_Originally posted by @mbhall88 in https://github.com/mbhall88/drprg-paper/issues/2_

mbhall88 commented 1 year ago

In particular, this issue is concerned with solving problem 2 above.

We can detect whole-gene deletions, but not partial deletions that knock out the start codon, even though the two effectively amount to the same thing.

I've seen this issue in both pncA and katG.

mbhall88 commented 1 year ago

The implementation of this feature will likely need to change when/if https://github.com/rmcolq/pandora/issues/316 is closed.

iqbal-lab commented 1 year ago

Do you mean that there is a potential fix even without a fix for https://github.com/rmcolq/pandora/issues/316 ?

mbhall88 commented 1 year ago

Well, noticing failed variants that span the start codon would be a kind of band-aid fix. The proper fix would be resolving that pandora issue.

mbhall88 commented 1 year ago

Should we also be detecting when the stop codon is lost? There are two INH FNs that we miss because we don't detect stop loss, whereas tbprofiler does call stop loss. We have null genotypes spanning the stop codon in both of these samples.

iqbal-lab commented 1 year ago

My guess is yes we should

mbhall88 commented 1 year ago

I made a change to the partial gene deletion code and also removed the GT_CONF filter. The (Illumina) diff I get from these changes is:

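| Tool | Drug | ΔFN | ΔFP |
|------|------|----:|----:|
| drprg | Amikacin | 0 | 0 |
| drprg | Capreomycin | 0 | 0 |
| drprg | Delamanid | 0 | 0 |
| drprg | Ethambutol | -1 | 1 |
| drprg | Ethionamide | -13 | 1 |
| drprg | Isoniazid | 0 | 0 |
| drprg | Kanamycin | 0 | 0 |
| drprg | Levofloxacin | -1 | 0 |
| drprg | Linezolid | 0 | 0 |
| drprg | Moxifloxacin | 0 | 0 |
| drprg | Ofloxacin | 0 | 0 |
| drprg | Pyrazinamide | -1 | 0 |
| drprg | Rifampicin | -1 | 0 |
| drprg | Streptomycin | 0 | 4 |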
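Summing the table gives the net effect of the change across all drugs. A quick Python check, with the values copied from the table:

```python
# (ΔFN, ΔFP) per drug for drprg on Illumina data, copied from the table.
deltas = {
    "Amikacin": (0, 0),
    "Capreomycin": (0, 0),
    "Delamanid": (0, 0),
    "Ethambutol": (-1, 1),
    "Ethionamide": (-13, 1),
    "Isoniazid": (0, 0),
    "Kanamycin": (0, 0),
    "Levofloxacin": (-1, 0),
    "Linezolid": (0, 0),
    "Moxifloxacin": (0, 0),
    "Ofloxacin": (0, 0),
    "Pyrazinamide": (-1, 0),
    "Rifampicin": (-1, 0),
    "Streptomycin": (0, 4),
}

total_fn = sum(fn for fn, _ in deltas.values())
total_fp = sum(fp for _, fp in deltas.values())
print(f"net dFN = {total_fn}, net dFP = {total_fp}")  # net dFN = -17, net dFP = 6
```

That is 17 fewer FNs in exchange for 6 more FPs, with Ethionamide accounting for most of the FN reduction.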

Most of these FPs are also FPs for tbprofiler.


Regarding the stop-lost cases, Miranda made a good point: maybe we just flag them as unknown mutations?

iqbal-lab commented 1 year ago

Good idea; flagging as unknown seems safest.
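Following the approach discussed in this thread, a null call spanning the start codon of a gene where loss of function causes resistance is called resistant, while a null call spanning the stop codon is flagged as unknown rather than resistant. A minimal Python sketch of that decision, assuming null calls have already been reduced to inclusive 1-based `(first, last)` intervals in the gene's coordinate system. `Call`, `LOF_GENES`, `classify` and the example coordinates are hypothetical, not drprg's API; the gene set contains only the two genes mentioned in this thread.

```python
from __future__ import annotations

from enum import Enum


class Call(Enum):
    RESISTANT = "resistant"
    UNKNOWN = "unknown"  # flagged for review, not called resistant
    NO_CALL = "no call"


# Hypothetical: genes where loss of function is assumed to confer resistance.
LOF_GENES = {"pncA", "katG"}


def spans(null_intervals: list[tuple[int, int]], start: int, end: int) -> bool:
    """True if any inclusive (first, last) interval overlaps [start, end]."""
    return any(first <= end and last >= start for first, last in null_intervals)


def classify(
    gene: str,
    null_intervals: list[tuple[int, int]],
    start_codon: int,
    stop_codon: int,
) -> Call:
    """start_codon / stop_codon are the positions of each codon's first base."""
    # Start codon knocked out in a loss-of-function gene: treat like a gene deletion.
    if gene in LOF_GENES and spans(null_intervals, start_codon, start_codon + 2):
        return Call.RESISTANT
    # Stop codon covered by a null call: possible stop loss, flag as unknown.
    if spans(null_intervals, stop_codon, stop_codon + 2):
        return Call.UNKNOWN
    return Call.NO_CALL


# Hypothetical coordinates: a null call over positions 1-300 of padded pncA.
print(classify("pncA", [(1, 300)], start_codon=101, stop_codon=659).value)  # resistant
```

Keeping the two outcomes separate means a stop-loss signal never turns directly into a resistance call, matching the "flag as unknown" decision above.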