mbhall88 / drprg

Drug Resistance Prediction with Reference Graphs
https://mbh.sh/drprg/
MIT License

Add some common resistance-conferring mutations that do not exist in population graph #23

Closed mbhall88 closed 1 year ago

mbhall88 commented 1 year ago

See https://github.com/mbhall88/drprg-paper/issues/2 and https://github.com/mbhall88/drprg-paper/issues/2 for the motivation.

There are some common mutations which do not exist in the reference graph. As such, when a sample carries one of these mutations as a minor allele, we do not detect it. This is because discover only finds major alleles; to detect a minor allele, the allele must already exist in the graph.

mbhall88 commented 1 year ago

This seems to have worked reasonably well:

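| Tool | Drug | ΔFN | ΔFP |
|---|---|---|---|
| drprg | Amikacin | 0 | 0 |
| drprg | Capreomycin | 0 | -1 |
| drprg | Delamanid | 0 | 5 |
| drprg | Ethambutol | -1 | 4 |
| drprg | Ethionamide | -52 | 22 |
| drprg | Isoniazid | 0 | 0 |
| drprg | Kanamycin | 0 | 0 |
| drprg | Levofloxacin | 0 | 0 |
| drprg | Linezolid | -1 | 0 |
| drprg | Moxifloxacin | 0 | -1 |
| drprg | Ofloxacin | 0 | 0 |
| drprg | Pyrazinamide | -2 | -1 |
| drprg | Rifampicin | -5 | 0 |
| drprg | Streptomycin | 0 | 0 |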

Quite a few ethionamide (ETO) FPs though, which I will look at before moving on.

mbhall88 commented 1 year ago

So all of those "new" ETO FPs are strong calls for a mutation I recently added (fabG1 L203L) which mykrobe and tb-profiler also call.

iqbal-lab commented 1 year ago

This sounds great!

mbhall88 commented 1 year ago

The delamanid FPs are caused by us calling minor alleles for ddn L49P. One of these is backed up by the other callers; one looks like a decent minor call but is not made by the other callers; and the rest are very low depth. This has made me realise I am not applying the same variant filters to the minor allele calls as I do to the normal major allele calls. For instance, some of these delamanid minor calls have a depth of only 2x on the minor allele, so they should be filtered out.

iqbal-lab commented 1 year ago

So, stepping back, what are the rules we want to apply to call a minor? Above x% of the total reads at that variant (either allele) and above some min absolute number, right? I'd think no other filters?
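
A minimal sketch of that rule, in Python rather than drprg's own code: a minor allele is kept only if its read depth is at least some fraction of the total depth at the site and at least some absolute count. The function name and both default thresholds are placeholders chosen for illustration, not values drprg uses.

```python
def passes_minor_rule(
    minor_depth: int,
    total_depth: int,
    min_fraction: float = 0.1,  # hypothetical "x%" threshold
    min_depth: int = 3,         # hypothetical absolute read-count threshold
) -> bool:
    """Return True if a minor allele has enough supporting reads.

    total_depth counts reads on every allele at the site.
    """
    if total_depth <= 0:
        return False
    fraction = minor_depth / total_depth
    return minor_depth >= min_depth and fraction >= min_fraction
```

With these placeholder thresholds, a minor allele supported by only 2 reads is rejected whatever the total depth at the site.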

mbhall88 commented 1 year ago

The extra ethambutol (EMB) FPs have the same cause as the delamanid ones.

I think we can just use the same filters for minors as we do for majors.

We also have an FRS (fraction of read support) filter for majors, which we obviously don't want for minors.

iqbal-lab commented 1 year ago

The model behind GT conf (genotype confidence) is only meaningful for majors, so I'd bin it for minors.
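
One way to picture the outcome of this exchange, sketched in Python with made-up filter names: keep a single list of filters for major calls and skip the ones that are not meaningful for minors. Only FRS and GT conf are named in this thread; `min_depth` and `min_strand_bias` stand in for "the other filters" and are assumptions, not drprg's actual filter set.

```python
# Filters applied to major calls. Names other than FRS / GT conf are hypothetical.
MAJOR_FILTERS = ["min_depth", "min_strand_bias", "min_frs", "min_gt_conf"]

# Filters dropped for minor allele calls, per the discussion above.
SKIPPED_FOR_MINORS = {"min_frs", "min_gt_conf"}


def filters_for(is_minor: bool) -> list[str]:
    """Return the names of the filters to apply to a call."""
    if not is_minor:
        return list(MAJOR_FILTERS)
    return [name for name in MAJOR_FILTERS if name not in SKIPPED_FOR_MINORS]
```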

mbhall88 commented 1 year ago

Adding the filtering of minors gives this diff relative to the results above:

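| Tool | Drug | ΔFN | ΔFP |
|---|---|---|---|
| drprg | Amikacin | 0 | 0 |
| drprg | Capreomycin | 0 | 0 |
| drprg | Delamanid | 0 | -1 |
| drprg | Ethambutol | 0 | -2 |
| drprg | Ethionamide | 0 | 0 |
| drprg | Isoniazid | 0 | 0 |
| drprg | Kanamycin | 0 | 0 |
| drprg | Levofloxacin | 0 | -1 |
| drprg | Linezolid | 0 | 0 |
| drprg | Moxifloxacin | 0 | -1 |
| drprg | Ofloxacin | 0 | 0 |
| drprg | Pyrazinamide | 0 | 0 |
| drprg | Rifampicin | 0 | -1 |
| drprg | Streptomycin | 1 | 0 |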

So the overall diff for this issue is:

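| Tool | Drug | ΔFN | ΔFP |
|---|---|---|---|
| drprg | Amikacin | 0 | 0 |
| drprg | Capreomycin | 0 | -1 |
| drprg | Delamanid | 0 | 4 |
| drprg | Ethambutol | -1 | 2 |
| drprg | Ethionamide | -52 | 22 |
| drprg | Isoniazid | 0 | 0 |
| drprg | Kanamycin | 0 | 0 |
| drprg | Levofloxacin | 0 | -1 |
| drprg | Linezolid | -1 | 0 |
| drprg | Moxifloxacin | 0 | -2 |
| drprg | Ofloxacin | 0 | 0 |
| drprg | Pyrazinamide | -2 | -1 |
| drprg | Rifampicin | -5 | -1 |
| drprg | Streptomycin | 1 | 0 |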
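
As a sanity check, the overall diff should be the row-wise sum of the two earlier diffs (adding the mutations, then filtering minors). A short Python sketch with the numbers copied from the tables in this thread; drugs whose row is all zeros in a table are left out of that dictionary, and the variable and function names are only illustrative.

```python
# (ΔFN, ΔFP) after adding the missing mutations to the graph
added_mutations = {
    "Capreomycin": (0, -1), "Delamanid": (0, 5), "Ethambutol": (-1, 4),
    "Ethionamide": (-52, 22), "Linezolid": (-1, 0), "Moxifloxacin": (0, -1),
    "Pyrazinamide": (-2, -1), "Rifampicin": (-5, 0),
}

# (ΔFN, ΔFP) from additionally filtering minor allele calls
minor_filtering = {
    "Delamanid": (0, -1), "Ethambutol": (0, -2), "Levofloxacin": (0, -1),
    "Moxifloxacin": (0, -1), "Rifampicin": (0, -1), "Streptomycin": (1, 0),
}


def combine(first: dict, second: dict) -> dict:
    """Sum (ΔFN, ΔFP) per drug; a drug missing from one table counts as (0, 0)."""
    combined = {}
    for drug in sorted(set(first) | set(second)):
        fn_a, fp_a = first.get(drug, (0, 0))
        fn_b, fp_b = second.get(drug, (0, 0))
        combined[drug] = (fn_a + fn_b, fp_a + fp_b)
    return combined


overall = combine(added_mutations, minor_filtering)
for drug, (delta_fn, delta_fp) in overall.items():
    print(f"{drug}\t{delta_fn}\t{delta_fp}")
```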