mbhall88 / drprg-paper

Workflow and files associated with the paper for DrPRG

Rolling results #2

Closed mbhall88 closed 1 year ago

mbhall88 commented 1 year ago

This issue will document the rolling results.


The first sneak peek is all 437 Nanopore isolates and 400 Illumina isolates (selected at random).

Nanopore

nanopore

Drug Tool FN(R) FP(S) Sensitivity (95% CI) Specificity (95% CI) MCC
Amikacin Drprg 1(15) 3(157) 93.3% (70.2-98.8%) 98.1% (94.5-99.3%) 0.8642875644329534
Amikacin Mykrobe 1(15) 3(157) 93.3% (70.2-98.8%) 98.1% (94.5-99.3%) 0.8642875644329534
Amikacin Tbprofiler 1(15) 3(157) 93.3% (70.2-98.8%) 98.1% (94.5-99.3%) 0.8642875644329534
Capreomycin Drprg 1(3) 1(64) 66.7% (20.8-93.9%) 98.4% (91.7-99.7%) 0.6510416666666666
Capreomycin Mykrobe 1(3) 1(64) 66.7% (20.8-93.9%) 98.4% (91.7-99.7%) 0.6510416666666666
Capreomycin Tbprofiler 1(3) 1(64) 66.7% (20.8-93.9%) 98.4% (91.7-99.7%) 0.6510416666666666
Ethambutol Drprg 7(29) 21(328) 75.9% (57.9-87.8%) 93.6% (90.4-95.8%) 0.5830010462966467
Ethambutol Mykrobe 6(29) 22(328) 79.3% (61.6-90.2%) 93.3% (90.1-95.5%) 0.5975951983859127
Ethambutol Tbprofiler 7(29) 22(328) 75.9% (57.9-87.8%) 93.3% (90.1-95.5%) 0.5747241429661474
Ethionamide Drprg 26(30) 1(86) 13.3% (5.3-29.7%) 98.8% (93.7-99.8%) 0.2624057235284411
Ethionamide Mykrobe 5(30) 57(86) 83.3% (66.4-92.7%) 33.7% (24.6-44.2%) 0.16405763204424978
Ethionamide Tbprofiler 14(30) 4(86) 53.3% (36.1-69.8%) 95.3% (88.6-98.2%) 0.5643248464313276
Isoniazid Drprg 35(117) 3(265) 70.1% (61.3-77.6%) 98.9% (96.7-99.6%) 0.7641591750285314
Isoniazid Mykrobe 18(117) 45(265) 84.6% (77.0-90.0%) 83.0% (78.0-87.1%) 0.6432989433455073
Isoniazid Tbprofiler 21(117) 4(265) 82.1% (74.1-88.0%) 98.5% (96.2-99.4%) 0.8445257661394283
Kanamycin Drprg 0(3) 2(118) 100.0% (43.9-100.0%) 98.3% (94.0-99.5%) 0.7680042372764464
Kanamycin Mykrobe 0(3) 2(118) 100.0% (43.9-100.0%) 98.3% (94.0-99.5%) 0.7680042372764464
Kanamycin Tbprofiler 0(3) 2(118) 100.0% (43.9-100.0%) 98.3% (94.0-99.5%) 0.7680042372764464
Moxifloxacin Drprg 0(0) 1(1) - 0.0% (0.0-79.3%) -
Moxifloxacin Mykrobe 0(0) 1(1) - 0.0% (0.0-79.3%) -
Moxifloxacin Tbprofiler 0(0) 1(1) - 0.0% (0.0-79.3%) -
Ofloxacin Drprg 13(15) 1(158) 13.3% (3.7-37.9%) 99.4% (96.5-99.9%) 0.27378347692948213
Ofloxacin Mykrobe 3(15) 4(158) 80.0% (54.8-93.0%) 97.5% (93.7-99.0%) 0.7524691275756947
Ofloxacin Tbprofiler 3(15) 3(158) 80.0% (54.8-93.0%) 98.1% (94.6-99.4%) 0.7810126582278482
Pyrazinamide Drprg 16(28) 3(243) 42.9% (26.5-60.9%) 98.8% (96.4-99.6%) 0.5540455669886942
Pyrazinamide Mykrobe 10(28) 5(243) 64.3% (45.8-79.3%) 97.9% (95.3-99.1%) 0.6796400181154288
Pyrazinamide Tbprofiler 12(28) 6(243) 57.1% (39.1-73.5%) 97.5% (94.7-98.9%) 0.6093260891549527
Rifampicin Drprg 8(77) 6(287) 89.6% (80.8-94.6%) 97.9% (95.5-99.0%) 0.8837166974977075
Rifampicin Mykrobe 6(77) 6(287) 92.2% (84.0-96.4%) 97.9% (95.5-99.0%) 0.9011719987329744
Rifampicin Tbprofiler 75(77) 1(287) 2.6% (0.7-9.0%) 99.7% (98.1-99.9%) 0.10159114367294636
Streptomycin Drprg 9(55) 16(126) 83.6% (71.7-91.1%) 87.3% (80.4-92.0%) 0.6875051115519748
Streptomycin Mykrobe 7(55) 38(126) 87.3% (76.0-93.7%) 69.8% (61.3-77.2%) 0.526015018770466
Streptomycin Tbprofiler 15(55) 19(126) 72.7% (59.8-82.7%) 84.9% (77.6-90.1%) 0.5656453811464112
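
As a cross-check on how the columns relate, the sketch below recomputes sensitivity, specificity and MCC from the FN(R) and FP(S) counts. Using a Wilson score interval for the 95% CI is an assumption (the thread does not say which interval was used), although it reproduces the Amikacin row above. The function names are illustrative only.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; None when n is zero."""
    if n == 0:
        return None
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def summarise(fn, n_resistant, fp, n_susceptible):
    """Rebuild the table columns from FN(R) and FP(S)."""
    tp = n_resistant - fn
    tn = n_susceptible - fp
    # MCC is undefined when any margin of the confusion matrix is zero
    margin = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / n_resistant if n_resistant else None,
        "sens_ci": wilson_interval(tp, n_resistant),
        "specificity": tn / n_susceptible if n_susceptible else None,
        "spec_ci": wilson_interval(tn, n_susceptible),
        "mcc": (tp * tn - fp * fn) / margin if margin else None,
    }


# Amikacin / Drprg row from the Nanopore table: FN(R) = 1(15), FP(S) = 3(157)
row = summarise(fn=1, n_resistant=15, fp=3, n_susceptible=157)
print(row)
```

Rows with no resistant isolates, such as Moxifloxacin 0(0), come back with `None` for sensitivity and MCC, matching the `-` entries in the table.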

Next avenues of investigation:

Positives:

Illumina

illumina

Drug Tool FN(R) FP(S) Sensitivity (95% CI) Specificity (95% CI) MCC
Amikacin Drprg 0(15) 1(166) 100.0% (79.6-100.0%) 99.4% (96.7-99.9%) 0.9653250279768749
Amikacin Mykrobe 1(15) 1(166) 93.3% (70.2-98.8%) 99.4% (96.7-99.9%) 0.9273092369477912
Amikacin Tbprofiler 0(15) 1(166) 100.0% (79.6-100.0%) 99.4% (96.7-99.9%) 0.9653250279768749
Capreomycin Drprg 2(13) 4(93) 84.6% (57.8-95.7%) 95.7% (89.5-98.3%) 0.7558571990231637
Capreomycin Mykrobe 3(13) 4(93) 76.9% (49.7-91.8%) 95.7% (89.5-98.3%) 0.70359611692274
Capreomycin Tbprofiler 2(13) 4(93) 84.6% (57.8-95.7%) 95.7% (89.5-98.3%) 0.7558571990231637
Delamanid Drprg 1(1) 0(93) 0.0% (0.0-79.3%) 100.0% (96.0-100.0%) -
Delamanid Mykrobe 1(1) 0(93) 0.0% (0.0-79.3%) 100.0% (96.0-100.0%) -
Delamanid Tbprofiler 1(1) 0(93) 0.0% (0.0-79.3%) 100.0% (96.0-100.0%) -
Ethambutol Drprg 5(53) 16(229) 90.6% (79.7-95.9%) 93.0% (89.0-95.7%) 0.7795344823612268
Ethambutol Mykrobe 5(53) 17(229) 90.6% (79.7-95.9%) 92.6% (88.4-95.3%) 0.7712443312992736
Ethambutol Tbprofiler 5(53) 18(229) 90.6% (79.7-95.9%) 92.1% (87.9-95.0%) 0.7631197122668696
Ethionamide Drprg 20(37) 11(127) 45.9% (31.0-61.6%) 91.3% (85.2-95.1%) 0.4141740735523773
Ethionamide Mykrobe 11(37) 15(127) 70.3% (54.2-82.5%) 88.2% (81.4-92.7%) 0.5643018224168724
Ethionamide Tbprofiler 10(37) 16(127) 73.0% (57.0-84.6%) 87.4% (80.5-92.1%) 0.5737592501981046
Isoniazid Drprg 17(135) 5(218) 87.4% (80.8-92.0%) 97.7% (94.7-99.0%) 0.868118053524601
Isoniazid Mykrobe 14(135) 7(218) 89.6% (83.3-93.7%) 96.8% (93.5-98.4%) 0.8735871080954598
Isoniazid Tbprofiler 10(135) 7(218) 92.6% (86.9-95.9%) 96.8% (93.5-98.4%) 0.8977596305525896
Kanamycin Drprg 4(25) 2(168) 84.0% (65.3-93.6%) 98.8% (95.8-99.7%) 0.8582554180919595
Kanamycin Mykrobe 6(25) 2(168) 76.0% (56.6-88.5%) 98.8% (95.8-99.7%) 0.8066918414409607
Kanamycin Tbprofiler 5(25) 2(168) 80.0% (60.9-91.1%) 98.8% (95.8-99.7%) 0.8327103314106743
Levofloxacin Drprg 18(28) 5(145) 35.7% (20.7-54.2%) 96.6% (92.2-98.5%) 0.42231266491410374
Levofloxacin Mykrobe 2(28) 10(145) 92.9% (77.4-98.0%) 93.1% (87.8-96.2%) 0.7799214704762708
Levofloxacin Tbprofiler 1(28) 11(145) 96.4% (82.3-99.4%) 92.4% (86.9-95.7%) 0.7903590726235468
Linezolid Drprg 0(0) 0(128) - 100.0% (97.1-100.0%) -
Linezolid Mykrobe 0(0) 0(128) - 100.0% (97.1-100.0%) -
Linezolid Tbprofiler 0(0) 0(128) - 100.0% (97.1-100.0%) -
Moxifloxacin Drprg 11(15) 8(126) 26.7% (10.9-52.0%) 93.7% (88.0-96.7%) 0.2244992239690201
Moxifloxacin Mykrobe 2(15) 18(126) 86.7% (62.1-96.3%) 85.7% (78.5-90.8%) 0.5388625547887866
Moxifloxacin Tbprofiler 2(15) 20(126) 86.7% (62.1-96.3%) 84.1% (76.8-89.5%) 0.5155328733959855
Ofloxacin Drprg 3(3) 0(5) 0.0% (0.0-56.1%) 100.0% (56.6-100.0%) -
Ofloxacin Mykrobe 0(3) 0(5) 100.0% (43.9-100.0%) 100.0% (56.6-100.0%) 1.0
Ofloxacin Tbprofiler 0(3) 0(5) 100.0% (43.9-100.0%) 100.0% (56.6-100.0%) 1.0
Pyrazinamide Drprg 9(23) 0(132) 60.9% (40.8-77.8%) 100.0% (97.2-100.0%) 0.7548792871746883
Pyrazinamide Mykrobe 2(23) 2(132) 91.3% (73.2-97.6%) 98.5% (94.6-99.6%) 0.8978919631093544
Pyrazinamide Tbprofiler 2(23) 2(132) 91.3% (73.2-97.6%) 98.5% (94.6-99.6%) 0.8978919631093544
Rifampicin Drprg 8(118) 4(240) 93.2% (87.2-96.5%) 98.3% (95.8-99.4%) 0.9237938244724586
Rifampicin Mykrobe 6(118) 4(240) 94.9% (89.3-97.6%) 98.3% (95.8-99.4%) 0.936595807066951
Rifampicin Tbprofiler 117(118) 1(240) 0.8% (0.1-4.6%) 99.6% (97.7-99.9%) 0.0271689721964718
Streptomycin Drprg 12(42) 1(64) 71.4% (56.4-82.8%) 98.4% (91.7-99.7%) 0.7512240395539047
Streptomycin Mykrobe 8(42) 5(64) 81.0% (66.7-90.0%) 92.2% (83.0-96.6%) 0.7418210904541338
Streptomycin Tbprofiler 8(42) 2(64) 81.0% (66.7-90.0%) 96.9% (89.3-99.1%) 0.8037977341533646

Next avenues of investigation:

I'll probably wait until the Nanopore stuff is debugged and then run on a larger sample of data before debugging Illumina

mbhall88 commented 1 year ago

I took a look at the INH FNs today. For those where mykrobe made a TP call (so I can see which mutation we missed), 13/16 were us missing the fabG1 C-15T promoter mutation. drprg called this variant for all 13, but the calls were removed by the fraction of read support (FRS) filter, which is set to 0.70. Nearly all of those mutations had an FRS of 0.58-0.64.

The reason for this is the alleles are quite similar, and I suspect maybe some shared minimizers are wreaking havoc here.

An example VCF record showing the alleles and coverage

fabG1   81      555cbd3d        CGAGACGATAGGT   CGAGACGATAGGC,CGAGATGATAGGT,TGAGACGATAGGT       .       frs     VC=PH_SNPs;GRAPHTYPE=SIMPLE;VARID=fabG1_G-17T,fabG1_A-16X,fabG1_C-15X,fabG1_T-8X;PREDICT=S,S,R,S      GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      2:13,16,22,3:11,13,27,2:7,14,19,0:4,8,30,1:82,82,134,14:71,68,164,10:0.5,0.4,0,0.75:-441.21,-406.243,-273.794,-577.451:132.448

These don't get collapsed by make_prg because the minimum match lengths between the three variants described by this allele are 4 and 6 (we use a min match len of 7).
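
To make the 4 and 6 concrete, here is a small sketch that counts the invariant bases between consecutive variable positions across the alleles in the record above. It only illustrates the arithmetic. It is not how make_prg is implemented, and the function name is made up.

```python
def interior_match_runs(alleles):
    """Lengths of the invariant stretches between consecutive variable columns."""
    length = len(alleles[0])
    if any(len(allele) != length for allele in alleles):
        raise ValueError("alleles must all have the same length")
    # A column is variable when the alleles disagree at that position
    variable = [
        i for i in range(length) if len({allele[i] for allele in alleles}) > 1
    ]
    return [right - left - 1 for left, right in zip(variable, variable[1:])]


# REF and ALT alleles from the fabG1 record at position 81
alleles = [
    "CGAGACGATAGGT",
    "CGAGACGATAGGC",
    "CGAGATGATAGGT",
    "TGAGACGATAGGT",
]
runs = interior_match_runs(alleles)
print(runs)  # [4, 6]

# Which runs are long enough to count as a match at a given threshold
for min_match_len in (7, 5):
    print(min_match_len, [run >= min_match_len for run in runs])
```

With a threshold of 7 neither stretch qualifies, so the three variants stay in one site. With a threshold of 5 only the 6 bp stretch qualifies.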

The other interesting thing is that each time, the allele with the next best coverage is allele 1 which differs in two positions from allele 2 (middle and end), so I reckon there's a minimizer that covers the start of this allele before the two alleles differ.

Not sure whether the "hacky" way of decreasing the FRS threshold is the best way to go? Or changing some parameters in make prg or pandora...
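
A rough sketch of the FRS calculation for the record above. It assumes FRS is the called allele's depth divided by the total depth over all alleles, and that depth is the sum of the forward and reverse median coverage fields. Both are assumptions read off the record, not a statement of drprg's exact implementation. Under those assumptions the record comes out at 49/83, which sits in the range reported above.

```python
def fraction_read_support(format_field, sample_field):
    """FRS of the called allele, or None for a null genotype or zero depth."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    genotype = values["GT"]
    if genotype == ".":
        return None
    fwd = [int(x) for x in values["MED_FWD_COVG"].split(",")]
    rev = [int(x) for x in values["MED_REV_COVG"].split(",")]
    # Assumption: per-allele depth is forward plus reverse median coverage
    depth = [f + r for f, r in zip(fwd, rev)]
    total = sum(depth)
    if total == 0:
        return None
    return depth[int(genotype)] / total


FORMAT = (
    "GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:"
    "SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF"
)
SAMPLE = (
    "2:13,16,22,3:11,13,27,2:7,14,19,0:4,8,30,1:82,82,134,14:71,68,164,10:"
    "0.5,0.4,0,0.75:-441.21,-406.243,-273.794,-577.451:132.448"
)

frs = fraction_read_support(FORMAT, SAMPLE)
print(f"{frs:.2f}")  # 0.59
print("passes 0.70 filter:", frs >= 0.70)
```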

iqbal-lab commented 1 year ago

Could drop min match length also? In our covid work using pileups, we find frs of 0.7 is too high just because of noise in the reads

mbhall88 commented 1 year ago

Interesting. What FRS have you been using in your covid work?

iqbal-lab commented 1 year ago

Well, we've just shifted to 0.6 for nanopore, but now we're distracted fixing bugs before going back to carefully choose FRS thresholds

mbhall88 commented 1 year ago

Okay, so after changing the minimum match length to 5 and the minimum FRS to 0.60, there are only two FNs that mykrobe calls that we don't. One of those is an indel which fails the FRS filter at 0.59 and the other is a dodgy-looking indel call from mykrobe that isn't called by tbprofiler or drprg, so I'm not fazed by that. Interestingly, that sample has a synonymous SNP in the first codon.

The allele at that fabG1 variant now looks slightly better and has been split in two

fabG1   81      8ca378d9        CGAGAC  CGAGAT,TGAGAC   .       PASS    VC=PH_SNPs;GRAPHTYPE=SIMPLE;VARID=fabG1_G-17T,fabG1_A-16X,fabG1_C-15X;PREDICT=S,S,R     GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      1:11,16,6:12,25,2:9,15,0:5,28,1:71,101,25:75,155,11:0.5,0,0.75:-281.052,-151.402,-388.948:129.65
fabG1   93      fa7956b3        T       C       .       ld;sb   VC=SNP;GRAPHTYPE=SIMPLE;VARID=fabG1_T-8X;PREDICT=S      GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      0:0,0:1,0:0,0:1,0:0,0:2,0:1,1:-129.795,-138.605:8.80986

Next task is to dig into the poor ofloxacin sensitivity.

mbhall88 commented 1 year ago

The OFX FNs are a fairly straightforward fix. It turns out that all of the FNs are gyrA D94X. What is happening here is that there is a silent mutation (does not confer resistance) at codon 95 (S95T) that occurs in the same allele as that variant in the VCF/PRG. Long story short, I end up combining these two variants and calling an unknown prediction for OFX with novel variant gyrA_DS94GT. So I just need to break these up and check whether any of them are in the panel and associated with resistance. Should hopefully finish the implementation tomorrow.
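
A minimal sketch of the splitting step described above, assuming the combined variant is named gene_REFposALT with equal-length reference and alternate residues. The panel dictionary and the X wildcard lookup are hypothetical stand-ins for the real panel, not drprg's actual data structures.

```python
import re


def split_protein_variant(variant):
    """Split a multi-residue change such as gyrA_DS94GT into single-residue changes."""
    gene, change = variant.rsplit("_", 1)
    match = re.fullmatch(r"([A-Z*]+)(\d+)([A-Z*]+)", change)
    if match is None:
        raise ValueError(f"unrecognised variant: {variant}")
    ref, start, alt = match.group(1), int(match.group(2)), match.group(3)
    if len(ref) != len(alt):
        raise ValueError("only equal-length substitutions can be split")
    # Keep only the residues that actually change
    return [
        f"{gene}_{r}{start + offset}{a}"
        for offset, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


def panel_prediction(variant, panel):
    """Look up a variant in a panel, falling back to an 'any residue' X entry."""
    return panel.get(variant) or panel.get(variant[:-1] + "X")


# Hypothetical panel with a single entry, for illustration only
panel = {"gyrA_D94X": "R"}

for part in split_protein_variant("gyrA_DS94GT"):
    print(part, panel_prediction(part, panel))
```

Here the D94G component matches the panel's D94X entry while S95T has no entry, so the sample would get a resistant call rather than an unknown one.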

iqbal-lab commented 1 year ago

Great!

mbhall88 commented 1 year ago

Here are the updated plots after the INH and OFX fixes listed above

Nanopore

image

Illumina

image


Some good improvements for nanopore. I'm going to have a look at the drprg STM, ETO, PZA and RIF sensitivity as mykrobe seems to be better than drprg. But pretty happy with the specificity of drprg at the moment.

lachlancoin commented 1 year ago

Looks good. Not sure what's going on with TB profiler with RIF


mbhall88 commented 1 year ago

Looks good. Not sure what's going on with TB profiler with RIF

It's most likely an issue with the custom panel I built. I'll fix that up once I've finished debugging drprg. I'm assuming tb profiler is on par with mykrobe for RIF though

iqbal-lab commented 1 year ago

Looks great. I am actually surprised how good the specificity is for PZA nanopore, given it is dominated by indels. I had thought we had issues with indels

mbhall88 commented 1 year ago

I've now gone through the remainder of the FNs and nearly all of the FPs for nanopore.

RIF

2 FNs where drprg didn't discover the variant

4 FPs where all three tools call rpoB L430X
1 FP where mykrobe and drprg call rpoB L452X
1 FP that just scraped through the low depth cutoff of 3 in drprg - it had a depth of 3.

STM

1 FN where drprg calls 2 non-synonymous mutations (not in the panel) - these are also called by tb-profiler
1 FN where drprg calls the correct variant rpsL K43R but fails FRS (0.52)

14 FPs are called by all three callers. 9 were mutations in gid, 3 were in rrs, and 2 in rpsL. I'm not sure what we want to do here, because it is fairly plausible that this is a phenotyping problem. After a quick search, I found two references to support this [1, 2]. From [1]:

low-level streptomycin resistance mediated by gidB were frequently misclassified with respect to streptomycin resistance when using the WHO-recommended critical concentration of 2 μg/ml.

2 FPs were rpsL K88R which is very strongly associated with STM resistance. These were also called by mykrobe but not tb-profiler.

2 FPs were confident deletions in gid only called by drprg. mykrobe called one of them, but it was filtered due to a low proportion of expected depth.

ETO

In total, there were 9 FNs which were all called by mykrobe, but not tb-profiler or drprg. They're all indels in ethA and de novo variant discovery was not triggered in drprg for any of them. In one of those FNs, there was a promoter mutation as well, which drprg did call, but it was filtered out for low FRS (0.55).

The other thing to note here is that while mykrobe's sensitivity is much better than drprg and tbprofiler, its specificity is terrible.

PZA

1 FN is pncA R154G which is called by mykrobe and tb-profiler. drprg calls the correct allele, but it is filtered out for low FRS (0.54)
2 FNs are indels called by mykrobe only. Both indels were null genotype (drprg calls this F for failed) as the depth was split evenly across two alleles.

INH

4 FPs were called confidently by all three tools (fabG1 C-15T)
3 FPs are deletions called by drprg but not by either of the other tools. They had below 10x depth in drprg, so are probably not super confident


I don't think there is much more I can do to improve drprg's nanopore performance here. FRS could possibly be lowered, but it would only save a small number of FNs/FPs

I'll fix up the tb-profiler RIF sensitivity and then get stuck into the Illumina results

(This text file is my notepad while I was investigating these FNs and FPs)

drprg_nanopore_fn_fp_investigation.txt

mbhall88 commented 1 year ago

As our specificity is better or the same compared to mykrobe and tbprofiler for all drugs on Illumina, I only investigated the FNs.

tl;dr there are three things we may want to try to improve sensitivity as I suspect once we scale this analysis up to thousands of samples some of these problems will get bigger

  1. There are a noticeable number of variants where de novo discovery in pandora cannot find a path between the start and end kmer of a candidate region. All of the FNs that we miss and mykrobe and/or tbprofiler get are due to this, along with some RIF and ethionamide (ETO) FNs. The solution to this seems to be switching to Leandro's racon version of variant discovery in pandora. However, this is not very easy as that fork and the current tip of pandora have diverged quite a bit. @iqbal-lab do you think it's realistic that Leandro could try and get this on master in the next few weeks? Or will I have to try and do it myself?
  2. There are 3 FNs in total that fail FRS with values 0.54, 0.53, and 0.58. Maybe we look to lower it even further? Does 0.50 or 0.51 sound fair?
  3. tbprofiler has an unfair advantage over mykrobe (and drprg). There were quite a few minor allele resistance calls. I have put mykrobe in haploid mode so it won't call minor resistance. drprg can't call minor resistance either way. Should I let mykrobe call minor variants on Illumina or make the minor allele frequency 0.5 for tbprofiler?

Aside from those overarching points, there were also some other FNs which were due to two variants right next to each other. For example, ERR2510154 has rpoB_S450F, which is actually caused by a 2bp MNP. One of these positions exists in the reference PRG drprg uses. The other variant gets discovered, but gets added in as a separate allele (bubble) in the PRG - I guess this is how make_prg update works though? Here it is

rpoB    1449    fa44b92a        C       T       .       ld;lgc  VC=SNP;GRAPHTYPE=SIMPLE;VARID=rpoB_ACTGTCGGCG1344A,rpoB_GTC1347G,rpoB_TC1348T,rpoB_S450X,rpoB_TCG1348T,rpoB_S450*,rpoB_C1349CA,rpoB_C1349CAA,rpoB_C1349CAC,rpoB_C1349CAG,rpoB_C1349CAT,rpoB_C1349CC,rpoB_C1349CCA,rpoB_C1349CCC,rpoB_C1349CCG,rpoB_C1349CCT,rpoB_C1349CG,rpoB_C1349CGA,rpoB_C1349CGC,rpoB_C1349CGG,rpoB_C1349CGT,rpoB_C1349CT,rpoB_C1349CTA,rpoB_C1349CTC,rpoB_C1349CTG,rpoB_C1349CTT,rpoB_CG1349C,rpoB_CGG1349C;PREDICT=F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F        GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      .:0,0:0,0:0,0:0,0:0,0:0,1:1,1:-278,-278:0
rpoB    1450    9fbca785        G       T       .       ld;lgc  VC=SNP;GRAPHTYPE=SIMPLE;VARID=rpoB_ACTGTCGGCG1344A,rpoB_S450X,rpoB_TCG1348T,rpoB_S450*,rpoB_CG1349C,rpoB_CGG1349C,rpoB_G1350GA,rpoB_G1350GAA,rpoB_G1350GAC,rpoB_G1350GAG,rpoB_G1350GAT,rpoB_G1350GC,rpoB_G1350GCA,rpoB_G1350GCC,rpoB_G1350GCG,rpoB_G1350GCT,rpoB_G1350GG,rpoB_G1350GGA,rpoB_G1350GGC,rpoB_G1350GGG,rpoB_G1350GGT,rpoB_G1350GT,rpoB_G1350GTA,rpoB_G1350GTC,rpoB_G1350GTG,rpoB_G1350GTT,rpoB_GG1350G,rpoB_GGC1350G;PREDICT=F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F        GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      .:0,0:0,0:0,0:0,0:0,0:0,0:1,1:-278,-278:0

The allele of this variant is TCG>TTT so pandora should be able to thread reads through both of these variants, but doesn't seem to be able to....??

A similar thing happened for a few other FNs. I'm wondering if I should run on the full dataset and then manually add to the reference PRG some of the common variants that cause this problem? I was thinking the "correct" way to do this @iqbal-lab would be to look through the cryptic metadata sheets and add some samples that contain those variants that cause some of these problems?

drprg_illumina_fn_investigation.txt

iqbal-lab commented 1 year ago

Hi there, I'm a bit wiped out so will be brief

  1. Leandro is buried in 2 projects (plasmid stuff using Pandora, and Karel's mof project) and trying to extricate himself from the latter, so I think it would be hard for him to merge the racon branch soon. I must admit, I had forgotten it wasn't merged, very sorry Michael. I think it would be great if you could do it, and I think this could help a lot, and might make redundant the issue about adjacent variants, as racon just does the whole gene.
  2. I agree use 0.51, we've moved to that with covid.
  3. How about returning to minor alleles after addressing the above?

In terms of adding more variants to the graph; we can do this, but racon might mean you don't need to

mbhall88 commented 1 year ago

I've just realised, we should probably put some kind of minimum depth filter on these results too. i.e. samples with less than d depth are excluded from the sensitivity/specificity plots.

Does everyone agree? If so, does anyone have a preference for what d should be? I arbitrarily thought of 15x? (This is separate from the depth analysis in mbhall88/drprg-paper#3)

Here is the depth distribution for the 400 Illumina test set

image

and the full nanopore

image

Additionally, it might be wise to have a contamination proportion filter? For instance, when I align the reads to the decontamination database, I calculate the fraction of reads that we keep (i.e. MTB), fraction of reads that align to a contaminant, and a fraction of reads unmapped. Again, arbitrarily was thinking exclude samples with more than 5% contamination? Too harsh?

This is the fraction of contamination for Illumina

image

and nanopore

image
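
A sketch of the two proposed filters, using the 15x depth and 5% contamination cut-offs floated above. The field names and the sample records are invented for illustration and do not come from the actual pipeline output.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    name: str
    depth: float          # mean read depth over the genome
    frac_contam: float    # fraction of reads aligned to a contaminant


def passes_qc(sample, min_depth=15.0, max_contam=0.05):
    """Keep samples with at least min_depth depth and at most max_contam contamination."""
    return sample.depth >= min_depth and sample.frac_contam <= max_contam


# Hypothetical samples, for illustration only
samples = [
    SampleQC("sample_a", depth=40.0, frac_contam=0.01),
    SampleQC("sample_b", depth=12.0, frac_contam=0.00),
    SampleQC("sample_c", depth=60.0, frac_contam=0.08),
]

kept = [s.name for s in samples if passes_qc(s)]
print(kept)  # ['sample_a']
```

Keeping the thresholds as parameters makes it easy to see how many samples each candidate cut-off would drop before settling on values.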

iqbal-lab commented 1 year ago

Both seem perfectly reasonable to me

mbhall88 commented 1 year ago

So I have a working drprg branch adapted to use pandora with the racon denovo method (https://github.com/rmcolq/pandora/pull/299).

I've tested it out on two Illumina runs listed in https://github.com/mbhall88/drprg-paper/issues/2.

The first, ERR2510154, was the example VCF above. So, with the old pandora denovo process, at the allele for rpoB_S450F we had

rpoB    1449    fa44b92a        C       T       .       ld;lgc  VC=SNP;GRAPHTYPE=SIMPLE;VARID=rpoB_ACTGTCGGCG1344A,rpoB_GTC1347G,rpoB_TC1348T,rpoB_S450X,rpoB_TCG1348T,rpoB_S450*,rpoB_C1349CA,rpoB_C1349CAA,rpoB_C1349CAC,rpoB_C1349CAG,rpoB_C1349CAT,rpoB_C1349CC,rpoB_C1349CCA,rpoB_C1349CCC,rpoB_C1349CCG,rpoB_C1349CCT,rpoB_C1349CG,rpoB_C1349CGA,rpoB_C1349CGC,rpoB_C1349CGG,rpoB_C1349CGT,rpoB_C1349CT,rpoB_C1349CTA,rpoB_C1349CTC,rpoB_C1349CTG,rpoB_C1349CTT,rpoB_CG1349C,rpoB_CGG1349C;PREDICT=F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F        GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      .:0,0:0,0:0,0:0,0:0,0:0,1:1,1:-278,-278:0
rpoB    1450    9fbca785        G       T       .       ld;lgc  VC=SNP;GRAPHTYPE=SIMPLE;VARID=rpoB_ACTGTCGGCG1344A,rpoB_S450X,rpoB_TCG1348T,rpoB_S450*,rpoB_CG1349C,rpoB_CGG1349C,rpoB_G1350GA,rpoB_G1350GAA,rpoB_G1350GAC,rpoB_G1350GAG,rpoB_G1350GAT,rpoB_G1350GC,rpoB_G1350GCA,rpoB_G1350GCC,rpoB_G1350GCG,rpoB_G1350GCT,rpoB_G1350GG,rpoB_G1350GGA,rpoB_G1350GGC,rpoB_G1350GGG,rpoB_G1350GGT,rpoB_G1350GT,rpoB_G1350GTA,rpoB_G1350GTC,rpoB_G1350GTG,rpoB_G1350GTT,rpoB_GG1350G,rpoB_GGC1350G;PREDICT=F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F        GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      .:0,0:0,0:0,0:0,0:0,0:0,0:1,1:-278,-278:0

with the new denovo process, we get

rpoB    1449    ba1603d4        CG      TG,TT   .       PASS    VC=PH_SNPs;GRAPHTYPE=SIMPLE;VARID=rpoB_ACTGTCGGCG1344A,rpoB_GTC1347G,rpoB_TC1348T,rpoB_S450X,rpoB_TCG1348T,rpoB_S450*,rpoB_C1349CA,rpoB_C1349CAA,rpoB_C1349CAC,rpoB_C1349CAG,rpoB_C1349CAT,rpoB_C1349CC,rpoB_C1349CCA,rpoB_C1349CCC,rpoB_C1349CCG,rpoB_C1349CCT,rpoB_C1349CG,rpoB_C1349CGA,rpoB_C1349CGC,rpoB_C1349CGG,rpoB_C1349CGT,rpoB_C1349CT,rpoB_C1349CTA,rpoB_C1349CTC,rpoB_C1349CTG,rpoB_C1349CTT,rpoB_CG1349C,rpoB_CGG1349C,rpoB_G1350GA,rpoB_G1350GAA,rpoB_G1350GAC,rpoB_G1350GAG,rpoB_G1350GAT,rpoB_G1350GC,rpoB_G1350GCA,rpoB_G1350GCC,rpoB_G1350GCG,rpoB_G1350GCT,rpoB_G1350GG,rpoB_G1350GGA,rpoB_G1350GGC,rpoB_G1350GGG,rpoB_G1350GGT,rpoB_G1350GT,rpoB_G1350GTA,rpoB_G1350GTC,rpoB_G1350GTG,rpoB_G1350GTT,rpoB_GG1350G,rpoB_GGC1350G;PREDICT=S,S,S,R,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S     GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF   2:0,0,51:0,0,41:0,0,61:0,0,48:0,0,308:0,1,251:1,1,0.166667:-703.676,-703.676,-35.8875:667.788

I guess another reason this might have been fixed is that instead of using make_prg update to add the denovo sequences into the PRG, we recreate the MSAs and rebuild the PRGs for those genes with novel variants. So this particular case could be a weakness of make_prg update as it just updated with the novel variant - rpoB 1450 G>T - without combining it with the previous position into a single allele.
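
To illustrate why the two SNPs need to sit in a single allele, the sketch below translates rpoB codon 450 with each SNP applied alone and with both applied together, using the standard genetic code. The 0-based offsets within the codon are read off the TCG>TTT allele mentioned earlier; the helper names are illustrative.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def apply_snps(codon, snps):
    """Return the codon after applying {offset: base} substitutions."""
    bases = list(codon)
    for offset, base in snps.items():
        bases[offset] = base
    return "".join(bases)


def translate(codon):
    return CODON_TABLE[codon]


reference = "TCG"  # rpoB codon 450 (Ser)
first_only = apply_snps(reference, {1: "T"})        # C>T at position 1449
second_only = apply_snps(reference, {2: "T"})       # G>T at position 1450
both = apply_snps(reference, {1: "T", 2: "T"})      # the 2 bp MNP

for label, codon in [
    ("reference", reference),
    ("1449 C>T only", first_only),
    ("1450 G>T only", second_only),
    ("both", both),
]:
    print(label, codon, translate(codon))
```

Only the combined allele gives phenylalanine (S450F). Taken separately, the first SNP gives leucine and the second is synonymous, so genotyping the two positions as independent sites cannot recover the right amino acid change.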

The second run, ERR4828599, had both a RIF FN and an INH FN. The RIF FN was an interesting case where the isolate has both L449M and S450F. drprg/pandora previously failed to find a novel variant. With the new pandora denovo process, we found (and called) both of these variants. The INH FN was katG S315N, which is a rarer mutation at that locus - normally S315T. Previously both mykrobe and drprg had no depth in this area and drprg did not find a novel variant. With the new pandora, we do find and call this mutation.

I'm going to run a few more of the Illumina FNs, but this is very promising!

mbhall88 commented 1 year ago

Okay, since the last update of results we have switched to using racon for denovo discovery and dropped the old nanopore data. I have also increased the number of Illumina samples to 8,587.

Illumina

Note I am going to change the markers so you can see the error bars now that they are so small

image

Drug Tool FN(R) FP(S) Sensitivity (95% CI) Specificity (95% CI) MCC
Amikacin drprg 77(485) 50(6958) 84.1% (80.6-87.1%) 99.3% (99.1-99.5%) 0.857
Amikacin mykrobe 101(485) 46(6958) 79.2% (75.3-82.6%) 99.3% (99.1-99.5%) 0.831
Amikacin tbprofiler 62(485) 59(6958) 87.2% (83.9-89.9%) 99.2% (98.9-99.3%) 0.866
Capreomycin drprg 62(235) 92(2449) 73.6% (67.6-78.8%) 96.2% (95.4-96.9%) 0.662
Capreomycin mykrobe 78(235) 85(2449) 66.8% (60.6-72.5%) 96.5% (95.7-97.2%) 0.625
Capreomycin tbprofiler 54(235) 96(2449) 77.0% (71.2-81.9%) 96.1% (95.2-96.8%) 0.679
Delamanid drprg 111(116) 1(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.188
Delamanid mykrobe 111(116) 1(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.188
Delamanid tbprofiler 111(116) 2(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.173
Ethambutol drprg 146(1538) 736(4936) 90.5% (88.9-91.9%) 85.1% (84.1-86.1%) 0.685
Ethambutol mykrobe 149(1538) 728(4936) 90.3% (88.7-91.7%) 85.3% (84.2-86.2%) 0.686
Ethambutol tbprofiler 118(1538) 765(4936) 92.3% (90.9-93.6%) 84.5% (83.5-85.5%) 0.691
Ethionamide drprg 341(1104) 372(6105) 69.1% (66.3-71.8%) 93.9% (93.3-94.5%) 0.623
Ethionamide mykrobe 276(1104) 395(6105) 75.0% (72.4-77.5%) 93.5% (92.9-94.1%) 0.658
Ethionamide tbprofiler 272(1104) 414(6105) 75.4% (72.7-77.8%) 93.2% (92.6-93.8%) 0.653
Isoniazid drprg 362(3900) 164(4194) 90.7% (89.8-91.6%) 96.1% (95.5-96.6%) 0.871
Isoniazid mykrobe 366(3900) 163(4194) 90.6% (89.7-91.5%) 96.1% (95.5-96.7%) 0.87
Isoniazid tbprofiler 297(3900) 181(4194) 92.4% (91.5-93.2%) 95.7% (95.0-96.3%) 0.882
Kanamycin drprg 142(670) 101(6975) 78.8% (75.6-81.7%) 98.6% (98.2-98.8%) 0.796
Kanamycin mykrobe 166(670) 96(6975) 75.2% (71.8-78.3%) 98.6% (98.3-98.9%) 0.776
Kanamycin tbprofiler 122(670) 107(6975) 81.8% (78.7-84.5%) 98.5% (98.1-98.7%) 0.811
Levofloxacin drprg 105(1040) 97(5454) 89.9% (87.9-91.6%) 98.2% (97.8-98.5%) 0.884
Levofloxacin mykrobe 108(1040) 97(5454) 89.6% (87.6-91.3%) 98.2% (97.8-98.5%) 0.882
Levofloxacin tbprofiler 85(1040) 109(5454) 91.8% (90.0-93.3%) 98.0% (97.6-98.3%) 0.89
Linezolid drprg 49(65) 4(6110) 24.6% (15.8-36.3%) 99.9% (99.8-100.0%) 0.441
Linezolid mykrobe 49(65) 4(6110) 24.6% (15.8-36.3%) 99.9% (99.8-100.0%) 0.441
Linezolid tbprofiler 48(65) 5(6110) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.447
Moxifloxacin drprg 60(603) 464(5431) 90.0% (87.4-92.2%) 91.5% (90.7-92.2%) 0.656
Moxifloxacin mykrobe 59(603) 460(5431) 90.2% (87.6-92.3%) 91.5% (90.8-92.2%) 0.658
Moxifloxacin tbprofiler 42(603) 482(5431) 93.0% (90.7-94.8%) 91.1% (90.3-91.9%) 0.668
Ofloxacin drprg 31(105) 4(424) 70.5% (61.2-78.4%) 99.1% (97.6-99.6%) 0.782
Ofloxacin mykrobe 32(105) 4(424) 69.5% (60.2-77.5%) 99.1% (97.6-99.6%) 0.776
Ofloxacin tbprofiler 26(105) 6(424) 75.2% (66.2-82.5%) 98.6% (96.9-99.3%) 0.802
Pyrazinamide drprg 75(341) 47(822) 78.0% (73.3-82.1%) 94.3% (92.5-95.7%) 0.742
Pyrazinamide mykrobe 73(341) 45(822) 78.6% (73.9-82.6%) 94.5% (92.8-95.9%) 0.751
Pyrazinamide tbprofiler 45(341) 62(822) 86.8% (82.8-90.0%) 92.5% (90.4-94.1%) 0.782
Rifampicin drprg 142(3222) 166(4586) 95.6% (94.8-96.2%) 96.4% (95.8-96.9%) 0.919
Rifampicin mykrobe 187(3222) 165(4586) 94.2% (93.3-95.0%) 96.4% (95.8-96.9%) 0.907
Rifampicin tbprofiler 102(3222) 177(4586) 96.8% (96.2-97.4%) 96.1% (95.5-96.7%) 0.927
Streptomycin drprg 278(1042) 130(1205) 73.3% (70.6-75.9%) 89.2% (87.3-90.8%) 0.637
Streptomycin mykrobe 295(1042) 132(1205) 71.7% (68.9-74.3%) 89.0% (87.2-90.7%) 0.621
Streptomycin tbprofiler 257(1042) 136(1205) 75.3% (72.6-77.9%) 88.7% (86.8-90.4%) 0.649

Nanopore

image

Drug Tool FN(R) FP(S) Sensitivity (95% CI) Specificity (95% CI) MCC
Amikacin drprg 0(11) 3(78) 100.0% (74.1-100.0%) 96.2% (89.3-98.7%) 0.869
Amikacin mykrobe 0(11) 3(78) 100.0% (74.1-100.0%) 96.2% (89.3-98.7%) 0.869
Amikacin tbprofiler 0(11) 3(78) 100.0% (74.1-100.0%) 96.2% (89.3-98.7%) 0.869
Capreomycin drprg 1(1) 1(51) 0.0% (0.0-79.3%) 98.0% (89.7-99.7%) -0.02
Capreomycin mykrobe 1(1) 1(51) 0.0% (0.0-79.3%) 98.0% (89.7-99.7%) -0.02
Capreomycin tbprofiler 1(1) 1(51) 0.0% (0.0-79.3%) 98.0% (89.7-99.7%) -0.02
Ethambutol drprg 4(14) 15(77) 71.4% (45.4-88.3%) 80.5% (70.3-87.8%) 0.42
Ethambutol mykrobe 4(14) 15(77) 71.4% (45.4-88.3%) 80.5% (70.3-87.8%) 0.42
Ethambutol tbprofiler 5(14) 15(77) 64.3% (38.8-83.7%) 80.5% (70.3-87.8%) 0.367
Ethionamide drprg 0(4) 1(9) 100.0% (51.0-100.0%) 88.9% (56.5-98.0%) 0.843
Ethionamide mykrobe 0(4) 1(9) 100.0% (51.0-100.0%) 88.9% (56.5-98.0%) 0.843
Ethionamide tbprofiler 0(4) 1(9) 100.0% (51.0-100.0%) 88.9% (56.5-98.0%) 0.843
Isoniazid drprg 9(51) 4(48) 82.4% (69.7-90.4%) 91.7% (80.4-96.7%) 0.742
Isoniazid mykrobe 9(51) 4(48) 82.4% (69.7-90.4%) 91.7% (80.4-96.7%) 0.742
Isoniazid tbprofiler 9(51) 3(48) 82.4% (69.7-90.4%) 93.8% (83.2-97.9%) 0.764
Kanamycin drprg 0(0) 1(52) - 98.1% (89.9-99.7%) -
Kanamycin mykrobe 0(0) 1(52) - 98.1% (89.9-99.7%) -
Kanamycin tbprofiler 0(0) 1(52) - 98.1% (89.9-99.7%) -
Moxifloxacin drprg 0(0) 1(1) - 0.0% (0.0-79.3%) -
Moxifloxacin mykrobe 0(0) 1(1) - 0.0% (0.0-79.3%) -
Moxifloxacin tbprofiler 0(0) 1(1) - 0.0% (0.0-79.3%) -
Ofloxacin drprg 0(10) 4(77) 100.0% (72.2-100.0%) 94.8% (87.4-98.0%) 0.823
Ofloxacin mykrobe 0(10) 4(77) 100.0% (72.2-100.0%) 94.8% (87.4-98.0%) 0.823
Ofloxacin tbprofiler 0(10) 3(77) 100.0% (72.2-100.0%) 96.1% (89.2-98.7%) 0.86
Pyrazinamide drprg 0(0) 0(1) - 100.0% (20.7-100.0%) -
Pyrazinamide mykrobe 0(0) 0(1) - 100.0% (20.7-100.0%) -
Pyrazinamide tbprofiler 0(0) 0(1) - 100.0% (20.7-100.0%) -
Rifampicin drprg 5(48) 1(44) 89.6% (77.8-95.5%) 97.7% (88.2-99.6%) 0.873
Rifampicin mykrobe 5(48) 1(44) 89.6% (77.8-95.5%) 97.7% (88.2-99.6%) 0.873
Rifampicin tbprofiler 5(48) 1(44) 89.6% (77.8-95.5%) 97.7% (88.2-99.6%) 0.873
Streptomycin drprg 2(8) 14(83) 75.0% (40.9-92.9%) 83.1% (73.7-89.7%) 0.398
Streptomycin mykrobe 2(8) 27(83) 75.0% (40.9-92.9%) 67.5% (56.8-76.6%) 0.25
Streptomycin tbprofiler 2(8) 12(83) 75.0% (40.9-92.9%) 85.5% (76.4-91.5%) 0.43
lachlancoin commented 1 year ago

Looks pretty good I think

On Thu, 24 Nov 2022, 1:06 pm Michael Hall, @.***> wrote:

Okay, since we had a last update of results we have switch to using racon for denovo discovery and dropped the old nanopore data. I have also increased the number of illumina samples to 8,587 Illumina

Note I am going to change the markers so you can see the error bars now that they are so small

[image: image] https://user-images.githubusercontent.com/20403931/203677683-39591450-c68e-47a8-9b2c-d77f511c5b0e.png

Drug Tool FN(R) FP(S) Sensitivity (95% CI) Specificity (95% CI) MCC
Amikacin drprg 77(485) 50(6958) 84.1% (80.6-87.1%) 99.3% (99.1-99.5%) 0.857
Amikacin mykrobe 101(485) 46(6958) 79.2% (75.3-82.6%) 99.3% (99.1-99.5%) 0.831
Amikacin tbprofiler 62(485) 59(6958) 87.2% (83.9-89.9%) 99.2% (98.9-99.3%) 0.866
Capreomycin drprg 62(235) 92(2449) 73.6% (67.6-78.8%) 96.2% (95.4-96.9%) 0.662
Capreomycin mykrobe 78(235) 85(2449) 66.8% (60.6-72.5%) 96.5% (95.7-97.2%) 0.625
Capreomycin tbprofiler 54(235) 96(2449) 77.0% (71.2-81.9%) 96.1% (95.2-96.8%) 0.679
Delamanid drprg 111(116) 1(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.188
Delamanid mykrobe 111(116) 1(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.188
Delamanid tbprofiler 111(116) 2(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.173
Ethambutol drprg 146(1538) 736(4936) 90.5% (88.9-91.9%) 85.1% (84.1-86.1%) 0.685
Ethambutol mykrobe 149(1538) 728(4936) 90.3% (88.7-91.7%) 85.3% (84.2-86.2%) 0.686
Ethambutol tbprofiler 118(1538) 765(4936) 92.3% (90.9-93.6%) 84.5% (83.5-85.5%) 0.691
Ethionamide drprg 341(1104) 372(6105) 69.1% (66.3-71.8%) 93.9% (93.3-94.5%) 0.623
Ethionamide mykrobe 276(1104) 395(6105) 75.0% (72.4-77.5%) 93.5% (92.9-94.1%) 0.658
Ethionamide tbprofiler 272(1104) 414(6105) 75.4% (72.7-77.8%) 93.2% (92.6-93.8%) 0.653
Isoniazid drprg 362(3900) 164(4194) 90.7% (89.8-91.6%) 96.1% (95.5-96.6%) 0.871
Isoniazid mykrobe 366(3900) 163(4194) 90.6% (89.7-91.5%) 96.1% (95.5-96.7%) 0.87
Isoniazid tbprofiler 297(3900) 181(4194) 92.4% (91.5-93.2%) 95.7% (95.0-96.3%) 0.882
Kanamycin drprg 142(670) 101(6975) 78.8% (75.6-81.7%) 98.6% (98.2-98.8%) 0.796
Kanamycin mykrobe 166(670) 96(6975) 75.2% (71.8-78.3%) 98.6% (98.3-98.9%) 0.776
Kanamycin tbprofiler 122(670) 107(6975) 81.8% (78.7-84.5%) 98.5% (98.1-98.7%) 0.811
Levofloxacin drprg 105(1040) 97(5454) 89.9% (87.9-91.6%) 98.2% (97.8-98.5%) 0.884
Levofloxacin mykrobe 108(1040) 97(5454) 89.6% (87.6-91.3%) 98.2% (97.8-98.5%) 0.882
Levofloxacin tbprofiler 85(1040) 109(5454) 91.8% (90.0-93.3%) 98.0% (97.6-98.3%) 0.89
Linezolid drprg 49(65) 4(6110) 24.6% (15.8-36.3%) 99.9% (99.8-100.0%) 0.441
Linezolid mykrobe 49(65) 4(6110) 24.6% (15.8-36.3%) 99.9% (99.8-100.0%) 0.441
Linezolid tbprofiler 48(65) 5(6110) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.447
Moxifloxacin drprg 60(603) 464(5431) 90.0% (87.4-92.2%) 91.5% (90.7-92.2%) 0.656
Moxifloxacin mykrobe 59(603) 460(5431) 90.2% (87.6-92.3%) 91.5% (90.8-92.2%) 0.658
Moxifloxacin tbprofiler 42(603) 482(5431) 93.0% (90.7-94.8%) 91.1% (90.3-91.9%) 0.668
Ofloxacin drprg 31(105) 4(424) 70.5% (61.2-78.4%) 99.1% (97.6-99.6%) 0.782
Ofloxacin mykrobe 32(105) 4(424) 69.5% (60.2-77.5%) 99.1% (97.6-99.6%) 0.776
Ofloxacin tbprofiler 26(105) 6(424) 75.2% (66.2-82.5%) 98.6% (96.9-99.3%) 0.802
Pyrazinamide drprg 75(341) 47(822) 78.0% (73.3-82.1%) 94.3% (92.5-95.7%) 0.742
Pyrazinamide mykrobe 73(341) 45(822) 78.6% (73.9-82.6%) 94.5% (92.8-95.9%) 0.751
Pyrazinamide tbprofiler 45(341) 62(822) 86.8% (82.8-90.0%) 92.5% (90.4-94.1%) 0.782
Rifampicin drprg 142(3222) 166(4586) 95.6% (94.8-96.2%) 96.4% (95.8-96.9%) 0.919
Rifampicin mykrobe 187(3222) 165(4586) 94.2% (93.3-95.0%) 96.4% (95.8-96.9%) 0.907
Rifampicin tbprofiler 102(3222) 177(4586) 96.8% (96.2-97.4%) 96.1% (95.5-96.7%) 0.927
Streptomycin drprg 278(1042) 130(1205) 73.3% (70.6-75.9%) 89.2% (87.3-90.8%) 0.637
Streptomycin mykrobe 295(1042) 132(1205) 71.7% (68.9-74.3%) 89.0% (87.2-90.7%) 0.621
Streptomycin tbprofiler 257(1042) 136(1205) 75.3% (72.6-77.9%) 88.7% (86.8-90.4%) 0.649

Nanopore

[image: image] https://user-images.githubusercontent.com/20403931/203677879-e90cc0ce-d034-4cfb-a49f-85b72afca86b.png

Drug Tool FN(R) FP(S) Sensitivity (95% CI) Specificity (95% CI) MCC
Amikacin drprg 0(11) 3(78) 100.0% (74.1-100.0%) 96.2% (89.3-98.7%) 0.869
Amikacin mykrobe 0(11) 3(78) 100.0% (74.1-100.0%) 96.2% (89.3-98.7%) 0.869
Amikacin tbprofiler 0(11) 3(78) 100.0% (74.1-100.0%) 96.2% (89.3-98.7%) 0.869
Capreomycin drprg 1(1) 1(51) 0.0% (0.0-79.3%) 98.0% (89.7-99.7%) -0.02
Capreomycin mykrobe 1(1) 1(51) 0.0% (0.0-79.3%) 98.0% (89.7-99.7%) -0.02
Capreomycin tbprofiler 1(1) 1(51) 0.0% (0.0-79.3%) 98.0% (89.7-99.7%) -0.02
Ethambutol drprg 4(14) 15(77) 71.4% (45.4-88.3%) 80.5% (70.3-87.8%) 0.42
Ethambutol mykrobe 4(14) 15(77) 71.4% (45.4-88.3%) 80.5% (70.3-87.8%) 0.42
Ethambutol tbprofiler 5(14) 15(77) 64.3% (38.8-83.7%) 80.5% (70.3-87.8%) 0.367
Ethionamide drprg 0(4) 1(9) 100.0% (51.0-100.0%) 88.9% (56.5-98.0%) 0.843
Ethionamide mykrobe 0(4) 1(9) 100.0% (51.0-100.0%) 88.9% (56.5-98.0%) 0.843
Ethionamide tbprofiler 0(4) 1(9) 100.0% (51.0-100.0%) 88.9% (56.5-98.0%) 0.843
Isoniazid drprg 9(51) 4(48) 82.4% (69.7-90.4%) 91.7% (80.4-96.7%) 0.742
Isoniazid mykrobe 9(51) 4(48) 82.4% (69.7-90.4%) 91.7% (80.4-96.7%) 0.742
Isoniazid tbprofiler 9(51) 3(48) 82.4% (69.7-90.4%) 93.8% (83.2-97.9%) 0.764
Kanamycin drprg 0(0) 1(52) - 98.1% (89.9-99.7%) -
Kanamycin mykrobe 0(0) 1(52) - 98.1% (89.9-99.7%) -
Kanamycin tbprofiler 0(0) 1(52) - 98.1% (89.9-99.7%) -
Moxifloxacin drprg 0(0) 1(1) - 0.0% (0.0-79.3%) -
Moxifloxacin mykrobe 0(0) 1(1) - 0.0% (0.0-79.3%) -
Moxifloxacin tbprofiler 0(0) 1(1) - 0.0% (0.0-79.3%) -
Ofloxacin drprg 0(10) 4(77) 100.0% (72.2-100.0%) 94.8% (87.4-98.0%) 0.823
Ofloxacin mykrobe 0(10) 4(77) 100.0% (72.2-100.0%) 94.8% (87.4-98.0%) 0.823
Ofloxacin tbprofiler 0(10) 3(77) 100.0% (72.2-100.0%) 96.1% (89.2-98.7%) 0.86
Pyrazinamide drprg 0(0) 0(1) - 100.0% (20.7-100.0%) -
Pyrazinamide mykrobe 0(0) 0(1) - 100.0% (20.7-100.0%) -
Pyrazinamide tbprofiler 0(0) 0(1) - 100.0% (20.7-100.0%) -
Rifampicin drprg 5(48) 1(44) 89.6% (77.8-95.5%) 97.7% (88.2-99.6%) 0.873
Rifampicin mykrobe 5(48) 1(44) 89.6% (77.8-95.5%) 97.7% (88.2-99.6%) 0.873
Rifampicin tbprofiler 5(48) 1(44) 89.6% (77.8-95.5%) 97.7% (88.2-99.6%) 0.873
Streptomycin drprg 2(8) 14(83) 75.0% (40.9-92.9%) 83.1% (73.7-89.7%) 0.398
Streptomycin mykrobe 2(8) 27(83) 75.0% (40.9-92.9%) 67.5% (56.8-76.6%) 0.25
Streptomycin tbprofiler 2(8) 12(83) 75.0% (40.9-92.9%) 85.5% (76.4-91.5%) 0.43


iqbal-lab commented 1 year ago

Yeah!

mbhall88 commented 1 year ago

I disagree sadly haha. TBProfiler is beating us on a lot of drugs. Going to dig into why that is now

iqbal-lab commented 1 year ago

Looking again (now on laptop), if I were to summarise those results:

On Illumina, tb-profiler often has the highest sensitivity. It does pay a very small price in specificity, but that is much less noticeable than the sensitivity increase. So I agree, good to look into that.

On Nanopore: sensitivity of all tools is essentially identical (except tb-profiler has a problem on EMB). Specificity is also essentially identical, although for two drugs (streptomycin and ofloxacin) tb-profiler has slightly higher specificity. I'm quite impressed/surprised by how well all three do on the four drugs where any frameshift in a gene causes a resistant call. It matches what you found for Mykrobe in your Lancet Microbe paper @mbhall88, but I am delighted it's also true for DrPRG; I'm also a bit surprised that tb-profiler does that well too, given it uses bcftools. We didn't find we could call indels with this level of specificity. (I guess just refusing to make indel calls with Nanopore would give very high specificity?)

mbhall88 commented 1 year ago

I've been looking through the variants where drprg is FN but either of the other tools is TP (on Illumina) to see what variants we have missed. (I'm not finished yet) but a lot of the tbprofiler TPs where we are FN are to do with minor alleles. By default, TBProfiler will call anything with a fraction of 0.1 or more. This brings up point 3 from https://github.com/mbhall88/drprg-paper/issues/2 again. We tell mykrobe to run in haploid mode and drprg only runs in haploid mode. The options forward I see are:

  1. Run mykrobe in diploid mode (call minor alleles) and take the sensitivity hit in drprg, as we can't call minor alleles
  2. force fraction 0.5 in tbprofiler so that all tools are effectively in haploid mode
  3. implement a diploid model in pandora (not sure how much work this would be? will alert Leandro to get his input too)

Option 2 is obviously the easiest and most likely to make us look better, but it sits somewhat uncomfortably with me as we are kind of skewing the results in our favour right?

I will keep working through these results next week for other drugs as there are also a few cases on weird indels which I will document when I have a better understanding of what's going on.

iqbal-lab commented 1 year ago

I think detecting minors could easily be done directly in drprg, no need to implement in Pandora. You get coverage info on the S and R alleles right? Just ask if the coverage on any R allele is >0.1 of the total
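The check suggested here can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the argument layout (one coverage value per allele, plus the indices of the resistance alleles), and the handling of zero coverage are hypothetical and are not taken from drprg. The 0.1 threshold is the one mentioned in the comment.

```python
from typing import List, Sequence


def minor_resistance_alleles(
    allele_coverage: Sequence[float],
    resistant_alleles: Sequence[int],
    min_fraction: float = 0.1,
) -> List[int]:
    """Return indices of resistance alleles whose share of the total
    coverage at this site is greater than `min_fraction`.

    `allele_coverage[i]` is the coverage on allele i (0 = reference).
    This layout is an assumption made for the sketch.
    """
    total = sum(allele_coverage)
    if total == 0:
        # No reads at all: nothing can be called, so report no minor alleles.
        return []
    return [
        i for i in resistant_alleles
        if allele_coverage[i] / total > min_fraction
    ]
```

For example, with coverage `[90, 15, 0]` and resistance alleles `[1, 2]`, allele 1 holds 15/105 (about 14%) of the coverage and is flagged, while allele 2 is not.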

leoisl commented 1 year ago

3. implement a diploid model in pandora (not sure how much work this would be? will alert Leandro to get his input too)

I don't know how much work this would be either, because my only experience with genotyping models is in pandora, which has a haploid model. If implementing a diploid model is simply calling the two most likely alleles, then a simple implementation that gets the most likely allele (what is currently implemented) and then the second most likely allele (remove/ignore the most likely and rerun the genotyping algorithm) may not be hard. This can easily be generalised to n-ploid... but I don't think it is as simple as this...
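As a rough illustration of the naive idea only: if each allele has an independent log-likelihood, then "take the best, ignore it, and rerun" reduces to picking the two highest-scoring alleles. The function below is a hypothetical sketch under that assumption. It is not pandora's genotyping model, and it does not address the doubts raised in the comment.

```python
from typing import Optional, Sequence, Tuple


def top_two_alleles(likelihoods: Sequence[float]) -> Tuple[int, Optional[int]]:
    """Return the indices of the most likely and second most likely alleles.

    `likelihoods[i]` is the (log-)likelihood of allele i; higher is better.
    The second index is None when the site has only one allele.
    """
    ranked = sorted(
        range(len(likelihoods)), key=lambda i: likelihoods[i], reverse=True
    )
    best = ranked[0]
    second = ranked[1] if len(ranked) > 1 else None
    return best, second
```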

iqbal-lab commented 1 year ago

  1. The problem is it's not really diploid. Minor alleles in bacteria sometimes occur at a few (or many) places across the genome, but at different frequencies. Diploid assumes 50/50. Mykrobe uses a kind of diploid model, but it's a hack, and the genotype confidence is not well calibrated.
  2. In Pandora, there are a bunch of things you could do, but you're describing whole genome variation. Maybe you'd say "looks like there is a mixture of 2 genomes at 70:30 ratio". Or "lots of mixed positions in this data, looks dodgy".
  3. In Mykrobe and drprg, you have positions you care about, and knowledge that low frequency resistance alleles cause drug resistance. So you just need to spot them, independently, at each SNP. Pandora makes a VCF which I believe drprg parses here (https://github.com/mbhall88/drprg/blob/265c25c9e027a26f8c671931a736e19da399142e/src/predict.rs#L402). The VCF has coverage on both alleles, or all alleles. So you can just parse that to spot minors.

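A minimal sketch of pulling per-allele coverage out of a pandora VCF sample column, assuming the FORMAT layout of the pandora record quoted later in this thread (`GT:MEAN_FWD_COVG:MEAN_REV_COVG:...`, with one comma-separated value per allele). The function names, the treatment of `.` as zero, and the choice to sum mean forward and reverse coverage are assumptions made for illustration. This is not how drprg itself parses the file.

```python
from typing import Dict, List


def parse_sample(format_field: str, sample_field: str) -> Dict[str, List[str]]:
    """Split a VCF FORMAT string and its sample column into
    {key: [value per allele, ...]}."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return {key: value.split(",") for key, value in zip(keys, values)}


def allele_coverage(format_field: str, sample_field: str) -> List[float]:
    """Return one coverage value per allele: mean forward + mean reverse."""
    fields = parse_sample(format_field, sample_field)

    def to_float(text: str) -> float:
        # Treat the VCF missing-value marker as zero coverage (an assumption).
        return 0.0 if text == "." else float(text)

    fwd = [to_float(x) for x in fields["MEAN_FWD_COVG"]]
    rev = [to_float(x) for x in fields["MEAN_REV_COVG"]]
    return [f + r for f, r in zip(fwd, rev)]
```

With a made-up biallelic sample column such as `0:40,6:38,5:40,6:38,5:400,60:380,50:0,0:-10.5,-200.1:189.6`, this gives `[78.0, 11.0]`, so the alternate allele carries about 12% of the coverage.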
mbhall88 commented 1 year ago

I think detecting minors could easily be done directly in drprg, no need to implement in Pandora. You get coverage info on the S and R alleles right? Just ask if the coverage on any R allele is >0.1 of the total

True. I'll have to do some reimplementing though as I currently only pay attention to the called alleles. But it shouldn't take too long to get this working 🤞

iqbal-lab commented 1 year ago

Hurrah! I would reread the section on minor alleles here: https://wellcomeopenresearch.org/articles/4-191. I just reread it and it was informative; it reminded me of the differences between drugs.

mbhall88 commented 1 year ago

See https://github.com/mbhall88/drprg/issues/19#issuecomment-1345391531 for the latest results after adding minor allele calling

mbhall88 commented 1 year ago

After updating pandora and make_prg, as well as implementing gene deletion detection, we have the following Illumina results

Drug Tool FN(R) FP(S) Sensitivity (95% CI) Specificity (95% CI) MCC
Amikacin drprg 68(485) 57(6958) 86.0% (82.6-88.8%) 99.2% (98.9-99.4%) 0.861
Amikacin mykrobe 93(485) 51(6958) 80.8% (77.1-84.1%) 99.3% (99.0-99.4%) 0.836
Amikacin tbprofiler 62(485) 59(6958) 87.2% (83.9-89.9%) 99.2% (98.9-99.3%) 0.866
Capreomycin drprg 57(235) 95(2449) 75.7% (69.9-80.8%) 96.1% (95.3-96.8%) 0.672
Capreomycin mykrobe 72(235) 88(2449) 69.4% (63.2-74.9%) 96.4% (95.6-97.1%) 0.638
Capreomycin tbprofiler 54(235) 96(2449) 77.0% (71.2-81.9%) 96.1% (95.2-96.8%) 0.679
Delamanid drprg 111(116) 1(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.188
Delamanid mykrobe 111(116) 2(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.173
Delamanid tbprofiler 111(116) 2(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.173
Ethambutol drprg 122(1538) 750(4936) 92.1% (90.6-93.3%) 84.8% (83.8-85.8%) 0.693
Ethambutol mykrobe 133(1538) 747(4936) 91.4% (89.8-92.7%) 84.9% (83.8-85.8%) 0.689
Ethambutol tbprofiler 118(1538) 765(4936) 92.3% (90.9-93.6%) 84.5% (83.5-85.5%) 0.691
Ethionamide drprg 325(1104) 395(6105) 70.6% (67.8-73.2%) 93.5% (92.9-94.1%) 0.625
Ethionamide mykrobe 265(1104) 413(6105) 76.0% (73.4-78.4%) 93.2% (92.6-93.8%) 0.658
Ethionamide tbprofiler 272(1104) 414(6105) 75.4% (72.7-77.8%) 93.2% (92.6-93.8%) 0.653
Isoniazid drprg 307(3900) 173(4194) 92.1% (91.2-92.9%) 95.9% (95.2-96.4%) 0.882
Isoniazid mykrobe 333(3900) 170(4194) 91.5% (90.5-92.3%) 95.9% (95.3-96.5%) 0.876
Isoniazid tbprofiler 297(3900) 181(4194) 92.4% (91.5-93.2%) 95.7% (95.0-96.3%) 0.882
Kanamycin drprg 128(670) 107(6975) 80.9% (77.7-83.7%) 98.5% (98.1-98.7%) 0.805
Kanamycin mykrobe 152(670) 98(6975) 77.3% (74.0-80.3%) 98.6% (98.3-98.8%) 0.789
Kanamycin tbprofiler 122(670) 107(6975) 81.8% (78.7-84.5%) 98.5% (98.1-98.7%) 0.811
Levofloxacin drprg 81(1040) 103(5454) 92.2% (90.4-93.7%) 98.1% (97.7-98.4%) 0.896
Levofloxacin mykrobe 88(1040) 102(5454) 91.5% (89.7-93.1%) 98.1% (97.7-98.5%) 0.892
Levofloxacin tbprofiler 85(1040) 109(5454) 91.8% (90.0-93.3%) 98.0% (97.6-98.3%) 0.89
Linezolid drprg 49(65) 4(6110) 24.6% (15.8-36.3%) 99.9% (99.8-100.0%) 0.441
Linezolid mykrobe 48(65) 4(6110) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.457
Linezolid tbprofiler 48(65) 5(6110) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.447
Moxifloxacin drprg 41(603) 480(5431) 93.2% (90.9-94.9%) 91.2% (90.4-91.9%) 0.669
Moxifloxacin mykrobe 44(603) 473(5431) 92.7% (90.3-94.5%) 91.3% (90.5-92.0%) 0.669
Moxifloxacin tbprofiler 42(603) 482(5431) 93.0% (90.7-94.8%) 91.1% (90.3-91.9%) 0.668
Ofloxacin drprg 24(105) 5(424) 77.1% (68.2-84.1%) 98.8% (97.3-99.5%) 0.821
Ofloxacin mykrobe 26(105) 5(424) 75.2% (66.2-82.5%) 98.8% (97.3-99.5%) 0.808
Ofloxacin tbprofiler 26(105) 6(424) 75.2% (66.2-82.5%) 98.6% (96.9-99.3%) 0.802
Pyrazinamide drprg 70(341) 54(822) 79.5% (74.9-83.4%) 93.4% (91.5-94.9%) 0.74
Pyrazinamide mykrobe 55(341) 56(822) 83.9% (79.6-87.4%) 93.2% (91.3-94.7%) 0.77
Pyrazinamide tbprofiler 45(341) 62(822) 86.8% (82.8-90.0%) 92.5% (90.4-94.1%) 0.782
Rifampicin drprg 138(3222) 167(4586) 95.7% (95.0-96.4%) 96.4% (95.8-96.9%) 0.92
Rifampicin mykrobe 164(3222) 169(4586) 94.9% (94.1-95.6%) 96.3% (95.7-96.8%) 0.912
Rifampicin tbprofiler 102(3222) 177(4586) 96.8% (96.2-97.4%) 96.1% (95.5-96.7%) 0.927
Streptomycin drprg 265(1042) 133(1205) 74.6% (71.8-77.1%) 89.0% (87.1-90.6%) 0.645
Streptomycin mykrobe 282(1042) 135(1205) 72.9% (70.2-75.5%) 88.8% (86.9-90.5%) 0.629
Streptomycin tbprofiler 257(1042) 136(1205) 75.3% (72.6-77.9%) 88.7% (86.8-90.4%) 0.649

illumina
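For readers checking the table columns: FN(R) is the number of false negatives out of R phenotypically resistant samples, and FP(S) is the number of false positives out of S susceptible samples. The sketch below recomputes sensitivity, specificity and MCC from those four numbers. The confidence intervals in the tables appear consistent with a Wilson score interval, so that is what is used here; this is an inference from the numbers, not something stated in the thread, and the function names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple


def wilson_interval(
    successes: int, n: int, z: float = 1.96
) -> Optional[Tuple[float, float]]:
    """Wilson score interval for a proportion; None when n == 0."""
    if n == 0:
        return None
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def summarise(
    fn: int, n_resistant: int, fp: int, n_susceptible: int
) -> Dict[str, Optional[float]]:
    """Sensitivity, specificity and MCC from FN(R) and FP(S)."""
    tp = n_resistant - fn
    tn = n_susceptible - fp
    sensitivity = tp / n_resistant if n_resistant else None
    specificity = tn / n_susceptible if n_susceptible else None
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero (shown as "-" in the tables).
    mcc = (tp * tn - fp * fn) / denom if denom else None
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}
```

Using the Amikacin drprg row above, `summarise(68, 485, 57, 6958)` gives a sensitivity of about 86.0%, a specificity of about 99.2% and an MCC of about 0.861, and `wilson_interval(417, 485)` gives roughly 82.6% to 88.8%.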

iqbal-lab commented 1 year ago

I do find the tbprofiler sensitivity suspicious, and suspect

  1. An error in our probes or
  2. A phenotyping error

mbhall88 commented 1 year ago

Any drugs in particular?

iqbal-lab commented 1 year ago

Isoniazid and rifampicin really

mbhall88 commented 1 year ago

I've been through all of the drprg PZA FNs that are called by at least one other tool.

There are two overarching problems drprg has

  1. A lot of the missed calls are minor allele calls for variants not covered by anything in the PRG. So, because they're minor alleles, they don't get discovered as novel. The pncA PRG is quite sparse, so it might be worth adding some more PZA-resistant isolates to the reference PRG to try to capture more of the population variation. And where the minor alleles are covered by the PRG, they seem to fail the GAPS threshold of 0.3.
  2. There are some big deletions that knock out the start codon and more. We (surprisingly) discover the deletion, but get no coverage on it (or the ref):
    pncA    1       .       GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATTCCT        G,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCGGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATATCTT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCCGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACCTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATATCTT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCC
GCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACGACTTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACTACTTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCCGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATTCCT .       .       VC=INDEL;GRAPHTYPE=SIMPLE       GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      .:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:1,1,1,1,1,1,1,1,1,1:-488,-488,-488,-488,-488,-488,-488,-488,-488,-488:0

One way around this could be to notice when we have more than n consecutive VCF entries with a failed/null call and just call resistant? Or, to be more precise, notice when we have a failed position(s) that spans the start codon and then call resistant if it is one of the genes where gene deletion causes resistance.
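The more precise variant of this idea could be sketched as below. Everything here is hypothetical: the `Call` record, the assumption that positions are 1-based coordinates on the gene's reference sequence, and the flag saying whether deletion of the gene confers resistance. It is a sketch of the rule described above, not drprg's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Call:
    pos: int                 # 1-based position of the first REF base
    ref_len: int             # length of the REF allele
    genotype: Optional[int]  # None for a failed/null call ('.')


def overlaps(call: Call, start: int, end: int) -> bool:
    """True if the REF allele of `call` overlaps the closed interval [start, end]."""
    call_end = call.pos + call.ref_len - 1
    return call.pos <= end and call_end >= start


def deletion_suspected(
    calls: Iterable[Call],
    start_codon_pos: int,
    deletion_causes_resistance: bool,
) -> bool:
    """Flag a gene as likely deleted when a null call spans its start codon.

    Only applies to genes where loss of the gene is known to cause resistance.
    """
    if not deletion_causes_resistance:
        return False
    codon_end = start_codon_pos + 2  # a codon is three bases long
    return any(
        call.genotype is None and overlaps(call, start_codon_pos, codon_end)
        for call in calls
    )
```

A null call such as `Call(pos=1, ref_len=300, genotype=None)` with a start codon at position 101 would be flagged, whereas the same record with a called genotype would not.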

mbhall88 commented 1 year ago

RIF FNs

These are all minor alleles where we don't call anything because we don't have anything in the graph to allow us to notice the minor alleles

Here are the mutations and the number of samples with FNs for those mutations

L430P: 2
Q432P: 2
D435V: 2
D435N: 1
H445Y: 1
H445N: 1
H445D: 3
S450L: 23
L452P: 5

The only way we can avoid this is adding some samples into the graph that contain all (most?) of these mutations.

iqbal-lab commented 1 year ago

Two solutions

  1. These minor alleles must by definition be catalogue SNPs. So they are in our graph, but the problem is that racon has found something new in our consensus that is close to the minor allele, and the combination of both is not in the graph. Right? So all we need to do is rebuild the PRG including the catalogue and also the racon call?
  2. There is a clean solution that requires a fair bit of new code. At the end of drprg, take the consensus and align it to the reference so we know which positions in the consensus correspond to catalogue SNP coords. Then minimap the reads to the consensus (I see a Rust port of minimap exists, but it looks maybe immature? https://crates.io/crates/minimap2). Then just count minor allele bases in the pileup at the catalogue SNP coords.

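The final counting step of the second solution can be illustrated with a toy pileup. This sketch assumes the reads have already been mapped to the consensus and that each alignment is ungapped, represented as a `(start, sequence)` pair with a 0-based start. A real implementation would take alignments from a mapper such as minimap2 and would need to handle indels and clipping. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Alignment = Tuple[int, str]  # (0-based start on the consensus, read bases)


def base_counts_at(alignments: Iterable[Alignment], position: int) -> Counter:
    """Count the read bases covering one 0-based consensus position."""
    counts: Counter = Counter()
    for start, seq in alignments:
        offset = position - start
        if 0 <= offset < len(seq):
            counts[seq[offset]] += 1
    return counts


def allele_fraction(
    alignments: Iterable[Alignment], position: int, base: str
) -> float:
    """Fraction of reads covering `position` that carry `base`."""
    counts = base_counts_at(alignments, position)
    depth = sum(counts.values())
    return counts[base] / depth if depth else 0.0
```

The returned fraction can then be compared against a minor allele threshold (0.1 was the value discussed earlier in the thread) at each catalogue SNP coordinate.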
iqbal-lab commented 1 year ago

This is very exciting and good news really; there is a lot of sensitivity gain to be had from the minor alleles and gene deletions.

mbhall88 commented 1 year ago

For point 1, no, that is not right. We don't have these variants in the graph, which is the problem. (Remember our graph is not the panel, but the sparse population PRG from randomly sampled CRyPTIC samples.) And racon can't find them in these samples because they're only minor alleles. Racon will find the major allele: the reference.

Point 2 seems like it effectively does away with the need for pandora though. Isn't it almost what tbprofiler does? It will also dramatically increase our runtime and memory usage, which at the moment is really our biggest selling point.

iqbal-lab commented 1 year ago

  1. Ah, I did forget
  2. No, it doesn't at all do away with the need for Pandora. Mapping to the consensus will be dominated by exact matches (for Illumina), so it is much easier to spot minors

I need to think!

iqbal-lab commented 1 year ago

Follow-up to 2: I'm not pushing for this solution, but just to say, we do this for covid (30 kb long) and use <500 MB RAM and 45 seconds for the whole process. I think performance is not a barrier. But there are other arguments not to do it.

mbhall88 commented 1 year ago

After closing mbhall88/drprg#23 the current (Illumina) results are

Drug Tool FN(R) FP(S) Sensitivity (95% CI) Specificity (95% CI) MCC
Amikacin drprg 68(484) 57(6958) 86.0% (82.6-88.8%) 99.2% (98.9-99.4%) 0.86
Amikacin mykrobe 93(484) 51(6958) 80.8% (77.0-84.0%) 99.3% (99.0-99.4%) 0.835
Amikacin tbprofiler 62(484) 59(6958) 87.2% (83.9-89.9%) 99.2% (98.9-99.3%) 0.866
Capreomycin drprg 57(235) 94(2448) 75.7% (69.9-80.8%) 96.2% (95.3-96.9%) 0.673
Capreomycin mykrobe 72(235) 87(2448) 69.4% (63.2-74.9%) 96.4% (95.6-97.1%) 0.64
Capreomycin tbprofiler 54(235) 95(2448) 77.0% (71.2-81.9%) 96.1% (95.3-96.8%) 0.681
Delamanid drprg 111(116) 5(8151) 4.3% (1.9-9.7%) 99.9% (99.9-100.0%) 0.144
Delamanid mykrobe 111(116) 2(8151) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.173
Delamanid tbprofiler 111(116) 2(8151) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.173
Ethambutol drprg 121(1537) 752(4935) 92.1% (90.7-93.4%) 84.8% (83.7-85.7%) 0.693
Ethambutol mykrobe 133(1537) 747(4935) 91.3% (89.8-92.7%) 84.9% (83.8-85.8%) 0.688
Ethambutol tbprofiler 118(1537) 765(4935) 92.3% (90.9-93.6%) 84.5% (83.5-85.5%) 0.691
Ethionamide drprg 273(1103) 417(6105) 75.2% (72.6-77.7%) 93.2% (92.5-93.8%) 0.651
Ethionamide mykrobe 265(1103) 413(6105) 76.0% (73.4-78.4%) 93.2% (92.6-93.8%) 0.658
Ethionamide tbprofiler 272(1103) 414(6105) 75.3% (72.7-77.8%) 93.2% (92.6-93.8%) 0.653
Isoniazid drprg 307(3899) 173(4193) 92.1% (91.2-92.9%) 95.9% (95.2-96.4%) 0.882
Isoniazid mykrobe 333(3899) 170(4193) 91.5% (90.5-92.3%) 95.9% (95.3-96.5%) 0.876
Isoniazid tbprofiler 297(3899) 181(4193) 92.4% (91.5-93.2%) 95.7% (95.0-96.3%) 0.882
Kanamycin drprg 128(669) 107(6975) 80.9% (77.7-83.7%) 98.5% (98.1-98.7%) 0.805
Kanamycin mykrobe 152(669) 98(6975) 77.3% (74.0-80.3%) 98.6% (98.3-98.8%) 0.788
Kanamycin tbprofiler 122(669) 107(6975) 81.8% (78.7-84.5%) 98.5% (98.1-98.7%) 0.811
Levofloxacin drprg 81(1040) 102(5454) 92.2% (90.4-93.7%) 98.1% (97.7-98.5%) 0.896
Levofloxacin mykrobe 88(1040) 102(5454) 91.5% (89.7-93.1%) 98.1% (97.7-98.5%) 0.892
Levofloxacin tbprofiler 85(1040) 109(5454) 91.8% (90.0-93.3%) 98.0% (97.6-98.3%) 0.89
Linezolid drprg 48(65) 4(6109) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.457
Linezolid mykrobe 48(65) 4(6109) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.457
Linezolid tbprofiler 48(65) 5(6109) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.447
Moxifloxacin drprg 41(603) 478(5430) 93.2% (90.9-94.9%) 91.2% (90.4-91.9%) 0.67
Moxifloxacin mykrobe 44(603) 472(5430) 92.7% (90.3-94.5%) 91.3% (90.5-92.0%) 0.669
Moxifloxacin tbprofiler 42(603) 481(5430) 93.0% (90.7-94.8%) 91.1% (90.4-91.9%) 0.668
Ofloxacin drprg 24(104) 5(424) 76.9% (68.0-84.0%) 98.8% (97.3-99.5%) 0.82
Ofloxacin mykrobe 26(104) 5(424) 75.0% (65.9-82.3%) 98.8% (97.3-99.5%) 0.807
Ofloxacin tbprofiler 26(104) 6(424) 75.0% (65.9-82.3%) 98.6% (96.9-99.3%) 0.8
Pyrazinamide drprg 68(341) 53(820) 80.1% (75.5-84.0%) 93.5% (91.6-95.0%) 0.746
Pyrazinamide mykrobe 55(341) 56(820) 83.9% (79.6-87.4%) 93.2% (91.2-94.7%) 0.77
Pyrazinamide tbprofiler 45(341) 62(820) 86.8% (82.8-90.0%) 92.4% (90.4-94.1%) 0.782
Rifampicin drprg 133(3221) 166(4585) 95.9% (95.1-96.5%) 96.4% (95.8-96.9%) 0.921
Rifampicin mykrobe 164(3221) 169(4585) 94.9% (94.1-95.6%) 96.3% (95.7-96.8%) 0.912
Rifampicin tbprofiler 102(3221) 177(4585) 96.8% (96.2-97.4%) 96.1% (95.5-96.7%) 0.927
Streptomycin drprg 266(1041) 133(1205) 74.4% (71.7-77.0%) 89.0% (87.1-90.6%) 0.644
Streptomycin mykrobe 282(1041) 135(1205) 72.9% (70.1-75.5%) 88.8% (86.9-90.5%) 0.629
Streptomycin tbprofiler 257(1041) 136(1205) 75.3% (72.6-77.8%) 88.7% (86.8-90.4%) 0.649

illumina

The nanopore results remain unchanged

mbhall88 commented 1 year ago

After the updates in minor allele calling in https://github.com/mbhall88/drprg/issues/19#issuecomment-1371473290

Drug Tool FN(R) FP(S) Sensitivity (95% CI) Specificity (95% CI) MCC
Amikacin drprg 68(484) 57(6958) 86.0% (82.6-88.8%) 99.2% (98.9-99.4%) 0.86
Amikacin mykrobe 93(484) 51(6958) 80.8% (77.0-84.0%) 99.3% (99.0-99.4%) 0.835
Amikacin tbprofiler 62(484) 59(6958) 87.2% (83.9-89.9%) 99.2% (98.9-99.3%) 0.866
Capreomycin drprg 57(235) 94(2448) 75.7% (69.9-80.8%) 96.2% (95.3-96.9%) 0.673
Capreomycin mykrobe 72(235) 87(2448) 69.4% (63.2-74.9%) 96.4% (95.6-97.1%) 0.64
Capreomycin tbprofiler 54(235) 95(2448) 77.0% (71.2-81.9%) 96.1% (95.3-96.8%) 0.681
Delamanid drprg 111(116) 5(8151) 4.3% (1.9-9.7%) 99.9% (99.9-100.0%) 0.144
Delamanid mykrobe 111(116) 2(8151) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.173
Delamanid tbprofiler 111(116) 2(8151) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.173
Ethambutol drprg 121(1537) 752(4935) 92.1% (90.7-93.4%) 84.8% (83.7-85.7%) 0.693
Ethambutol mykrobe 133(1537) 747(4935) 91.3% (89.8-92.7%) 84.9% (83.8-85.8%) 0.688
Ethambutol tbprofiler 118(1537) 765(4935) 92.3% (90.9-93.6%) 84.5% (83.5-85.5%) 0.691
Ethionamide drprg 272(1103) 420(6105) 75.3% (72.7-77.8%) 93.1% (92.5-93.7%) 0.651
Ethionamide mykrobe 265(1103) 413(6105) 76.0% (73.4-78.4%) 93.2% (92.6-93.8%) 0.658
Ethionamide tbprofiler 272(1103) 414(6105) 75.3% (72.7-77.8%) 93.2% (92.6-93.8%) 0.653
Isoniazid drprg 307(3899) 173(4193) 92.1% (91.2-92.9%) 95.9% (95.2-96.4%) 0.882
Isoniazid mykrobe 333(3899) 170(4193) 91.5% (90.5-92.3%) 95.9% (95.3-96.5%) 0.876
Isoniazid tbprofiler 297(3899) 181(4193) 92.4% (91.5-93.2%) 95.7% (95.0-96.3%) 0.882
Kanamycin drprg 128(669) 107(6975) 80.9% (77.7-83.7%) 98.5% (98.1-98.7%) 0.805
Kanamycin mykrobe 152(669) 98(6975) 77.3% (74.0-80.3%) 98.6% (98.3-98.8%) 0.788
Kanamycin tbprofiler 122(669) 107(6975) 81.8% (78.7-84.5%) 98.5% (98.1-98.7%) 0.811
Levofloxacin drprg 79(1040) 104(5454) 92.4% (90.6-93.9%) 98.1% (97.7-98.4%) 0.896
Levofloxacin mykrobe 88(1040) 102(5454) 91.5% (89.7-93.1%) 98.1% (97.7-98.5%) 0.892
Levofloxacin tbprofiler 85(1040) 109(5454) 91.8% (90.0-93.3%) 98.0% (97.6-98.3%) 0.89
Linezolid drprg 48(65) 4(6109) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.457
Linezolid mykrobe 48(65) 4(6109) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.457
Linezolid tbprofiler 48(65) 5(6109) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.447
Moxifloxacin drprg 40(603) 478(5430) 93.4% (91.1-95.1%) 91.2% (90.4-91.9%) 0.671
Moxifloxacin mykrobe 44(603) 472(5430) 92.7% (90.3-94.5%) 91.3% (90.5-92.0%) 0.669
Moxifloxacin tbprofiler 42(603) 481(5430) 93.0% (90.7-94.8%) 91.1% (90.4-91.9%) 0.668
Ofloxacin drprg 24(104) 5(424) 76.9% (68.0-84.0%) 98.8% (97.3-99.5%) 0.82
Ofloxacin mykrobe 26(104) 5(424) 75.0% (65.9-82.3%) 98.8% (97.3-99.5%) 0.807
Ofloxacin tbprofiler 26(104) 6(424) 75.0% (65.9-82.3%) 98.6% (96.9-99.3%) 0.8
Pyrazinamide drprg 67(341) 54(820) 80.4% (75.8-84.2%) 93.4% (91.5-94.9%) 0.746
Pyrazinamide mykrobe 55(341) 56(820) 83.9% (79.6-87.4%) 93.2% (91.2-94.7%) 0.77
Pyrazinamide tbprofiler 45(341) 62(820) 86.8% (82.8-90.0%) 92.4% (90.4-94.1%) 0.782
Rifampicin drprg 114(3221) 168(4585) 96.5% (95.8-97.0%) 96.3% (95.8-96.8%) 0.926
Rifampicin mykrobe 164(3221) 169(4585) 94.9% (94.1-95.6%) 96.3% (95.7-96.8%) 0.912
Rifampicin tbprofiler 102(3221) 177(4585) 96.8% (96.2-97.4%) 96.1% (95.5-96.7%) 0.927
Streptomycin drprg 267(1041) 134(1205) 74.4% (71.6-76.9%) 88.9% (87.0-90.5%) 0.643
Streptomycin mykrobe 282(1041) 135(1205) 72.9% (70.1-75.5%) 88.8% (86.9-90.5%) 0.629
Streptomycin tbprofiler 257(1041) 136(1205) 75.3% (72.6-77.8%) 88.7% (86.8-90.4%) 0.649

illumina

iqbal-lab commented 1 year ago

OK, so looking at those results now, we can definitely see a sensitivity improvement over Mykrobe with no precision loss. Compared with tbprofiler we are broadly the same: tbprofiler mostly has slightly better recall and slightly worse precision (except for fluoroquinolones). The biggest difference is 7% higher recall for tbprofiler for pyrazinamide. Fair summary?

mbhall88 commented 1 year ago

Yep, fair summary. The work in mbhall88/drprg#24 should improve the PZA recall slightly too.

mbhall88 commented 1 year ago

After the work in mbhall88/drprg#26, we get the following Illumina results (nanopore is unchanged). Note: only ETO and PZA change appreciably from the last results

Drug Tool FN(R) FP(S) Sensitivity (95% CI) Specificity (95% CI) MCC
Amikacin drprg 66(484) 57(6958) 86.4% (83.0-89.1%) 99.2% (98.9-99.4%) 0.863
Amikacin mykrobe 93(484) 51(6958) 80.8% (77.0-84.0%) 99.3% (99.0-99.4%) 0.835
Amikacin tbprofiler 62(484) 59(6958) 87.2% (83.9-89.9%) 99.2% (98.9-99.3%) 0.866
Capreomycin drprg 56(235) 94(2448) 76.2% (70.3-81.2%) 96.2% (95.3-96.9%) 0.676
Capreomycin mykrobe 72(235) 87(2448) 69.4% (63.2-74.9%) 96.4% (95.6-97.1%) 0.64
Capreomycin tbprofiler 54(235) 95(2448) 77.0% (71.2-81.9%) 96.1% (95.3-96.8%) 0.681
Delamanid drprg 111(116) 4(8151) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.152
Delamanid mykrobe 111(116) 2(8151) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.173
Delamanid tbprofiler 111(116) 2(8151) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.173
Ethambutol drprg 120(1537) 754(4935) 92.2% (90.7-93.4%) 84.7% (83.7-85.7%) 0.693
Ethambutol mykrobe 133(1537) 747(4935) 91.3% (89.8-92.7%) 84.9% (83.8-85.8%) 0.688
Ethambutol tbprofiler 118(1537) 765(4935) 92.3% (90.9-93.6%) 84.5% (83.5-85.5%) 0.691
Ethionamide drprg 245(1103) 418(6105) 77.8% (75.2-80.1%) 93.2% (92.5-93.8%) 0.669
Ethionamide mykrobe 265(1103) 413(6105) 76.0% (73.4-78.4%) 93.2% (92.6-93.8%) 0.658
Ethionamide tbprofiler 272(1103) 414(6105) 75.3% (72.7-77.8%) 93.2% (92.6-93.8%) 0.653
Isoniazid drprg 305(3899) 173(4193) 92.2% (91.3-93.0%) 95.9% (95.2-96.4%) 0.882
Isoniazid mykrobe 333(3899) 170(4193) 91.5% (90.5-92.3%) 95.9% (95.3-96.5%) 0.876
Isoniazid tbprofiler 297(3899) 181(4193) 92.4% (91.5-93.2%) 95.7% (95.0-96.3%) 0.882
Kanamycin drprg 126(669) 107(6975) 81.2% (78.0-83.9%) 98.5% (98.1-98.7%) 0.807
Kanamycin mykrobe 152(669) 98(6975) 77.3% (74.0-80.3%) 98.6% (98.3-98.8%) 0.788
Kanamycin tbprofiler 122(669) 107(6975) 81.8% (78.7-84.5%) 98.5% (98.1-98.7%) 0.811
Levofloxacin drprg 80(1040) 106(5454) 92.3% (90.5-93.8%) 98.1% (97.7-98.4%) 0.895
Levofloxacin mykrobe 88(1040) 102(5454) 91.5% (89.7-93.1%) 98.1% (97.7-98.5%) 0.892
Levofloxacin tbprofiler 85(1040) 109(5454) 91.8% (90.0-93.3%) 98.0% (97.6-98.3%) 0.89
Linezolid drprg 48(65) 4(6109) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.457
Linezolid mykrobe 48(65) 4(6109) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.457
Linezolid tbprofiler 48(65) 5(6109) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.447
Moxifloxacin drprg 39(603) 477(5430) 93.5% (91.3-95.2%) 91.2% (90.4-91.9%) 0.673
Moxifloxacin mykrobe 44(603) 472(5430) 92.7% (90.3-94.5%) 91.3% (90.5-92.0%) 0.669
Moxifloxacin tbprofiler 42(603) 481(5430) 93.0% (90.7-94.8%) 91.1% (90.4-91.9%) 0.668
Ofloxacin drprg 25(104) 5(424) 76.0% (66.9-83.2%) 98.8% (97.3-99.5%) 0.813
Ofloxacin mykrobe 26(104) 5(424) 75.0% (65.9-82.3%) 98.8% (97.3-99.5%) 0.807
Ofloxacin tbprofiler 26(104) 6(424) 75.0% (65.9-82.3%) 98.6% (96.9-99.3%) 0.8
Pyrazinamide drprg 57(341) 55(820) 83.3% (79.0-86.9%) 93.3% (91.4-94.8%) 0.767
Pyrazinamide mykrobe 55(341) 56(820) 83.9% (79.6-87.4%) 93.2% (91.2-94.7%) 0.77
Pyrazinamide tbprofiler 45(341) 62(820) 86.8% (82.8-90.0%) 92.4% (90.4-94.1%) 0.782
Rifampicin drprg 112(3221) 168(4585) 96.5% (95.8-97.1%) 96.3% (95.8-96.8%) 0.926
Rifampicin mykrobe 164(3221) 169(4585) 94.9% (94.1-95.6%) 96.3% (95.7-96.8%) 0.912
Rifampicin tbprofiler 102(3221) 177(4585) 96.8% (96.2-97.4%) 96.1% (95.5-96.7%) 0.927
Streptomycin drprg 259(1041) 135(1205) 75.1% (72.4-77.7%) 88.8% (86.9-90.5%) 0.648
Streptomycin mykrobe 282(1041) 135(1205) 72.9% (70.1-75.5%) 88.8% (86.9-90.5%) 0.629
Streptomycin tbprofiler 257(1041) 136(1205) 75.3% (72.6-77.8%) 88.7% (86.8-90.4%) 0.649

illumina

PZA still isn't great, but there are just so many different mutations with minor alleles that we don't have in the graph and hand-picking them all could lead to a complicated graph. Although I can try adding them if we really want to try boosting PZA sensitivity...

iqbal-lab commented 1 year ago

I think those results are much improved, but I am wondering what the pitch is for drprg though. Illumina is better than Mykrobe and ~same as tbprofiler. Are the nanopore results really unchanged from before? Leandro's mapping fixes will help too.

mbhall88 commented 1 year ago

> am wondering what the pitch is for drprg though

Yeah, this has been troubling me too... I mean, we can notice gene deletions... We use a lot fewer resources...

> Are the nanopore results really unchanged from before?

Here are the current nanopore results

Drug Tool FN(R) FP(S) Sensitivity (95% CI) Specificity (95% CI) MCC
Amikacin drprg 0(11) 3(78) 100.0% (74.1-100.0%) 96.2% (89.3-98.7%) 0.869
Amikacin mykrobe 0(11) 3(78) 100.0% (74.1-100.0%) 96.2% (89.3-98.7%) 0.869
Amikacin tbprofiler 0(11) 3(78) 100.0% (74.1-100.0%) 96.2% (89.3-98.7%) 0.869
Capreomycin drprg 1(1) 1(51) 0.0% (0.0-79.3%) 98.0% (89.7-99.7%) -0.02
Capreomycin mykrobe 1(1) 1(51) 0.0% (0.0-79.3%) 98.0% (89.7-99.7%) -0.02
Capreomycin tbprofiler 1(1) 1(51) 0.0% (0.0-79.3%) 98.0% (89.7-99.7%) -0.02
Ethambutol drprg 4(14) 15(77) 71.4% (45.4-88.3%) 80.5% (70.3-87.8%) 0.42
Ethambutol mykrobe 4(14) 15(77) 71.4% (45.4-88.3%) 80.5% (70.3-87.8%) 0.42
Ethambutol tbprofiler 5(14) 15(77) 64.3% (38.8-83.7%) 80.5% (70.3-87.8%) 0.367
Ethionamide drprg 0(4) 1(9) 100.0% (51.0-100.0%) 88.9% (56.5-98.0%) 0.843
Ethionamide mykrobe 0(4) 1(9) 100.0% (51.0-100.0%) 88.9% (56.5-98.0%) 0.843
Ethionamide tbprofiler 0(4) 1(9) 100.0% (51.0-100.0%) 88.9% (56.5-98.0%) 0.843
Isoniazid drprg 9(51) 5(48) 82.4% (69.7-90.4%) 89.6% (77.8-95.5%) 0.72
Isoniazid mykrobe 9(51) 4(48) 82.4% (69.7-90.4%) 91.7% (80.4-96.7%) 0.742
Isoniazid tbprofiler 9(51) 3(48) 82.4% (69.7-90.4%) 93.8% (83.2-97.9%) 0.764
Kanamycin drprg 0(0) 1(52) - 98.1% (89.9-99.7%) -
Kanamycin mykrobe 0(0) 1(52) - 98.1% (89.9-99.7%) -
Kanamycin tbprofiler 0(0) 1(52) - 98.1% (89.9-99.7%) -
Moxifloxacin drprg 0(0) 1(1) - 0.0% (0.0-79.3%) -
Moxifloxacin mykrobe 0(0) 1(1) - 0.0% (0.0-79.3%) -
Moxifloxacin tbprofiler 0(0) 1(1) - 0.0% (0.0-79.3%) -
Ofloxacin drprg 0(10) 4(77) 100.0% (72.2-100.0%) 94.8% (87.4-98.0%) 0.823
Ofloxacin mykrobe 0(10) 4(77) 100.0% (72.2-100.0%) 94.8% (87.4-98.0%) 0.823
Ofloxacin tbprofiler 0(10) 3(77) 100.0% (72.2-100.0%) 96.1% (89.2-98.7%) 0.86
Pyrazinamide drprg 0(0) 0(1) - 100.0% (20.7-100.0%) -
Pyrazinamide mykrobe 0(0) 0(1) - 100.0% (20.7-100.0%) -
Pyrazinamide tbprofiler 0(0) 0(1) - 100.0% (20.7-100.0%) -
Rifampicin drprg 5(48) 1(44) 89.6% (77.8-95.5%) 97.7% (88.2-99.6%) 0.873
Rifampicin mykrobe 5(48) 1(44) 89.6% (77.8-95.5%) 97.7% (88.2-99.6%) 0.873
Rifampicin tbprofiler 5(48) 1(44) 89.6% (77.8-95.5%) 97.7% (88.2-99.6%) 0.873
Streptomycin drprg 2(8) 14(83) 75.0% (40.9-92.9%) 83.1% (73.7-89.7%) 0.398
Streptomycin mykrobe 2(8) 27(83) 75.0% (40.9-92.9%) 67.5% (56.8-76.6%) 0.25
Streptomycin tbprofiler 2(8) 12(83) 75.0% (40.9-92.9%) 85.5% (76.4-91.5%) 0.43

nanopore

Sample sizes are so small that it's hard to get a clear picture for a lot of drugs.
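To make the small-sample problem concrete, here is a minimal sketch of a 95% Wilson score interval for a proportion. That the tables use the Wilson method is an assumption, although it reproduces the reported intervals for the cases checked (for example 6/8 and 0/1). The function name `wilson_interval` is illustrative and not part of the drprg workflow.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if n == 0:
        return None  # no samples, so no interval (shown as "-" in the tables)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Streptomycin sensitivity on Nanopore: 6 of 8 resistant samples detected.
low, high = wilson_interval(6, 8)
print(f"{low:.1%} - {high:.1%}")
```

With only 8 resistant samples the interval spans roughly 41% to 93%, which is why tool-to-tool differences for many Nanopore drugs cannot be distinguished.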

mbhall88 commented 1 year ago

Here are the Illumina results on the full dataset (45,193 samples)

illumina

Drug Tool FN(R) FP(S) Sensitivity (95% CI) Specificity (95% CI) MCC
Amikacin drprg 270(1864) 225(18732) 85.5% (83.8-87.0%) 98.8% (98.6-98.9%) 0.852
Amikacin mykrobe 358(1864) 195(18732) 80.8% (78.9-82.5%) 99.0% (98.8-99.1%) 0.831
Amikacin tbprofiler 269(1864) 227(18732) 85.6% (83.9-87.1%) 98.8% (98.6-98.9%) 0.852
Capreomycin drprg 293(1298) 300(13034) 77.4% (75.1-79.6%) 97.7% (97.4-97.9%) 0.749
Capreomycin mykrobe 367(1298) 265(13034) 71.7% (69.2-74.1%) 98.0% (97.7-98.2%) 0.723
Capreomycin tbprofiler 292(1298) 305(13034) 77.5% (75.2-79.7%) 97.7% (97.4-97.9%) 0.748
Delamanid drprg 111(116) 4(8151) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.152
Delamanid mykrobe 111(116) 2(8151) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.173
Delamanid tbprofiler 111(116) 2(8151) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.173
Ethambutol drprg 484(5706) 2287(26863) 91.5% (90.8-92.2%) 91.5% (91.1-91.8%) 0.749
Ethambutol mykrobe 499(5706) 2265(26863) 91.3% (90.5-92.0%) 91.6% (91.2-91.9%) 0.749
Ethambutol tbprofiler 471(5706) 2290(26863) 91.7% (91.0-92.4%) 91.5% (91.1-91.8%) 0.751
Ethionamide drprg 672(2853) 992(11016) 76.4% (74.9-78.0%) 91.0% (90.4-91.5%) 0.649
Ethionamide mykrobe 772(2853) 960(11016) 72.9% (71.3-74.5%) 91.3% (90.7-91.8%) 0.627
Ethionamide tbprofiler 787(2853) 964(11016) 72.4% (70.7-74.0%) 91.2% (90.7-91.8%) 0.623
Isoniazid drprg 1016(14531) 593(25764) 93.0% (92.6-93.4%) 97.7% (97.5-97.9%) 0.913
Isoniazid mykrobe 1054(14531) 560(25764) 92.7% (92.3-93.2%) 97.8% (97.6-98.0%) 0.913
Isoniazid tbprofiler 987(14531) 648(25764) 93.2% (92.8-93.6%) 97.5% (97.3-97.7%) 0.912
Kanamycin drprg 359(2205) 316(17934) 83.7% (82.1-85.2%) 98.2% (98.0-98.4%) 0.827
Kanamycin mykrobe 437(2205) 300(17934) 80.2% (78.5-81.8%) 98.3% (98.1-98.5%) 0.808
Kanamycin tbprofiler 349(2205) 322(17934) 84.2% (82.6-85.6%) 98.2% (98.0-98.4%) 0.828
Levofloxacin drprg 272(3102) 355(14867) 91.2% (90.2-92.2%) 97.6% (97.4-97.8%) 0.879
Levofloxacin mykrobe 299(3102) 330(14867) 90.4% (89.3-91.4%) 97.8% (97.5-98.0%) 0.878
Levofloxacin tbprofiler 276(3102) 356(14867) 91.1% (90.0-92.1%) 97.6% (97.3-97.8%) 0.878
Linezolid drprg 104(152) 30(10911) 31.6% (24.7-39.3%) 99.7% (99.6-99.8%) 0.436
Linezolid mykrobe 105(152) 29(10911) 30.9% (24.1-38.7%) 99.7% (99.6-99.8%) 0.432
Linezolid tbprofiler 104(152) 31(10911) 31.6% (24.7-39.3%) 99.7% (99.6-99.8%) 0.433
Moxifloxacin drprg 178(2255) 1133(14696) 92.1% (90.9-93.1%) 92.3% (91.8-92.7%) 0.732
Moxifloxacin mykrobe 207(2255) 1113(14696) 90.8% (89.6-91.9%) 92.4% (92.0-92.8%) 0.726
Moxifloxacin tbprofiler 182(2255) 1141(14696) 91.9% (90.7-93.0%) 92.2% (91.8-92.7%) 0.729
Ofloxacin drprg 166(778) 68(6007) 78.7% (75.6-81.4%) 98.9% (98.6-99.1%) 0.823
Ofloxacin mykrobe 147(778) 62(6007) 81.1% (78.2-83.7%) 99.0% (98.7-99.2%) 0.842
Ofloxacin tbprofiler 138(778) 65(6007) 82.3% (79.4-84.8%) 98.9% (98.6-99.2%) 0.848
Pyrazinamide drprg 786(3682) 500(17748) 78.7% (77.3-79.9%) 97.2% (96.9-97.4%) 0.783
Pyrazinamide mykrobe 776(3682) 444(17748) 78.9% (77.6-80.2%) 97.5% (97.3-97.7%) 0.794
Pyrazinamide tbprofiler 715(3682) 502(17748) 80.6% (79.3-81.8%) 97.2% (96.9-97.4%) 0.796
Rifampicin drprg 576(11766) 593(28292) 95.1% (94.7-95.5%) 97.9% (97.7-98.1%) 0.93
Rifampicin mykrobe 523(11766) 604(28292) 95.6% (95.2-95.9%) 97.9% (97.7-98.0%) 0.932
Rifampicin tbprofiler 370(11766) 788(28292) 96.9% (96.5-97.2%) 97.2% (97.0-97.4%) 0.931
Streptomycin drprg 784(5362) 760(10179) 85.4% (84.4-86.3%) 92.5% (92.0-93.0%) 0.78
Streptomycin mykrobe 903(5362) 677(10179) 83.2% (82.1-84.1%) 93.3% (92.8-93.8%) 0.773
Streptomycin tbprofiler 778(5362) 662(10179) 85.5% (84.5-86.4%) 93.5% (93.0-94.0%) 0.794
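As a reading aid for the table columns, the sketch below recomputes sensitivity, specificity and MCC from the `FN(R)` and `FP(S)` cells. It assumes `FN(R)` means false negatives out of R phenotypically resistant samples and `FP(S)` means false positives out of S susceptible samples. That reading is an assumption, but it reproduces the reported values. The function name `metrics` is illustrative only.

```python
from math import sqrt


def metrics(fn, r, fp, s):
    """Return (sensitivity, specificity, MCC) from FN(R) and FP(S) counts.

    A metric is None when it is undefined, e.g. no resistant samples.
    """
    tp = r - fn  # resistant samples correctly called resistant
    tn = s - fp  # susceptible samples correctly called susceptible
    sensitivity = tp / r if r else None
    specificity = tn / s if s else None
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    mcc = (tp * tn - fp * fn) / sqrt(denom) if denom else None
    return sensitivity, specificity, mcc


# Rifampicin, drprg, full Illumina dataset: 576(11766) and 593(28292).
sens, spec, mcc = metrics(576, 11766, 593, 28292)
print(f"{sens:.1%} {spec:.1%} {round(mcc, 3)}")
```

For the rifampicin drprg row this gives 95.1%, 97.9% and 0.93, matching the table. Rows with no resistant samples (such as Nanopore kanamycin) return `None` for sensitivity and MCC, which corresponds to the `-` entries.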

I am currently working through the INH FNs; I have learned a lot and fixed some bugs along the way. The most important result to understand here, though, is the RIF sensitivity, which is significantly lower than tb-profiler's: 95.1% (94.7-95.5%) versus 96.9% (96.5-97.2%).

mbhall88 commented 1 year ago

I think I might have gotten to the bottom of the RIF sensitivity issue (also impacts a decent amount of INH FNs).

tl;dr we need a smaller minimum cluster size for (some) Illumina reads in pandora.

The minimum cluster size dictates whether we recognise a read as "hitting" a locus. The default is 10. But I was finding a lot of FNs where we just have these big random stretches of zero depth, generally in and around the RRDR. When I mapped these reads to H37Rv with minimap2, it showed that we should definitely have depth over the RRDR and its surrounding regions. It turns out most of these reads are unmapped in the pandora SAM file: they were getting only ~4-6 hits, so they were marked as unmapped because that is below the default of 10. I have also noticed that a lot of the samples with this issue are Illumina HiSeq 2000 75bp reads. This relates back to https://github.com/mbhall88/drprg/issues/12#issuecomment-1244890728.

I've run on a few samples with the minimum cluster size set to 4 and it seems to have resolved the issue for those samples. So I'm going to rerun all samples and reassess the results after that 🤞
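A toy illustration of the thresholding described above. This is not pandora's code: the function `mapped_reads`, the read names and the hit counts are all made up, and treating the threshold as inclusive (`>=`) is an assumption.

```python
def mapped_reads(hits_per_read, min_cluster_size):
    """Keep reads whose hit count reaches the minimum cluster size.

    hits_per_read: hypothetical mapping of read name -> hits to a locus.
    """
    return [
        read for read, hits in hits_per_read.items()
        if hits >= min_cluster_size  # assumed inclusive threshold
    ]


# Hypothetical short reads over the RRDR with only a handful of hits each.
hits = {"read_a": 12, "read_b": 5, "read_c": 4, "read_d": 2}

print(mapped_reads(hits, 10))  # default threshold: only read_a counts as mapped
print(mapped_reads(hits, 4))   # lowered threshold: read_b and read_c are kept too
```

Reads with 4-6 hits are dropped at a threshold of 10 but kept at 4, which is the mechanism that would leave stretches of zero depth even when minimap2 places the same reads confidently.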

iqbal-lab commented 1 year ago

Also relates to long reads that overlap a PRG only at the end.