mbhall88 / head_to_head_pipeline

Snakemake pipelines to run the analysis for the Illumina vs. Nanopore comparison.
GNU General Public License v3.0

Concordance of resistance predictions with phenotype #76

Closed: mbhall88 closed this issue 3 years ago

mbhall88 commented 3 years ago

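We do not have phenotype data for every sample/drug combination, so this analysis only covers the combinations for which we do.
Similar to #75, we will have a table showing, for each tool, the Very Major Error (VME; missed resistance, i.e. a phenotypically resistant sample called susceptible), Major Error (ME; a phenotypically susceptible sample called resistant), PPV (what % of R calls are phenotypically R) and NPV (what % of S calls are phenotypically S).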
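A minimal sketch of how these four rates can be computed from the confusion-matrix counts for one drug/technology pair. The function names and the choice to return `None` when a denominator is zero (shown as "-" in the tables in this thread) are assumptions for illustration. VME and ME are expressed here as rates over phenotypically resistant and phenotypically susceptible samples respectively, which matches the FNR and FPR columns used later in the thread.

```python
from typing import Optional


def _ratio(num: int, den: int) -> Optional[float]:
    """Return num / den, or None when there are no samples in the denominator."""
    return num / den if den else None


def concordance_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compare resistance predictions against phenotype for one drug and technology.

    tp: phenotype R, predicted R      fn: phenotype R, predicted S (VME)
    fp: phenotype S, predicted R (ME) tn: phenotype S, predicted S
    """
    return {
        "VME": _ratio(fn, tp + fn),  # missed resistance among phenotypically R samples
        "ME": _ratio(fp, fp + tn),   # false resistance among phenotypically S samples
        "PPV": _ratio(tp, tp + fp),  # fraction of R calls that are phenotypically R
        "NPV": _ratio(tn, tn + fn),  # fraction of S calls that are phenotypically S
    }


# Amikacin/Illumina counts from the table in this thread: TP=9, FN=2, FP=2, TN=75
print(concordance_metrics(tp=9, fn=2, fp=2, tn=75))
```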

mbhall88 commented 3 years ago

First draft of the figure for this analysis


Figure 2

Number of resistant (left) and susceptible (right) phenotypes correctly identified by mykrobe from Illumina (blue) and Nanopore (purple) data from the same samples. The red bars indicate missed (FN) or incorrect (FP) predictions. The x-axis shows the drugs with available phenotype data that mykrobe also makes predictions for. E - ethambutol; H - isoniazid; Z - pyrazinamide; R - rifampicin; S - streptomycin; Km - kanamycin; Am - amikacin; Ofx - ofloxacin; Cm - capreomycin; Mfx - moxifloxacin.


mbhall88 commented 3 years ago

We could conceivably also use a table (or replace the figure with a table if that is more informative)

| Drug | Technology | NPV | PPV | Sensitivity | Specificity | ME/FP | VME/FN | TP | TN |
|---|---|---|---|---|---|---|---|---|---|
| Amikacin | Illumina | 0.974 | 0.818 | 0.818 | 0.974 | 2 | 2 | 9 | 75 |
| Amikacin | Nanopore | 1.000 | 0.846 | 1.000 | 0.974 | 2 | 0 | 11 | 75 |
| Capreomycin | Illumina | 1.000 | 0.000 | - | 0.980 | 1 | 0 | 0 | 50 |
| Capreomycin | Nanopore | 1.000 | 0.000 | - | 0.980 | 1 | 0 | 0 | 50 |
| Ethambutol | Illumina | 0.938 | 0.385 | 0.714 | 0.789 | 16 | 4 | 10 | 60 |
| Ethambutol | Nanopore | 0.939 | 0.417 | 0.714 | 0.816 | 14 | 4 | 10 | 62 |
| Isoniazid | Illumina | 0.849 | 0.933 | 0.840 | 0.938 | 3 | 8 | 42 | 45 |
| Isoniazid | Nanopore | 0.854 | 0.860 | 0.860 | 0.854 | 7 | 7 | 43 | 41 |
| Kanamycin | Illumina | 1.000 | 0.000 | - | 0.980 | 1 | 0 | 0 | 50 |
| Kanamycin | Nanopore | 1.000 | 0.000 | - | 0.980 | 1 | 0 | 0 | 50 |
| Moxifloxacin | Illumina | - | 0.000 | - | 0.000 | 1 | 0 | 0 | 0 |
| Moxifloxacin | Nanopore | - | 0.000 | - | 0.000 | 1 | 0 | 0 | 0 |
| Ofloxacin | Illumina | 1.000 | 0.714 | 1.000 | 0.947 | 4 | 0 | 10 | 72 |
| Ofloxacin | Nanopore | 1.000 | 0.714 | 1.000 | 0.947 | 4 | 0 | 10 | 72 |
| Pyrazinamide | Illumina | 1.000 | - | - | 1.000 | 0 | 0 | 0 | 1 |
| Pyrazinamide | Nanopore | 1.000 | - | - | 1.000 | 0 | 0 | 0 | 1 |
| Rifampicin | Illumina | 0.878 | 0.976 | 0.872 | 0.977 | 1 | 6 | 41 | 43 |
| Rifampicin | Nanopore | 0.878 | 0.976 | 0.872 | 0.977 | 1 | 6 | 41 | 43 |
| Streptomycin | Illumina | 0.935 | 0.231 | 0.375 | 0.878 | 10 | 5 | 3 | 72 |
| Streptomycin | Nanopore | 0.961 | 0.357 | 0.625 | 0.890 | 9 | 3 | 5 | 73 |
iqbal-lab commented 3 years ago

Cool! So essentially identical results, except that Nanopore has a slightly better VME for amikacin, isoniazid and streptomycin, and a slightly worse ME for isoniazid. I call that a win.

iqbal-lab commented 3 years ago

Wait a minute, how come pyrazinamide is in the table? I thought we didn't have any phenotypes for that. Did I forget/get that wrong???

mbhall88 commented 3 years ago


We have 1 sample with PZA (pyrazinamide) DST (drug susceptibility testing) haha. Maybe I should just leave it out then?

mbhall88 commented 3 years ago

Here is another plot that is very insightful

Effect of Nanopore read depth on mykrobe phenotype prediction. Each point indicates the proportion (y-axis) of classifications of that type within a read depth bin (x-axis). Read depth is binned: for example, the bin labelled 40 contains all samples with a read depth greater than 40 and less than or equal to 50. FP - false positive; TN - true negative; etc.

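The binning rule in the caption above can be sketched as follows. This assumes bins 10 wide (as the "40 means (40, 50]" example implies), that each proportion is taken within its bin, and that the input is a list of (read depth, classification) pairs; the function names and the records are hypothetical, not taken from the pipeline.

```python
import math
from collections import Counter, defaultdict


def depth_bin(depth: float, width: int = 10) -> int:
    """Label a (positive) read depth by the lower edge of its (lower, lower + width] bin."""
    # ceil(depth / width) - 1 is the index of the right-closed bin containing depth,
    # so 40.5 and 50 both map to 40, while exactly 40 maps to 30.
    return (math.ceil(depth / width) - 1) * width


def proportions_by_bin(records, width: int = 10) -> dict:
    """Proportion of each classification (TP, FN, FP, TN) within each depth bin.

    records: iterable of (read_depth, classification) pairs.
    """
    counts = defaultdict(Counter)
    for depth, label in records:
        counts[depth_bin(depth, width)][label] += 1
    return {
        bin_start: {label: n / sum(c.values()) for label, n in c.items()}
        for bin_start, c in sorted(counts.items())
    }


# Hypothetical (read depth, classification) pairs, not real pipeline output
records = [(42.0, "TP"), (50.0, "FN"), (47.5, "TN"), (12.3, "TN")]
print(proportions_by_bin(records))
```

Using right-closed bins keeps a sample at exactly 50x in the 40 bin, as the caption describes.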

iqbal-lab commented 3 years ago

For this table (https://github.com/mbhall88/head_to_head_pipeline/issues/76#issuecomment-875270092), we will need confidence intervals; e.g., see https://www.nature.com/articles/ncomms10063/tables/1

mbhall88 commented 3 years ago

I don't really understand where the confidence intervals would come from. The values aren't the result of any kind of aggregation/averaging...

iqbal-lab commented 3 years ago

The confidence intervals tell you how much you can trust the rate (FPR, VME, whatever), given the number of samples. A TPR of 90% is much more trustworthy if you find 9,000 out of 10,000 resistant samples than if you find 9 out of 10. This stuff (confidence intervals) always does my head in a bit though, so don't worry when you look up the definitions and they confuse the hell out of you.
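To make the 9/10 versus 9,000/10,000 point concrete, here is a minimal sketch using the Wilson score interval (the method adopted later in this thread), assuming z = 1.96 for an approximately 95% interval. The function name is illustrative.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives ~95% coverage."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# The same 90% TPR, estimated from very different numbers of resistant samples
for found, total in [(9, 10), (9000, 10000)]:
    low, high = wilson_interval(found, total)
    print(f"{found}/{total}: TPR {found / total:.0%}, 95% CI {low:.1%}-{high:.1%}")
```

With 9 of 10, the interval spans roughly 60% to 98%. With 9,000 of 10,000, it is only a little over one percentage point wide.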

mbhall88 commented 3 years ago

I see. OK, I've added them using the Wilson score interval, which is the same method used in the recent mykrobe paper. However, I notice the Nature Comms paper used the Clopper–Pearson interval, although I don't think the two are that different.
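For comparison, here is a standard-library-only sketch of the Clopper-Pearson ("exact") interval. It finds the bounds by bisecting on binomial tail probabilities rather than calling a beta-quantile function. The function names are illustrative and are not taken from the pipeline.

```python
import math
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_increasing(f: Callable[[float], float], iters: int = 100) -> float:
    """Bisection for the root of a function that increases on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) interval for k successes out of n trials."""
    # Lower bound: the p at which P(X >= k) = alpha / 2 (P(X >= k) increases with p)
    lower = 0.0 if k == 0 else _root_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) = alpha / 2 (P(X <= k) decreases with p)
    upper = 1.0 if k == n else _root_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p)
    )
    return lower, upper


# Amikacin/Illumina missed 2 of 11 resistant samples (Wilson: 5.1-47.7%)
low, high = clopper_pearson(2, 11)
print(f"Clopper-Pearson 95% CI: {low:.1%}-{high:.1%}")
```

For 2 of 11, this gives roughly 2.3-51.8%. That is wider than the Wilson interval, which is expected because Clopper-Pearson is conservative. The gap between the two methods is largest at small sample sizes like these. If SciPy is available, `scipy.stats.beta.ppf` yields the same bounds directly.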

mbhall88 commented 3 years ago

I've also overlaid the sample size on the coverage plot


iqbal-lab commented 3 years ago

Channeling my inner Michael patrolling appropriate Slack channels... This issue is about concordance with phenotype, whereas this comment

https://github.com/mbhall88/head_to_head_pipeline/issues/76#issuecomment-877591309

is about concordance with Illumina, which is a different issue.

mbhall88 commented 3 years ago

Ah, right you are. Thank you!

mbhall88 commented 3 years ago

Something interesting to know about mykrobe: when using a diploid model for ONT data, it goes crazy and calls everything resistant to isoniazid. Looking at a few samples, it seems to call a whole bunch of indels as heterozygous (HET).


mbhall88 commented 3 years ago

I am going to start writing the results section with the following plot and table, which are final barring any major issues:


| Drug | Technology | FN (total R) | FP (total S) | FNR (95% CI) | FPR (95% CI) | PPV (95% CI) | NPV (95% CI) |
|---|---|---|---|---|---|---|---|
| Amikacin | Illumina | 2 (11) | 2 (77) | 18.2% (5.1-47.7%) | 2.6% (0.7-9.0%) | 81.8% (52.3-94.9%) | 97.4% (91.0-99.3%) |
| Amikacin | Nanopore | 0 (11) | 2 (77) | 0.0% (0.0-25.9%) | 2.6% (0.7-9.0%) | 84.6% (57.8-95.7%) | 100.0% (95.1-100.0%) |
| Capreomycin | Illumina | 0 (0) | 1 (51) | - | 2.0% (0.3-10.3%) | 0.0% (0.0-79.3%) | 100.0% (92.9-100.0%) |
| Capreomycin | Nanopore | 0 (0) | 1 (51) | - | 2.0% (0.3-10.3%) | 0.0% (0.0-79.3%) | 100.0% (92.9-100.0%) |
| Ethambutol | Illumina | 3 (14) | 16 (76) | 21.4% (7.6-47.6%) | 21.1% (13.4-31.5%) | 40.7% (24.5-59.3%) | 95.2% (86.9-98.4%) |
| Ethambutol | Nanopore | 4 (14) | 14 (76) | 28.6% (11.7-54.6%) | 18.4% (11.3-28.6%) | 41.7% (24.5-61.2%) | 93.9% (85.4-97.6%) |
| Isoniazid | Illumina | 8 (50) | 3 (48) | 16.0% (8.3-28.5%) | 6.2% (2.1-16.8%) | 93.3% (82.1-97.7%) | 84.9% (72.9-92.1%) |
| Isoniazid | Nanopore | 7 (50) | 7 (48) | 14.0% (7.0-26.2%) | 14.6% (7.2-27.2%) | 86.0% (73.8-93.0%) | 85.4% (72.8-92.8%) |
| Kanamycin | Illumina | 0 (0) | 1 (51) | - | 2.0% (0.3-10.3%) | 0.0% (0.0-79.3%) | 100.0% (92.9-100.0%) |
| Kanamycin | Nanopore | 0 (0) | 1 (51) | - | 2.0% (0.3-10.3%) | 0.0% (0.0-79.3%) | 100.0% (92.9-100.0%) |
| Ofloxacin | Illumina | 0 (10) | 4 (76) | 0.0% (0.0-27.8%) | 5.3% (2.1-12.8%) | 71.4% (45.4-88.3%) | 100.0% (94.9-100.0%) |
| Ofloxacin | Nanopore | 0 (10) | 4 (76) | 0.0% (0.0-27.8%) | 5.3% (2.1-12.8%) | 71.4% (45.4-88.3%) | 100.0% (94.9-100.0%) |
| Rifampicin | Illumina | 5 (47) | 1 (44) | 10.6% (4.6-22.6%) | 2.3% (0.4-11.8%) | 97.7% (87.9-99.6%) | 89.6% (77.8-95.5%) |
| Rifampicin | Nanopore | 6 (47) | 1 (44) | 12.8% (6.0-25.2%) | 2.3% (0.4-11.8%) | 97.6% (87.7-99.6%) | 87.8% (75.8-94.3%) |
| Streptomycin | Illumina | 5 (8) | 10 (82) | 62.5% (30.6-86.3%) | 12.2% (6.8-21.0%) | 23.1% (8.2-50.3%) | 93.5% (85.7-97.2%) |
| Streptomycin | Nanopore | 3 (8) | 9 (82) | 37.5% (13.7-69.4%) | 11.0% (5.9-19.6%) | 35.7% (16.3-61.2%) | 96.1% (89.0-98.6%) |
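Each rate cell in the table above follows a "rate% (lower-upper%)" pattern. Below is a minimal sketch of a formatter that reproduces it. It assumes the Wilson score interval with z = 1.96; the thread confirms that Wilson was used, but the exact z and the function names here are assumptions. Clamping the bounds to [0, 1] guards against a "-0.0" lower bound, which floating-point round-off can produce when k = 0.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.96):
    """Wilson score interval for k successes out of n trials."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp so round-off cannot push the lower bound below zero ("-0.0%")
    return max(0.0, centre - half), min(1.0, centre + half)


def format_rate(k: int, n: int) -> str:
    """Format k/n as 'rate% (lower-upper%)', or '-' when there are no samples."""
    if n == 0:
        return "-"
    lo, hi = wilson_interval(k, n)
    return f"{100 * k / n:.1f}% ({100 * lo:.1f}-{100 * hi:.1f}%)"


print(format_rate(2, 11))  # Amikacin/Illumina FNR
print(format_rate(0, 10))  # Ofloxacin FNR
print(format_rate(0, 0))   # Capreomycin FNR (no resistant samples)
```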
