mbhall88 closed this issue 3 years ago.
First draft of the figure for this analysis
Number of resistant (left) and susceptible (right) phenotypes correctly identified by mykrobe from Illumina (blue) and Nanopore (purple) data from the same samples. The red bars indicate missed (FN) or incorrect (FP) predictions. The x-axis shows the drugs with available phenotype data that mykrobe also makes predictions for. E - ethambutol; H - isoniazid; Z - pyrazinamide; R - rifampicin; S - streptomycin; Km - kanamycin; Am - amikacin; Ofx - ofloxacin; Cm - capreomycin; Mfx - moxifloxacin.
We could conceivably also use a table (or replace the figure with a table if that is more informative).
drug | technology | NPV | PPV | sensitivity | specificity | ME/FP | VME/FN | TP | TN |
---|---|---|---|---|---|---|---|---|---|
Amikacin | Illumina | 0.974 | 0.818 | 0.818 | 0.974 | 2 | 2 | 9 | 75 |
Amikacin | Nanopore | 1.000 | 0.846 | 1.000 | 0.974 | 2 | 0 | 11 | 75 |
Capreomycin | Illumina | 1.000 | 0.000 | - | 0.980 | 1 | 0 | 0 | 50 |
Capreomycin | Nanopore | 1.000 | 0.000 | - | 0.980 | 1 | 0 | 0 | 50 |
Ethambutol | Illumina | 0.938 | 0.385 | 0.714 | 0.789 | 16 | 4 | 10 | 60 |
Ethambutol | Nanopore | 0.939 | 0.417 | 0.714 | 0.816 | 14 | 4 | 10 | 62 |
Isoniazid | Illumina | 0.849 | 0.933 | 0.840 | 0.938 | 3 | 8 | 42 | 45 |
Isoniazid | Nanopore | 0.854 | 0.860 | 0.860 | 0.854 | 7 | 7 | 43 | 41 |
Kanamycin | Illumina | 1.000 | 0.000 | - | 0.980 | 1 | 0 | 0 | 50 |
Kanamycin | Nanopore | 1.000 | 0.000 | - | 0.980 | 1 | 0 | 0 | 50 |
Moxifloxacin | Illumina | - | 0.000 | - | 0.000 | 1 | 0 | 0 | 0 |
Moxifloxacin | Nanopore | - | 0.000 | - | 0.000 | 1 | 0 | 0 | 0 |
Ofloxacin | Illumina | 1.000 | 0.714 | 1.000 | 0.947 | 4 | 0 | 10 | 72 |
Ofloxacin | Nanopore | 1.000 | 0.714 | 1.000 | 0.947 | 4 | 0 | 10 | 72 |
Pyrazinamide | Illumina | 1.000 | - | - | 1.000 | 0 | 0 | 0 | 1 |
Pyrazinamide | Nanopore | 1.000 | - | - | 1.000 | 0 | 0 | 0 | 1 |
Rifampicin | Illumina | 0.878 | 0.976 | 0.872 | 0.977 | 1 | 6 | 41 | 43 |
Rifampicin | Nanopore | 0.878 | 0.976 | 0.872 | 0.977 | 1 | 6 | 41 | 43 |
Streptomycin | Illumina | 0.935 | 0.231 | 0.375 | 0.878 | 10 | 5 | 3 | 72 |
Streptomycin | Nanopore | 0.961 | 0.357 | 0.625 | 0.890 | 9 | 3 | 5 | 73 |
Cool! So essentially identical results, except that Nanopore has slightly better VME for amikacin, isoniazid and streptomycin, and slightly worse ME for isoniazid. I call that a win.
Wait a minute, how come pyrazinamide is in the table, I thought we didn't have any phenotypes for that? Did I forget/get that wrong???
We have 1 sample with PZA DST haha. Maybe I just leave it out then?
Here is another plot that is very insightful
Effect of Nanopore read depth on mykrobe phenotype prediction. Each point indicates the proportion (y-axis) of classifications of that type at the read depth (x-axis). Read depth is binned: for example, the bin at read depth 40 contains all samples with a read depth greater than 40 and less than or equal to 50. FP - false positive; TN - true negative; etc.
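A minimal sketch of the binning described in the caption, assuming a bin width of 10 (inferred from the 40 to 50 example) and bins that are open on the left and closed on the right. The `(read_depth, classification)` record format and the function names are hypothetical; this is not the pipeline's plotting code.

```python
import math
from collections import Counter, defaultdict


def depth_bin(depth: float, width: int = 10) -> int:
    """Label of the bin (label, label + width] that contains depth (depth > 0 assumed)."""
    # ceil(depth / width) * width is the bin's upper edge; the label is its lower edge
    return math.ceil(depth / width) * width - width


def proportions_by_bin(records, width: int = 10) -> dict:
    """Proportion of each classification (TP, FN, ...) within each depth bin.

    records: iterable of (read_depth, classification) pairs, e.g. (43.2, "TP").
    """
    counts = defaultdict(Counter)
    for depth, classification in records:
        counts[depth_bin(depth, width)][classification] += 1
    return {
        label: {cls: n / sum(c.values()) for cls, n in c.items()}
        for label, c in sorted(counts.items())
    }


print(depth_bin(41), depth_bin(50), depth_bin(40))  # → 40 40 30
```

The per-bin sample size that would be overlaid on such a plot is simply `sum(c.values())` for each bin's counter.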
For this table (https://github.com/mbhall88/head_to_head_pipeline/issues/76#issuecomment-875270092), we will need to have confidence intervals; see, e.g., https://www.nature.com/articles/ncomms10063/tables/1
I don't really understand where the confidence intervals come from. The values aren't the result of any kind of aggregation/averaging...
The confidence intervals tell you how much you can trust the rate (FPR, VME, whatever) given the number of samples. You can be more confident in a TPR of 90% if you find 9000 out of 10000 resistant samples than if you find 9 out of 10. This stuff (confidence intervals) always does my head in a bit though, so don't worry when you look up the definitions and they confuse the hell out of you.
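To make the 9 out of 10 versus 9000 out of 10000 example concrete, here is a small sketch of the Wilson score interval, written from the textbook formula rather than taken from the pipeline. `wilson_interval` is a hypothetical name, and z = 1.96 is assumed for a 95% interval. With it, 9/10 gives roughly 60% to 98%, while 9000/10000 gives roughly 89% to 91%: same rate, very different certainty.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # clamp to [0, 1]; this also removes the tiny negative lower bound
    # (printed as -0.0%) that floating-point error can produce when successes == 0
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


for k, n in [(9, 10), (9000, 10000)]:
    lo, hi = wilson_interval(k, n)
    print(f"{k}/{n}: TPR {k / n:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

The same function gives 5.1% to 47.7% for 2 out of 11, which matches the amikacin Illumina FNR interval in the final table.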
I see. OK, I've added them using the Wilson score interval, which is the same as was used in the recent mykrobe paper. However, I notice the Nature Comms paper used the Clopper–Pearson confidence interval, although I don't think the two are that different.
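One way to check how much the choice between Wilson and Clopper–Pearson matters for these counts is to compute both. The sketch below is a stdlib-only illustration under stated assumptions: Clopper–Pearson is obtained by bisection on the binomial CDF (libraries usually use beta-distribution quantiles instead), the naive CDF sum is only meant for small n like the ones in these tables, and all function names are hypothetical.

```python
import math


def wilson(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval (z = 1.96 for ~95%)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); naive sum, overflows for n in the thousands."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(g, target: float, increasing: bool, iters: int = 100) -> float:
    """Bisection on [0, 1] for g(p) == target, where g is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (g(mid) < target) == increasing:
            lo = mid  # root is at or above mid
        else:
            hi = mid  # root is at or below mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) interval by inverting the binomial CDF."""
    # lower bound solves P(X >= k | p) = alpha / 2, which increases with p
    lower = 0.0 if k == 0 else _solve(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    # upper bound solves P(X <= k | p) = alpha / 2, which decreases with p
    upper = 1.0 if k == n else _solve(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper


# counts taken from the final table (e.g. 2 FN of 11 R, 16 FP of 76 S)
for k, n in [(0, 1), (2, 11), (8, 50), (16, 76)]:
    w, cp = wilson(k, n), clopper_pearson(k, n)
    print(f"{k}/{n}: Wilson {w[0]:.1%}-{w[1]:.1%}  "
          f"Clopper-Pearson {cp[0]:.1%}-{cp[1]:.1%}")
```

For k = 0 the Clopper–Pearson upper bound has the closed form 1 - (α/2)^(1/n), so for 0/1 (the capreomycin and kanamycin PPV rows) it is 97.5% against 79.3% for Wilson; the two differ most at the smallest denominators.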
I've also overlaid the sample size on the coverage plot.
Channeling my inner Michael patrolling the appropriate Slack channels... This issue is about concordance with phenotype, whereas this comment
https://github.com/mbhall88/head_to_head_pipeline/issues/76#issuecomment-877591309
is about concordance with Illumina, which is a different issue.
Ah, right you are. Thank you!
Something interesting to know about mykrobe: when using a diploid model for ONT, it goes crazy and calls everything resistant to isoniazid. Looking at a few samples, it seems to call a whole bunch of indels as HET.
I am going to start writing the results section with the following final (pending any major issues) plot and table:
Drug | Technology | FN (of R) | FP (of S) | FNR (95% CI) | FPR (95% CI) | PPV (95% CI) | NPV (95% CI) |
---|---|---|---|---|---|---|---|
Amikacin | Illumina | 2(11) | 2(77) | 18.2% (5.1-47.7%) | 2.6% (0.7-9.0%) | 81.8% (52.3-94.9%) | 97.4% (91.0-99.3%) |
Amikacin | Nanopore | 0(11) | 2(77) | 0.0% (0.0-25.9%) | 2.6% (0.7-9.0%) | 84.6% (57.8-95.7%) | 100.0% (95.1-100.0%) |
Capreomycin | Illumina | 0(0) | 1(51) | - | 2.0% (0.3-10.3%) | 0.0% (0.0-79.3%) | 100.0% (92.9-100.0%) |
Capreomycin | Nanopore | 0(0) | 1(51) | - | 2.0% (0.3-10.3%) | 0.0% (0.0-79.3%) | 100.0% (92.9-100.0%) |
Ethambutol | Illumina | 3(14) | 16(76) | 21.4% (7.6-47.6%) | 21.1% (13.4-31.5%) | 40.7% (24.5-59.3%) | 95.2% (86.9-98.4%) |
Ethambutol | Nanopore | 4(14) | 14(76) | 28.6% (11.7-54.6%) | 18.4% (11.3-28.6%) | 41.7% (24.5-61.2%) | 93.9% (85.4-97.6%) |
Isoniazid | Illumina | 8(50) | 3(48) | 16.0% (8.3-28.5%) | 6.2% (2.1-16.8%) | 93.3% (82.1-97.7%) | 84.9% (72.9-92.1%) |
Isoniazid | Nanopore | 7(50) | 7(48) | 14.0% (7.0-26.2%) | 14.6% (7.2-27.2%) | 86.0% (73.8-93.0%) | 85.4% (72.8-92.8%) |
Kanamycin | Illumina | 0(0) | 1(51) | - | 2.0% (0.3-10.3%) | 0.0% (0.0-79.3%) | 100.0% (92.9-100.0%) |
Kanamycin | Nanopore | 0(0) | 1(51) | - | 2.0% (0.3-10.3%) | 0.0% (0.0-79.3%) | 100.0% (92.9-100.0%) |
Ofloxacin | Illumina | 0(10) | 4(76) | 0.0% (0.0-27.8%) | 5.3% (2.1-12.8%) | 71.4% (45.4-88.3%) | 100.0% (94.9-100.0%) |
Ofloxacin | Nanopore | 0(10) | 4(76) | 0.0% (0.0-27.8%) | 5.3% (2.1-12.8%) | 71.4% (45.4-88.3%) | 100.0% (94.9-100.0%) |
Rifampicin | Illumina | 5(47) | 1(44) | 10.6% (4.6-22.6%) | 2.3% (0.4-11.8%) | 97.7% (87.9-99.6%) | 89.6% (77.8-95.5%) |
Rifampicin | Nanopore | 6(47) | 1(44) | 12.8% (6.0-25.2%) | 2.3% (0.4-11.8%) | 97.6% (87.7-99.6%) | 87.8% (75.8-94.3%) |
Streptomycin | Illumina | 5(8) | 10(82) | 62.5% (30.6-86.3%) | 12.2% (6.8-21.0%) | 23.1% (8.2-50.3%) | 93.5% (85.7-97.2%) |
Streptomycin | Nanopore | 3(8) | 9(82) | 37.5% (13.7-69.4%) | 11.0% (5.9-19.6%) | 35.7% (16.3-61.2%) | 96.1% (89.0-98.6%) |
We do not have phenotype data for every sample/drug combination, so this analysis only covers those we do have.
Similar to #75, we will have a table showing Very Major Error (missed resistance), Major Error (susceptible sample called resistant), PPV (what % of R calls are truly R) and NPV (what % of S calls are truly S) for each tool.
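A minimal sketch of how these four metrics follow from per-drug counts, treating resistant as the positive class, so that the VME rate is the FNR and the ME rate is the FPR, as in the tables in this thread. The function names are hypothetical, and returning `None` for an empty denominator mirrors the `-` entries in the tables.

```python
from typing import Optional


def ratio(num: int, den: int) -> Optional[float]:
    """num / den, or None when den == 0 (rendered as '-' in the tables)."""
    return num / den if den else None


def ast_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Per-drug error rates and predictive values; 'resistant' is the positive class."""
    return {
        "fnr": ratio(fn, tp + fn),  # VME rate: resistant samples called susceptible
        "fpr": ratio(fp, fp + tn),  # ME rate: susceptible samples called resistant
        "ppv": ratio(tp, tp + fp),  # share of R calls that are truly R
        "npv": ratio(tn, tn + fn),  # share of S calls that are truly S
    }


# amikacin / Illumina counts from the first table: TP 9, FP 2, TN 75, FN 2
m = ast_metrics(tp=9, fp=2, tn=75, fn=2)
print({k: f"{v:.1%}" for k, v in m.items()})
```

This prints 18.2%, 2.6%, 81.8% and 97.4%, matching the amikacin Illumina row of the final table.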