Concordance of resistance predictions with phenotype

mbhall88 commented 3 years ago

We do not have phenotype data for every sample/drug, so this analysis covers only those we do have.
Similar to #75, we will have a table showing Very Major Error (VME; resistance missed, i.e. a false negative), Major Error (ME; susceptible called resistant, i.e. a false positive), PPV (what % of R calls are truly R) and NPV (what % of S calls are truly S) for each tool.
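
As a rough illustration of how these four quantities fall out of the confusion-matrix counts, here is a minimal Python sketch. The function names are made up for this example and are not part of the pipeline; the counts are the Amikacin/Illumina row from the draft table in this thread. Rates with a zero denominator are returned as `None`, matching the `-` entries in that table.

```python
from typing import Dict, Optional


def safe_div(num: int, den: int) -> Optional[float]:
    """Return num/den, or None when the denominator is zero (undefined rate)."""
    return num / den if den else None


def concordance(tp: int, fn: int, fp: int, tn: int) -> Dict[str, Optional[float]]:
    """Summarise genotype-vs-phenotype agreement for one drug and one tool.

    tp/fn count phenotypically resistant samples called R/S;
    fp/tn count phenotypically susceptible samples called R/S.
    """
    return {
        "FNR": safe_div(fn, tp + fn),  # very major error rate: resistance missed
        "FPR": safe_div(fp, fp + tn),  # major error rate: susceptible called R
        "PPV": safe_div(tp, tp + fp),  # fraction of R calls that are truly R
        "NPV": safe_div(tn, tn + fn),  # fraction of S calls that are truly S
    }


# Amikacin / Illumina counts from the draft table: TP=9, FN=2, FP=2, TN=75
amikacin_illumina = concordance(tp=9, fn=2, fp=2, tn=75)
for name, value in amikacin_illumina.items():
    print(name, "-" if value is None else f"{value:.3f}")
```

Returning `None` rather than 0 for an undefined rate keeps "no resistant samples available" distinct from "no resistance missed".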

mbhall88 commented 3 years ago

First draft of the figure for this analysis

Figure 2

Number of resistant (left) and susceptible (right) phenotypes correctly identified by mykrobe from Illumina (blue) and Nanopore (purple) data from the same samples. The red bars indicate missed (FN) or incorrect (FP) predictions. The x-axis shows the drugs with available phenotype data that mykrobe also makes predictions for. E - ethambutol; H - isoniazid; Z - pyrazinamide; R - rifampicin; S - streptomycin; Km - kanamycin; Am - amikacin; Ofx - ofloxacin; Cm - capreomycin; Mfx - moxifloxacin.

mbhall88 commented 3 years ago

We could conceivably also use a table (or replace the figure with a table if that is more informative)

drug	technology	NPV	PPV	sensitivity	specificity	ME/FP	VME/FN	TP	TN
Amikacin	Illumina	0.974025974025974	0.8181818181818182	0.8181818181818182	0.974026	2	2	9	75
Amikacin	Nanopore	1.0	0.8461538461538461	1.0	0.974026	2	0	11	75
Capreomycin	Illumina	1.0	0.0	-	0.980392	1	0	0	50
Capreomycin	Nanopore	1.0	0.0	-	0.980392	1	0	0	50
Ethambutol	Illumina	0.9375	0.38461538461538464	0.7142857142857143	0.789474	16	4	10	60
Ethambutol	Nanopore	0.9393939393939394	0.4166666666666667	0.7142857142857143	0.815789	14	4	10	62
Isoniazid	Illumina	0.8490566037735849	0.9333333333333333	0.84	0.9375	3	8	42	45
Isoniazid	Nanopore	0.8541666666666666	0.86	0.86	0.854167	7	7	43	41
Kanamycin	Illumina	1.0	0.0	-	0.980392	1	0	0	50
Kanamycin	Nanopore	1.0	0.0	-	0.980392	1	0	0	50
Moxifloxacin	Illumina	-	0.0	-	0	1	0	0	0
Moxifloxacin	Nanopore	-	0.0	-	0	1	0	0	0
Ofloxacin	Illumina	1.0	0.7142857142857143	1.0	0.947368	4	0	10	72
Ofloxacin	Nanopore	1.0	0.7142857142857143	1.0	0.947368	4	0	10	72
Pyrazinamide	Illumina	1.0	-	-	1	0	0	0	1
Pyrazinamide	Nanopore	1.0	-	-	1	0	0	0	1
Rifampicin	Illumina	0.8775510204081632	0.9761904761904762	0.8723404255319149	0.977273	1	6	41	43
Rifampicin	Nanopore	0.8775510204081632	0.9761904761904762	0.8723404255319149	0.977273	1	6	41	43
Streptomycin	Illumina	0.935064935064935	0.23076923076923078	0.375	0.878049	10	5	3	72
Streptomycin	Nanopore	0.9605263157894737	0.35714285714285715	0.625	0.890244	9	3	5	73

iqbal-lab commented 3 years ago

Cool! So essentially identical results except slightly better VME for amikacin, isoniazid and streptomycin, and slightly worse ME for isoniazid. I call that a win.

iqbal-lab commented 3 years ago

Wait a minute, how come pyrazinamide is in the table, I thought we didn't have any phenotypes for that? Did I forget/get that wrong???

mbhall88 commented 3 years ago

> Wait a minute, how come pyrazinamide is in the table, I thought we didn't have any phenotypes for that? Did I forget/get that wrong???

We have 1 sample with PZA DST haha. Maybe I just leave it out then?

mbhall88 commented 3 years ago

Here is another plot that is very insightful

Effect of Nanopore read depth on mykrobe phenotype prediction. Each point indicates the proportion (y-axis) of classifications of that type at the read depth (x-axis). Read depth is "binned". That is, read depth 40 is all samples with a read depth greater than 40 and less than or equal to 50. FP - false positive; TN - true negative; etc.
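
The binning described in the caption can be sketched roughly as follows. This is not the plotting code behind the figure: the bin width of 10 is inferred from the caption's example (40 < depth <= 50), the proportions are assumed to be taken within each bin, and the `records` below are invented purely to exercise the function.

```python
import math
from collections import Counter, defaultdict

BIN_WIDTH = 10  # assumed from the caption's example: bin "40" covers (40, 50]


def depth_bin(depth: float, width: int = BIN_WIDTH) -> int:
    """Label of the left-open, right-closed bin (label, label + width] holding depth."""
    return (math.ceil(depth / width) - 1) * width


def proportions_by_bin(records):
    """Map each depth bin to the proportion of each classification within it.

    records is an iterable of (read_depth, classification) pairs, where
    classification is one of "TP", "TN", "FP", "FN".
    """
    counts = defaultdict(Counter)
    for depth, label in records:
        counts[depth_bin(depth)][label] += 1
    return {
        bin_label: {k: v / sum(c.values()) for k, v in c.items()}
        for bin_label, c in sorted(counts.items())
    }


# Hypothetical (depth, classification) pairs, for illustration only
records = [(41.0, "TP"), (50.0, "TN"), (45.2, "FP"), (50.1, "TN"), (60.0, "TN")]
result = proportions_by_bin(records)
print(result)
```

Note that a depth of exactly 40 lands in bin 30 under this convention, because the lower edge of each bin is exclusive.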

iqbal-lab commented 3 years ago

For this table https://github.com/mbhall88/head_to_head_pipeline/issues/76#issuecomment-875270092 we will need to have confidence intervals, eg see https://www.nature.com/articles/ncomms10063/tables/1

mbhall88 commented 3 years ago

I don't really understand where the confidence intervals come from? The values aren't the result of any kind of aggregation/averaging...

iqbal-lab commented 3 years ago

The confidence intervals tell you how much you can trust the rate (FPR, VME, whatever) given the number of samples. A TPR of 90% is more trustworthy if you find 9000 out of 10000 resistant samples than if you find 9 out of 10. This stuff (confidence intervals) always does my head in a bit though, so don't worry when you look up the definitions and they confuse the hell out of you.

mbhall88 commented 3 years ago

I see. Ok, I've added that in using the Wilson score interval, which is the same method used in the recent mykrobe paper. However, I notice the Nature Comms paper used the Clopper–Pearson confidence interval, although I don't think the two are that different.
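
For reference, a minimal standalone sketch of the Wilson score interval for a binomial proportion, written from the textbook formula rather than taken from the pipeline (which may well use a library routine instead). The function name and the 95% z-value constant are choices made for this example. It shows why the same 90% rate is far better pinned down by 9000/10000 than by 9/10.

```python
import math
from typing import Optional, Tuple

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def wilson_interval(successes: int, n: int, z: float = Z_95) -> Optional[Tuple[float, float]]:
    """Wilson score interval for a proportion; None if there are no trials."""
    if n == 0:
        return None
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp so floating-point round-off cannot produce a bound outside [0, 1]
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


few = wilson_interval(9, 10)         # 90% from only 10 samples
many = wilson_interval(9000, 10000)  # 90% from 10000 samples
amk = wilson_interval(2, 11)         # 2 missed out of 11 resistant samples

for label, (lo, hi) in [("9/10", few), ("9000/10000", many), ("2/11", amk)]:
    print(f"{label}: {lo:.1%} to {hi:.1%}")
```

The interval for 9/10 spans tens of percentage points, while the one for 9000/10000 is only about a percentage point wide.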

mbhall88 commented 3 years ago

I've also overlaid sample size on the coverage plot

iqbal-lab commented 3 years ago

Channeling my inner Michael patrolling appropriate slack channels... This issue is about concordance with phenotype, and this comment

https://github.com/mbhall88/head_to_head_pipeline/issues/76#issuecomment-877591309

is about concordance with Illumina, which is a different issue.

mbhall88 commented 3 years ago

Ah, right you are. Thank you!

mbhall88 commented 3 years ago

Something interesting to know about mykrobe - when using a diploid model for ONT it goes crazy and calls everything resistant to isoniazid. Looking at a few samples, it seems to call a whole bunch of indels as HET.

mbhall88 commented 3 years ago

I am going to start writing the results section with the following final (pending major issues) plot and table

Drug	Technology	FN(R)	FP(S)	FNR(95% CI)	FPR(95% CI)	PPV(95% CI)	NPV(95% CI)
Amikacin	Illumina	2(11)	2(77)	18.2% (5.1-47.7%)	2.6% (0.7-9.0%)	81.8% (52.3-94.9%)	97.4% (91.0-99.3%)
Amikacin	Nanopore	0(11)	2(77)	0.0% (0.0-25.9%)	2.6% (0.7-9.0%)	84.6% (57.8-95.7%)	100.0% (95.1-100.0%)
Capreomycin	Illumina	0(0)	1(51)	-	2.0% (0.3-10.3%)	0.0% (0.0-79.3%)	100.0% (92.9-100.0%)
Capreomycin	Nanopore	0(0)	1(51)	-	2.0% (0.3-10.3%)	0.0% (0.0-79.3%)	100.0% (92.9-100.0%)
Ethambutol	Illumina	3(14)	16(76)	21.4% (7.6-47.6%)	21.1% (13.4-31.5%)	40.7% (24.5-59.3%)	95.2% (86.9-98.4%)
Ethambutol	Nanopore	4(14)	14(76)	28.6% (11.7-54.6%)	18.4% (11.3-28.6%)	41.7% (24.5-61.2%)	93.9% (85.4-97.6%)
Isoniazid	Illumina	8(50)	3(48)	16.0% (8.3-28.5%)	6.2% (2.1-16.8%)	93.3% (82.1-97.7%)	84.9% (72.9-92.1%)
Isoniazid	Nanopore	7(50)	7(48)	14.0% (7.0-26.2%)	14.6% (7.2-27.2%)	86.0% (73.8-93.0%)	85.4% (72.8-92.8%)
Kanamycin	Illumina	0(0)	1(51)	-	2.0% (0.3-10.3%)	0.0% (0.0-79.3%)	100.0% (92.9-100.0%)
Kanamycin	Nanopore	0(0)	1(51)	-	2.0% (0.3-10.3%)	0.0% (0.0-79.3%)	100.0% (92.9-100.0%)
Ofloxacin	Illumina	0(10)	4(76)	0.0% (0.0-27.8%)	5.3% (2.1-12.8%)	71.4% (45.4-88.3%)	100.0% (94.9-100.0%)
Ofloxacin	Nanopore	0(10)	4(76)	0.0% (0.0-27.8%)	5.3% (2.1-12.8%)	71.4% (45.4-88.3%)	100.0% (94.9-100.0%)
Rifampicin	Illumina	5(47)	1(44)	10.6% (4.6-22.6%)	2.3% (0.4-11.8%)	97.7% (87.9-99.6%)	89.6% (77.8-95.5%)
Rifampicin	Nanopore	6(47)	1(44)	12.8% (6.0-25.2%)	2.3% (0.4-11.8%)	97.6% (87.7-99.6%)	87.8% (75.8-94.3%)
Streptomycin	Illumina	5(8)	10(82)	62.5% (30.6-86.3%)	12.2% (6.8-21.0%)	23.1% (8.2-50.3%)	93.5% (85.7-97.2%)
Streptomycin	Nanopore	3(8)	9(82)	37.5% (13.7-69.4%)	11.0% (5.9-19.6%)	35.7% (16.3-61.2%)	96.1% (89.0-98.6%)
