mbhall88 / head_to_head_pipeline

Snakemake pipelines to run the analysis for the Illumina vs. Nanopore comparison.
GNU General Public License v3.0

De novo SNP discovery in resistance genes #77

Closed mbhall88 closed 2 years ago

mbhall88 commented 3 years ago

The NEJM paper established that, by detecting unknown off-catalogue mutations in resistance genes and refusing to make a prediction when one was found, they were able to get a clinically acceptable NPV for first-line drugs. This was for Illumina. To do this with Nanopore, we would like to know the sensitivity/recall for novel SNPs in R-genes. Count how many there are in our data, and what % of them the Nanopore tools detect.
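A minimal sketch of the count described above, assuming variants are represented as `(gene, position, ref, alt)` tuples. The function name, tool names, and variants are hypothetical placeholders, not the pipeline's actual code or data.

```python
def recall_per_tool(truth, calls_by_tool):
    """Return, for each tool, the fraction of truth variants it recovers."""
    if not truth:
        raise ValueError("truth set is empty; recall is undefined")
    return {
        tool: len(truth & calls) / len(truth)
        for tool, calls in calls_by_tool.items()
    }


# Made-up novel SNPs in R-genes, as (gene, position, ref, alt). Not real data.
truth = {
    ("rpoB", 1349, "C", "T"),
    ("katG", 944, "G", "C"),
    ("embB", 916, "A", "G"),
    ("pncA", 35, "A", "C"),
}

# Hypothetical call sets from two Nanopore variant callers.
calls_by_tool = {
    "tool_a": {("rpoB", 1349, "C", "T"), ("katG", 944, "G", "C"), ("gyrA", 281, "A", "G")},
    "tool_b": {("rpoB", 1349, "C", "T")},
}

recall = recall_per_tool(truth, calls_by_tool)

# Truth variants recovered by every tool at once.
found_by_all = truth & set.intersection(*calls_by_tool.values())
recall_all_tools = len(found_by_all) / len(truth)
```

Exact tuple matching is the simplest choice here; it would miss calls that represent the same change differently, so a real comparison would probably need normalised variants.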

@iqbal-lab do we also want to look at indels?

iqbal-lab commented 3 years ago

Why not quickly count how many true indels we have in the R genes in our dataset (via cortex/clockwork)? I predict the number will be so small that we won't be powered to measure a decent recall for true indels (huge error bars). You need huge datasets for this, as indel prevalence is low.
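To illustrate the "huge error bars" point, here is a sketch that puts a 95% Wilson score interval around an observed recall. The Wilson interval is my choice for the illustration, not something specified in this thread, and the counts are made up.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Same observed recall (0.8) with few vs. many true indels. Made-up counts.
small_lo, small_hi = wilson_interval(8, 10)
large_lo, large_hi = wilson_interval(800, 1000)
```

With only 10 true indels the interval spans roughly 0.49 to 0.94, which is too wide to support a claim about recall; with 1000 it narrows to a few percentage points either side of 0.8.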

Second suggestion: look at Pandora calls (with and without de novo) in those genes, and see how many FP indels we have. We are powered to check whether we have an FP problem. I'm guessing we do, and that the issue will remain even after excluding complex calls. I don't have a sense of whether this will be driven by some hidden Pandora issue or by systematic error in Guppy basecalling, but we need to see how bad the issue is.

I think this will tell us whether we will A) be able to claim we can spot frameshifts reliably, or B) just briefly say no: indel calling is a limitation of Nanopore compared with Illumina right now, and we were not able to get good de novo indel calls.

In essence, check if we have to drop indels.

mbhall88 commented 3 years ago

The implementation for this is "mostly" finalised in drprg commit https://github.com/mbhall88/drprg/commit/05fe0baaa499f5c39189c6442036f16d84dffef3

mbhall88 commented 2 years ago

Closing as this is no longer within the scope of this paper. Additionally, see Chapter 4 in my thesis for this analysis.