nanopolish variants calling in amplicons

jts / nanopolish

Signal-level algorithms for MinION data

MIT License


nanopolish variants calling in amplicons #410

Closed epospiech closed 6 years ago

epospiech commented 6 years ago

Hi All,

I would like to use 'nanopolish variants' to call SNPs in MinION data. We have sequenced one long amplicon target obtained through PCR amplification. If I understand correctly, I should skip "Compute the draft genome assembly using canu" here because there will be no assembly (only one amplicon, one type of reads), and I should proceed to variant calling using a reference genome.

I would like to understand the concept better. In this case, is the data still polished during variant calling? That is, does the algorithm go back to the raw fast5 files and thereby improve variant calling compared with other variant callers such as GATK? Or, if I skip the draft genome assembly, is variant calling the same as in other callers, with no improvement?

Comments appreciated! Regards, epospiech

jts commented 6 years ago

Hi @epospiech,

I'm very sorry for the slow response. If you have a reference genome you should skip the assembly with canu step. Nanopolish will use the same signal-level algorithms as when it is polishing a draft assembly, but the parameters will be slightly modified since a high quality reference is available. To run in reference-based mode, you simply omit the --consensus flag:

nanopolish variants -g reference_genome -b alignments.sorted.bam -r basecalled.fastq -w "chromosome:start-end" -t 8 --ploidy 1

Jared
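For anyone scripting this step, here is a minimal sketch of how the reference-based invocation could be assembled in Python. The helper name `build_variants_cmd`, the file names, and the `amplicon:1-5000` window are all hypothetical placeholders; the flags (`-g`, `-b`, `-r`, `-w`, `-t`, `--ploidy`) are the ones from the command above. The sketch only builds the argument list and does not run nanopolish.

```python
import shlex


def build_variants_cmd(reference, bam, reads, window, threads=8, ploidy=1):
    """Build an argv list for `nanopolish variants` in reference-based mode.

    Hypothetical helper for illustration. --consensus is deliberately
    left out, since omitting it is what selects reference-based mode.
    """
    return [
        "nanopolish", "variants",
        "-g", reference,         # reference genome (FASTA)
        "-b", bam,               # sorted, indexed alignments of the reads
        "-r", reads,             # basecalled (and demultiplexed) reads
        "-w", window,            # region to call, "contig:start-end"
        "-t", str(threads),      # worker threads
        "--ploidy", str(ploidy),
    ]


# Placeholder inputs; substitute your own files and amplicon coordinates.
cmd = build_variants_cmd(
    "reference.fasta",
    "alignments.sorted.bam",
    "basecalled.fastq",
    "amplicon:1-5000",
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once nanopolish is installed and the reads have been indexed.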

epospiech commented 6 years ago

Thank you Jared!

Going back to the previous step, nanopolish indexing: when we provide a path to the fast5 files, do the fast5 files need to be demultiplexed? Or is the software able to read barcodes and use only the fast5 files that correspond to the bam and fastq files of the sample I am analyzing at the moment?

jts commented 6 years ago

You just need to provide a demultiplexed fastq file to nanopolish. It will only look at the fast5s that have a read in the basecalled/demultiplexed fastq.

epospiech commented 6 years ago

So I have demultiplexed fastq files and have prepared a bam file for a particular sample, but during nanopolish indexing I provide the directory of fast5 files that are not demultiplexed, and it can handle that (it can tell apart fast5 files from different barcodes). Am I correct?

jts commented 6 years ago

That's right.
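The behaviour confirmed above can be pictured with a small conceptual sketch. This is not nanopolish's actual implementation; it only illustrates the idea that the read IDs in the demultiplexed fastq decide which fast5 entries are used. The function names, read IDs, and paths are made up, and the sketch assumes a simple four-line-per-record FASTQ.

```python
import io


def fastq_read_ids(handle):
    """Collect read IDs from a FASTQ stream with four lines per record."""
    ids = set()
    for i, line in enumerate(handle):
        # Header lines are every fourth line and start with '@'.
        if i % 4 == 0 and line.startswith("@"):
            # The read ID is the first whitespace-delimited token after '@'.
            ids.add(line[1:].split()[0])
    return ids


def select_fast5(read_to_fast5, wanted_ids):
    """Keep only the fast5 entries whose read ID appears in the fastq."""
    return {rid: path for rid, path in read_to_fast5.items() if rid in wanted_ids}


# Toy data: the fast5 directory holds reads from several barcodes, while
# the demultiplexed fastq for one sample contains only two of them.
demux_fastq = io.StringIO(
    "@read_a barcode=01\nACGT\n+\n!!!!\n"
    "@read_c barcode=01\nTTGA\n+\n!!!!\n"
)
all_fast5 = {
    "read_a": "fast5/read_a.fast5",
    "read_b": "fast5/read_b.fast5",  # belongs to another barcode
    "read_c": "fast5/read_c.fast5",
}

selected = select_fast5(all_fast5, fastq_read_ids(demux_fastq))
print(sorted(selected))
```

In this toy case `read_b` is never touched, because it has no record in the demultiplexed fastq.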