jts / nanopolish

Signal-level algorithms for MinION data
MIT License

Differences between VCF and consensus sequence #257

Closed isardi closed 6 years ago

isardi commented 6 years ago

Hi, I am testing how accurately I can correct nanopore MinION reads by sequencing a plasmid.
I generated a VCF file by following all the steps given on this site. For the nanopolish variants step I ran the following command:

~/nanopolish/nanopolish variants -r fastq_runid_d22b8e7fd196418c6c106e08c22cdb0696c0264e_0.fastq -g ~/pGUS1.fa -b reads.sorted.bam --ploidy=1 -o pGUS1.vcf --min-candidate-frequency=0.65

This gave me a VCF with 2 variants (great!)

Then I wanted to generate a consensus FASTA file, so I ran the following command:

~/nanopolish/nanopolish variants -r fastq_runid_d22b8e7fd196418c6c106e08c22cdb0696c0264e_0.fastq -g ~/pGUS1.fa -b reads.sorted.bam --ploidy=1 --consensus polished.pGUS1.fa --min-candidate-frequency=0.65

This generated a consensus FASTA file that has 39 mismatches compared to the original reference. I was expecting a consensus containing only the two variants reported in the VCF file. Why the difference? How can I correct it?

Thanks in advance for your help.

-MariaI

jts commented 6 years ago

Hi Marial,

In consensus mode, nanopolish will test all possible single base edits to the input genome. In variant calling mode, it will only test variants that have a high level of support in the basecalled reads. In the former case it is trying many more candidate variants, so more of them end up being called as "improvements" and are written to the VCF file.

Jared
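
To make the size difference between the two candidate sets concrete, here is a toy Python sketch. It is not nanopolish's code: the function names, the per-position pileup strings, and the choice to count insertions and deletions as "single base edits" are all assumptions made for illustration, and the signal-level scoring that decides which candidates are accepted is not modelled at all.

```python
from collections import Counter

BASES = "ACGT"

def all_single_base_edits(ref):
    """Hypothetical stand-in for consensus mode: enumerate every
    substitution, 1-base deletion and 1-base insertion of ref."""
    edits = []
    for i, base in enumerate(ref):
        edits.extend(("sub", i, b) for b in BASES if b != base)
        edits.append(("del", i, base))
    for i in range(len(ref) + 1):
        edits.extend(("ins", i, b) for b in BASES)
    return edits

def supported_candidates(ref, pileup, min_frequency):
    """Hypothetical stand-in for variant-calling mode: keep only
    substitutions whose alternate base is seen in at least
    min_frequency of the basecalled reads covering that position.
    pileup holds one string of observed read bases per position."""
    candidates = []
    for i, (base, column) in enumerate(zip(ref, pileup)):
        depth = len(column)
        for alt, n in Counter(column).items():
            if alt != base and depth and n / depth >= min_frequency:
                candidates.append(("sub", i, alt))
    return candidates

# Made-up 4-base reference and pileup, for illustration only.
ref = "ACGT"
pileup = ["AAAA", "CCCT", "TTTG", "TTTT"]

consensus_mode = all_single_base_edits(ref)
variant_mode = supported_candidates(ref, pileup, min_frequency=0.65)
print(len(consensus_mode), len(variant_mode))  # 36 1
```

Even on a 4-base toy reference the exhaustive set has 36 candidates against a single frequency-supported one, so with many more candidates being tested it is plausible that more of them are accepted.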