jts / nanopolish

Signal-level algorithms for MinION data
MIT License

Nanopolish Variants Using Low Proportion Of Reads #1117

Open seanmsjkim opened 7 months ago

seanmsjkim commented 7 months ago

Hi,

I'm trying to use nanopolish to generate consensus sequences from bam files. I am finding that 'nanopolish variants' is using very few of the reads located by 'nanopolish index'. An example of the series of commands I am running is:

nanopolish index [fastq.gz] --sequencing-summary [sequencing_summary.txt] --directory [fast5_pass];

nanopolish variants -v --min-flanking-sequence 10 -x 1000000 --progress -t 4 --reads [fastq.gz] -o [vcf] -b [bam] --ploidy 2 -m 0.15 -g [ref_genome];

nanopolish vcf2fasta --skip-checks -g [ref_genome] [vcf] > [consensus_fasta];

Running these, I get output like the following:

[readdb] num reads: 18619, num reads with path to fast5: 18619
[post-run summary] total reads: 5, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 1, bad fast5: 0
[vcf2fasta] rewrote contig NR_115606.1 with 0 subs, 0 ins, 0 dels (0 skipped)

I'm sure nanopolish does some level of read QC/discarding, but I'm not sure which parameters are causing my reads to fail and be discarded. If I'm understanding correctly, only roughly 0.02 to 0.03% of my reads are acceptable in this case (5 of 18,619 reported in the post-run summary, one of which has no alignment)? Most samples are not this severe, but I'd love to know why this is happening and how I might adjust the thresholds, if possible!
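As a rough check of that percentage, the counts can be pulled out of the two log lines and divided. This is only a minimal sketch: `get_count` is a hypothetical helper written for this post, and reading `total reads` as the number of reads that `nanopolish variants` actually processed is an assumption about the log format, not something confirmed by the nanopolish documentation.

```shell
#!/bin/sh
# Log lines copied verbatim from the run shown above.
readdb_line='[readdb] num reads: 18619, num reads with path to fast5: 18619'
summary_line='[post-run summary] total reads: 5, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 1, bad fast5: 0'

# Hypothetical helper: print the integer that follows "<label>: " in a log line.
# $1 = log line, $2 = label (for example "total reads")
get_count() {
    printf '%s\n' "$1" | sed -n "s/.*$2: \([0-9][0-9]*\).*/\1/p"
}

indexed=$(get_count "$readdb_line" 'num reads')      # reads known to the index
total=$(get_count "$summary_line" 'total reads')     # reads in the post-run summary
no_aln=$(get_count "$summary_line" 'no alignment')   # reads dropped for lacking an alignment

# Report both fractions: reads in the summary, and reads left after alignment drops.
awk -v n="$indexed" -v t="$total" -v a="$no_aln" 'BEGIN {
    printf "reads in post-run summary: %d of %d (%.4f%%)\n", t, n, 100 * t / n
    printf "reads with an alignment:   %d of %d (%.4f%%)\n", t - a, n, 100 * (t - a) / n
}'
```

Both figures come out well under 0.1%, so the exact rounding does not change the question: nearly all indexed reads never reach the variant caller.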

Some properties of my data:
- v9.4 flowcells
- 1600 bp amplicon/read length
- Bacterial 16S

TIA, Sean