nanoporetech / pore-c

Pore-C support
Mozilla Public License 2.0
52 stars 5 forks source link

nanopolish call-methylation fails with many bad fast5s #45

Closed will-NYGC closed 3 years ago

will-NYGC commented 3 years ago

I am trying to call methylation on the guppy called fastq's and Pore-C Snakemake BAM outputs with nanopolish. I am using a test fastq with 50000 reads. All 50000 are indexed properly but when I use nanopolish call-methylation I get lots of "bad fast5" and no calls.

$ zcat test.fq.gz | awk 'END {print NR/4}'
50000

$ nanopolish index --verbose --directory fast5/ --sequencing-summary guppy/sequencing_summary.txt test.fq.gz
[readdb] indexing fast5
[readdb] num reads: 50000, num reads with path to fast5: 50000

$ nanopolish call-methylation --verbose --reads=test.fq.gz --bam=porec/mapping/test.bam --genome=ref.rg.fa --methylation=cpg --threads=8 --progress -K 1024
chromosome  strand  start   end read_name   log_lik_ratio   log_lik_methylated  log_lik_unmethylated    num_calling_strands num_motifs  sequence
[post-run summary] total reads: 141242, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 0, bad fast5: 140191

$ samtools flagstat porec/mapping/test.bam
235538 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
234369 + 0 mapped (99.50% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

I don't quite understand why the call-methylation sees 141242 total reads and why 140191 have "bad fast5". When I look at the readdb index file, all fast5 locations are readable.

Does the porec aligned BAM split the concatemers into separate reads? Is that why the BAM contains 235538 reads even though the fastq only has 50000.

eharr commented 3 years ago

Sorry for the slow response. The BAM that pore-c creates appends information to the read name to make it work with WhatsHap. It also contains alignments that fail the pore-c filters and so probably shouldn't be used for your methylation analysis. I recently pushed a new version of the tools (0.4.0) with a tool to prep your BAM for methylation calling: https://nanoporetech.github.io/pore-c/cli.html#filter-bam

There's also a new version of the Pore-C-Snakemake pipeline that does the methylation calling for you.

will-NYGC commented 3 years ago

Thanks, I did manage to figure this out and formatted the BAMs to make it work. I see the new snakemake uses f5c to make the meth calls per batch. this is great! will use this new pipeline moving forward.

eharr commented 3 years ago

Glad to hear that - sorry again for the delay in responding.