jts / nanopolish

Signal-level algorithms for MinION data
MIT License

Error: spliced alignments detected when loading read #708

Open wangzhennan14 opened 4 years ago

wangzhennan14 commented 4 years ago

Hi,

When I ran nanopolish call-methylation, I got the following error:

Error: spliced alignments detected when loading read cce11e44-9f88-45f8-8fa4-5fdf23314323
Please align the reads to the genome using a non-spliced aligner
comand.sh: line 3: 8436 Segmentation fault (core dumped) nanopolish call-methylation -t 50 -r all_NanoReads.fastq -b output.sorted.bam -g ref.fasta > methylation_calls.tsv

However, I used minimap2 to align the reads to the reference, as described in the quickstart guide (https://nanopolish.readthedocs.io/en/latest/quickstart_call_methylation.html):

minimap2 -a -t 35 -x map-ont ref.fasta all_NanoReads.fastq | samtools sort -T tmp -o output.sorted.bam

What went wrong? Please give me some advice on how to solve this problem. Thank you very much!

Best wishes, Wang

jts commented 4 years ago

Hi @wangzhennan14,

Sorry for the slow response, I was on holidays. Can you grep the alignment record for that read (cce11e44-9f88-45f8-8fa4-5fdf23314323) and paste it here?

Thanks, Jared

wangzhennan14 commented 4 years ago

Hi @jts,

The alignment record for the read (cce11e44-9f88-45f8-8fa4-5fdf23314323) is in error.read.log. Please check it and help me solve this problem. Thank you very much!

wangzhennan

jts commented 4 years ago

Hi @wangzhennan14,

That link appears to be broken.

Jared

wangzhennan14 commented 4 years ago

Hi @jts,

Sorry for the broken link. I have re-uploaded the record for the failing read: error.read.log

Please help me solve this problem. Many of my samples are hitting it. Thank you very much!

wangzhennan

jts commented 4 years ago

Hi,

This read is very long (>400kb), and some aligners may emit invalid CIGAR strings for such long reads. Can you let me know:

  1. which aligner and version you used
  2. which version of samtools you used to create the BAM file

Thanks, Jared

wangzhennan14 commented 4 years ago

Hi @jts ,

  1. I used minimap2 to align the Nanopore reads; the minimap2 version is 2.1.
  2. The samtools version is 1.9.

Thanks, Wang

jts commented 4 years ago

Thanks. Can you send me the BAM record for this read? The SAM file looks ok but the CIGAR problem only happens in BAM.

Jared
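One possible reason a record can look fine in SAM but not in BAM, which this thread does not confirm as the cause: BAM stores the number of CIGAR operations in an unsigned 16-bit field, so an alignment with more than 65,535 operations cannot be written directly. The SAM specification describes a convention in which the real CIGAR goes into a CG tag and the CIGAR field holds a placeholder. Below is a minimal Python sketch for counting the operations in a SAM-format CIGAR string; the function names are illustrative and are not part of nanopolish, samtools, or htslib.

```python
import re

# BAM stores the number of CIGAR operations in an unsigned 16-bit field.
BAM_MAX_CIGAR_OPS = 65535

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_cigar_ops(cigar: str) -> int:
    """Return the number of operations in a SAM-format CIGAR string."""
    if cigar == "*":  # SAM uses "*" when no CIGAR is available
        return 0
    ops = CIGAR_OP.findall(cigar)
    # Reject strings that contain anything other than valid operations.
    if "".join(length + op for length, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return len(ops)


def exceeds_bam_limit(cigar: str) -> bool:
    """True if the CIGAR has too many operations for the 16-bit BAM field."""
    return count_cigar_ops(cigar) > BAM_MAX_CIGAR_OPS
```

Applied to column 6 of the SAM record for a very long read, `exceeds_bam_limit` would indicate whether the alignment falls into the long-CIGAR case.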

mike2vandy commented 3 years ago

Hello, was this ever solved? I've been running into the same problem lately. Does it seem to be related to a specific version of samtools or minimap2? Running samtools 1.9 and minimap 2.17-r974-dirty.

jts commented 3 years ago

I don't think I ever received a bam file that allowed me to reproduce the problem, so haven't fixed it yet.

mike2vandy commented 3 years ago

What's the easiest way I can send you mine? FYI, this is coming from FAF09701 of the human genome data.

jts commented 3 years ago

Can you send me the read ID? I have that data here

mike2vandy commented 3 years ago

8982dd2b-fddd-4c21-99b8-906d4bf06afb

jts commented 3 years ago

This read works for me, when I download it using wget http://s3.amazonaws.com/nanopore-human-wgs/rel4-nanopore-wgs-4249180049-FAF09701.fastq.gz. I'm using:

samtools 1.10
Using htslib 1.10.2
Copyright (C) 2019 Genome Research Ltd.
minimap2 2.17-r941

The basecalls in the fastq above are quite old, I haven't tried with newer basecalls.

mike2vandy commented 3 years ago

Hmm...let me update samtools and let you know what happens.

mike2vandy commented 3 years ago

Updated to samtools 1.12 and minimap2 2.18-r1015. Still the same error:

Error: spliced alignments detected when loading read 8982dd2b-fddd-4c21-99b8-906d4bf06afb
Please align the reads to the genome using a non-spliced aligner

Thoughts?

jts commented 3 years ago

Could you provide the fastq and bam file containing this single read? You should be able to attach it here, or email to jared.simpson@oicr.on.ca.

giesselmann commented 3 years ago

Hi @jts,

I'm facing the same error as above using NGMLR v0.2.7 (-x ont --bam-fix) and nanopolish (v0.13.2 and the latest build from master). This is the first time it has happened, and it is very rare; the samples were sequenced using the SQK-ULK001 kit. I extracted the read and made a package of fast5, fastq and bam to reproduce the error. The steps to reproduce are (reference is mm10):

tar -xf spliced_read.tar.gz
nanopolish index -d ./ spliced_read.fastq
nanopolish call-methylation -r spliced_read.fastq -g mm10.fa -b spliced_read.bam

The output is:

chromosome strand start end read_name log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands num_motifs sequence
Error: spliced alignments detected when loading read 7a521bd6-133e-4d07-9bae-0cf81dd031ee
Please align the reads to the genome using a non-spliced aligner

The read's CIGAR does not seem to contain any N operations, though:

samtools view spliced_read.bam | awk '($6 ~ /N/)'

Thank you for the great software and help,

Pay

spliced_read.tar.gz
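The awk filter in the comment above looks for an N anywhere in the CIGAR column. For alignments with more than 65,535 CIGAR operations, the SAM specification describes a BAM placeholder CIGAR of the form `<read length>S<reference length>N`, with the real CIGAR stored in a CG tag. A reader that does not handle this convention would see an N operation even though the alignment is not spliced. Whether that is what triggered the nanopolish error here is an assumption; the thread only shows that a newer htslib made it go away. The Python sketch below separates the two cases, using illustrative function names that are not part of any tool mentioned in this thread.

```python
import re

# Placeholder CIGAR for long alignments in BAM: <read length>S<reference length>N
PLACEHOLDER = re.compile(r"^(\d+)S(\d+)N$")


def has_ref_skip(cigar: str) -> bool:
    """True if the CIGAR contains an N (reference skip) operation."""
    # CIGAR strings consist only of digits and operation letters,
    # so a plain substring test is sufficient.
    return "N" in cigar


def is_long_cigar_placeholder(cigar: str, read_length: int) -> bool:
    """True if the CIGAR looks like the long-CIGAR placeholder for this read."""
    match = PLACEHOLDER.match(cigar)
    # The soft-clip length in the placeholder equals the read length.
    return match is not None and int(match.group(1)) == read_length
```

A record for which `has_ref_skip` is true but `is_long_cigar_placeholder` is also true is more likely a long-CIGAR record than a genuinely spliced alignment.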

jts commented 3 years ago

Thanks, I'm on vacation now but will try with your data when I return to work. Could you try the methylation_bam branch, which uses a more recent htslib, to rule out an htslib issue?

giesselmann commented 3 years ago

Awesome, that fixed it!

jts commented 3 years ago

I'm glad to hear that, thanks for reporting back so quickly. I'll try to merge in the htslib upgrade soon.