jts / nanopolish

Signal-level algorithms for MinION data
MIT License
559 stars 159 forks source link

Error in nanopolish variants #403

Closed egeulgen closed 6 years ago

egeulgen commented 6 years ago

Hello, I'm trying to call variants using nanopolish. I created a fasta from fastq and indexed it with nanopolish: zcat fastq_files/filtered.fastq.gz | paste - - - - | cut -f 1,2 | sed 's/^@/>/' | tr "\t" "\n" > reads.fasta nanopolish index -d ./fast5/ reads.fasta

With the command: nanopolish variants --window chr7:117479963-117668665 --reads reads.fasta --bam aligned/aln.bam --genome Homo_sapiens_assembly38.fasta --ploidy 2 -t 1 -o variants.vcf

I get the error: nanopolish: src/nanopolish_squiggle_read.cpp:321: void SquiggleRead::load_from_raw(hid_t, uint32_t): Assertion 'rt.n > 0' failed. Aborted (core dumped)`

Is this because the indexing is wrong for some reason?

jts commented 6 years ago

Yes, something probably went wrong with the index step. Converting to fasta is unnecessary, nanopolish supports fastq. What output messages did index print?

Jared

egeulgen commented 6 years ago

Output of index is below:

[readdb] indexing ./fast5 [readdb] num reads: 3871, num reads with path to fast5: 3871

So omitting the --reads should be fine?

jts commented 6 years ago

That output looks ok.

You still need the --reads option. I just meant that you didn't need to convert fastq to fasta, you could have ran nanopolish index -d ./fast5 filtered_reads.fastq.

Jared

egeulgen commented 6 years ago
cp fastq_files/filtered.fastq.gz final.fastq.gz
gunzip final.fastq.gz
nanopolish index -d ./fast5 final.fastq

nanopolish variants --ploidy 2 -o variants.vcf --verbose --window "chr7:117479963-117668665" --reads final.fastq --bam aligned/aln.bam --genome Homo_sapiens_assembly38.fasta

Still getting the same errror:

nanopolish: src/nanopolish_squiggle_read.cpp:321: void SquiggleRead::load_from_raw(hid_t, uint32_t): Assertion `rt.n > 0' failed.
Aborted (core dumped)
jts commented 6 years ago

Are you able to run this through gdb to figure out the name of the read that causes the error?

egeulgen commented 6 years ago

I'm not familiar with running gdb. Do you think this is because of an error in a fast5 file?

jts commented 6 years ago

Yes, that is what i suspect. Did you do any processing on the fast5s or do they come straight from MinKNOW?

Maybe try these commands:

gdb --args nanopolish variants --ploidy 2 -o variants.vcf --verbose --window "chr7:117479963-117668665" --reads final.fastq --bam aligned/aln.bam --genome Homo_sapiens_assembly38.fasta

After gdb starts, type "r" to run the program. When it hits the assertion type print *this and send me the full output.

egeulgen commented 6 years ago

I get a different error:

Starting program: /home/ubuntu/bin/nanopolish variants --ploidy 2 -o variants.vcf --verbose --window chr7:117479963-117668665 --reads final.fastq --bam aligned/aln.bam --genome Homo_sapiens_assembly38.fasta
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
nanopolish: src/nanopolish_squiggle_read.cpp:321: void SquiggleRead::load_from_raw(hid_t, uint32_t): Assertion `rt.n > 0' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff69adc37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) print *this
No symbol "this" in current context.
jts commented 6 years ago

can you type bt and send the output?

egeulgen commented 6 years ago
(gdb) bt
#0  0x00007ffff69adc37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff69b1028 in __GI_abort () at abort.c:89
#2  0x00007ffff69a6bf6 in __assert_fail_base (fmt=0x7ffff6afb058 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x73ff79 "rt.n > 0", file=file@entry=0x740348 "src/nanopolish_squiggle_read.cpp", 
    line=line@entry=321, 
    function=function@entry=0x741220 <SquiggleRead::load_from_raw(int, unsigned int)::__PRETTY_FUNCTION__> "void SquiggleRead::load_from_raw(hid_t, uint32_t)") at assert.c:92
#3  0x00007ffff69a6ca2 in __GI___assert_fail (assertion=0x73ff79 "rt.n > 0", 
    file=0x740348 "src/nanopolish_squiggle_read.cpp", line=321, 
    function=0x741220 <SquiggleRead::load_from_raw(int, unsigned int)::__PRETTY_FUNCTION__> "void SquiggleRead::load_from_raw(hid_t, uint32_t)") at assert.c:101
#4  0x000000000046e4c6 in SquiggleRead::load_from_raw (this=this@entry=0x107f410, 
    hdf5_file=hdf5_file@entry=16777216, flags=flags@entry=0) at src/nanopolish_squiggle_read.cpp:321
#5  0x0000000000474441 in SquiggleRead::SquiggleRead (this=0x107f410, name=..., read_db=..., flags=0)
    at src/nanopolish_squiggle_read.cpp:118
#6  0x00000000004b594f in AlignmentDB::_load_squiggle_read (this=this@entry=0x7fffffffddf0, 
    read_name="f972d2ab-2e7d-437d-8a19-e6c7870f3dae") at src/alignment/nanopolish_alignment_db.cpp:631
#7  0x00000000004b5b8a in AlignmentDB::_load_events_by_region_from_read ()
    at src/alignment/nanopolish_alignment_db.cpp:569
#8  0x00000000004b5dd7 in AlignmentDB::_load_events_by_region_from_read (this=this@entry=0x7fffffffddf0, 
    sequence_records=std::vector of length 1315, capacity 2048 = {...})
    at src/alignment/nanopolish_alignment_db.cpp:564
#9  0x00000000004b68d6 in AlignmentDB::load_region (this=this@entry=0x7fffffffddf0, contig="chr7", 
    start_position=start_position@entry=117479923, stop_position=stop_position@entry=117668705)
    at src/alignment/nanopolish_alignment_db.cpp:405
#10 0x000000000041686b in call_variants_for_region (contig="chr7", region_start=117479963, region_end=117668665, 
    out_fp=out_fp@entry=0xcfcba0) at src/nanopolish_call_variants.cpp:882
#11 0x0000000000418664 in call_variants_main (argc=<optimized out>, argv=<optimized out>)
    at src/nanopolish_call_variants.cpp:1192
#12 0x0000000000408f1a in operator() (__args#1=0x7fffffffe3f0, __args#0=14, this=0xc98158)
    at /usr/include/c++/4.8/functional:2471
#13 main (argc=15, argv=0x7fffffffe3e8) at src/main/nanopolish.cpp:77
jts commented 6 years ago

ok, now type "f 4" then "print *this"

egeulgen commented 6 years ago
(gdb) f 4
#4  0x000000000046e4c6 in SquiggleRead::load_from_raw (this=this@entry=0x107f410, 
    hdf5_file=hdf5_file@entry=16777216, flags=flags@entry=0) at src/nanopolish_squiggle_read.cpp:321
321         assert(rt.n > 0);
(gdb) print *this
$1 = {read_name = "f972d2ab-2e7d-437d-8a19-e6c7870f3dae", read_type = SRT_TEMPLATE, nucleotide_type = SRNT_DNA, 
  pore_type = PT_R9, 
  fast5_path = "./fast5/GXB01121_20171220__GA10000_mux_scan_20_12_17_cftr_30318_read_126_ch_476_strand.fast5", 
  read_id = 2925395968, 
  read_sequence = "CCACAATGGAAATTCAAAGAAATCTTGTTGCAATAAACCAACAAATTTGTTAATTACTAAACTCTAAAGATATCAAAAATTGATGTAACCTATAGAATGCAGCATTTTATTCATTGAAAATTTTTTACTTAATGCTTAGCTAAAGTTAATGGAGTTCGTACCTGTTGTTAAAATGGAAATGAAGGTAACAACCAATGAAG"..., events = {
    std::vector of length 0, capacity 0, std::vector of length 0, capacity 0}, scalings = {{scale = 1, shift = 0, 
      drift = 0, var = 1, scale_sd = 1, var_sd = 1, log_var = 2.1507338488917742e-312, 
      scaled_var = 2.1507341429794092e-312, log_scaled_var = 2.1618233621127168e-312}, {scale = 1, shift = 0, 
      drift = 0, var = 1, scale_sd = 1, var_sd = 1, log_var = 2.2043752828955519e-312, 
      scaled_var = 2.2251384832222758e-312, log_scaled_var = 2.2251385808743507e-312}}, base_model = {0x1014c40, 
    0x0}, samples = std::vector of length 0, capacity 0, sample_rate = 4000, sample_start_time = 460801787334, 
  events_per_base = {0, 0}, base_to_event_map = std::vector of length 0, capacity 0, parameters = {{
      trans_m_to_e_not_k = 2.3166560076338034e-312, trans_e_to_e = 2.3166561362193284e-312, 
      trans_start_to_clip = 0.5, trans_clip_self = 0.89999997615814209, is_initialized = false, 
      skip_probabilities = std::vector of length 30, capacity 30 = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, skip_bin_width = 0.5, training_data = {n_matches = 0, 
        n_merges = 0, n_skips = 0, kmer_transitions = std::vector of length 0, capacity 0, state_transitions = {
          cells = 0x107e2b0, n_rows = 3, n_cols = 6}}}, {trans_m_to_e_not_k = 2.3970558041533228e-312, 
      trans_e_to_e = 2.4073024472816041e-312, trans_start_to_clip = 0.5, trans_clip_self = 0.89999997615814209, 
      is_initialized = false, skip_probabilities = std::vector of length 30, capacity 30 = {0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, skip_bin_width = 0.5, training_data = {
        n_matches = 0, n_merges = 0, n_skips = 0, kmer_transitions = std::vector of length 0, capacity 0, 
        state_transitions = {cells = 0xea1390, n_rows = 3, n_cols = 6}}}}, f_p = 0x0, basecall_group = ""}
jts commented 6 years ago

Great. Can you send me the fastq record for read f972d2ab-2e7d-437d-8a19-e6c7870f3dae and the fast5 file /fast5/GXB01121_20171220__GA10000_mux_scan_20_12_17_cftr_30318_read_126_ch_476_strand.fast5?

egeulgen commented 6 years ago

I couldn't extract the record so I'm sending the whole fastq file. You can find them here

jts commented 6 years ago

This file worked without problem for me. How much memory do you have available? Are you using the latest version of nanopolish (check with nanopolish --version)?

egeulgen commented 6 years ago

using nanopolish version 0.9.0. have 244 GiB Memory. Upgrading to version 0.9.2 now.

egeulgen commented 6 years ago

It did work after updating. Thank you for your help!

yiyi11zhong commented 5 years ago

hello, i got the same error too when i tried to test the example dataset provided by nanopolish.

$ /store/tools/nanopolish/nanopolish variants --consensus -o polished.vcf -w "tig00000001:200000-202000" -r reads.fasta -b reads.sorted.bam -g draft.fa

Error:nanopolish: src/nanopolish_squiggle_read.cpp:321: void SquiggleRead::load_from_raw(hid_t, uint32_t): Assertion `rt.n > 0' failed. Aborted (core dumped)

FYI, i use nanopolish 0.10.2, and my computer has 16 GiB ram.

Any idea?

jts commented 5 years ago

Can you confirm you are using the right version by running this command and pasting the results:

/store/tools/nanopolish/nanopolish --version
yiyi11zhong commented 5 years ago

Confirmed, version 0.10.2

$ /store/tools/nanopolish/nanopolish --version

nanopolish version 0.10.2 Written by Jared Simpson.

lfaino commented 5 years ago

Hi All, I have the same issue with my data. any idea on the problem?

Cheers Luigi

lfaino commented 5 years ago

the complete error

nanopolish: src/nanopolish_squiggle_read.cpp:321: void SquiggleRead::load_from_raw(hid_t, uint32_t): Assertionrt.n > 0' failed. HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 140638184183552:

000: H5Dio.c line 173 in H5Dread(): can't read data

major: Dataset
minor: Read failed

001: H5Dio.c line 550 in H5D__read(): can't read data

major: Dataset
minor: Read failed

002: H5Dchunk.c line 1872 in H5D__chunk_read(): unable to read raw data chunk

major: Low-level I/O
minor: Read failed

003: H5Dchunk.c line 2902 in H5D__chunk_lock(): data pipeline read failed

major: Data filters
minor: Filter operation failed

004: H5Z.c line 1357 in H5Z_pipeline(): required filter 'deflate' is not registered

major: Data filters
minor: Read failed

005: H5PL.c line 298 in H5PL_load(): search in paths failed

major: Plugin for dynamically loaded library
minor: Can't get value

006: H5PL.c line 402 in H5PL__find(): can't open directory

major: Plugin for dynamically loaded library
minor: Can't open directory or file`

my version is 0.10.2

jts commented 5 years ago

Did you install nanopolish from conda or by compiling from github?

lfaino commented 5 years ago

I installed via github. I tested the Dockerfile as weel but i got an error

Cheers Luigi

lfaino commented 5 years ago

I fixed the problem. I removed all, installed the two python library and i reinstalled nanopolish.

now it works fine

cheers Luigi

jts commented 5 years ago

Hm, interesting! I wonder what the problem was the first time. Glad it is working now!

yiyi11zhong commented 5 years ago

I fixed the problem. I removed all, installed the two python library and i reinstalled nanopolish.

now it works fine

cheers Luigi

Yeah, reinstalling seems works. Thanks!

lfaino commented 5 years ago

Hi jts, i would suggest that you fix the Dockerfile. When i had the issue, i tried to use the docker image but i got an error. in your Dockerfile is not installed biopython which it is required to run the parallel version of nanopolish

Cheers Luigi

valery-shap commented 5 years ago

Hello all, I have the same error. When I try to use only variants: HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 139851905382144:

000: H5Dio.c line 173 in H5Dread(): can't read data

major: Dataset
minor: Read failed

001: H5Dio.c line 550 in H5D__read(): can't read data

major: Dataset
minor: Read failed

002: H5Dchunk.c line 1872 in H5D__chunk_read(): unable to read raw data chunk

major: Low-level I/O
minor: Read failed

003: H5Dchunk.c line 2902 in H5D__chunk_lock(): data pipeline read failed

major: Data filters
minor: Filter operation failed

004: H5Z.c line 1357 in H5Z_pipeline(): required filter 'deflate' is not registered

major: Data filters
minor: Read failed

005: H5PL.c line 298 in H5PL_load(): search in paths failed

major: Plugin for dynamically loaded library
minor: Can't get value

006: H5PL.c line 402 in H5PL__find(): can't open directory

major: Plugin for dynamically loaded library
minor: Can't open directory or file

nanopolish: src/nanopolish_squiggle_read.cpp:321: void SquiggleRead::load_from_raw(fast5_file&, uint32_t): Assertion `rt.n > 0' failed. Aborted (core dumped)

When I try to use with parallel and consensus: HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 140642788947712:

000: H5Dio.c line 173 in H5Dread(): can't read data

major: Dataset
minor: Read failed

001: H5Dio.c line 550 in H5D__read(): can't read data

major: Dataset
minor: Read failed

002: H5Dchunk.c line 1872 in H5D__chunk_read(): unable to read raw data chunk

major: Low-level I/O
minor: Read failed

003: H5Dchunk.c line 2902 in H5D__chunk_lock(): data pipeline read failed

major: Data filters
minor: Filter operation failed

004: H5Z.c line 1357 in H5Z_pipeline(): required filter 'deflate' is not registered

major: Data filters
minor: Read failed

005: H5PL.c line 298 in H5PL_load(): search in paths failed

major: Plugin for dynamically loaded library
minor: Can't get value

006: H5PL.c line 402 in H5PL__find(): can't open directory

major: Plugin for dynamically loaded library
minor: Can't open directory or file

nanopolish: src/nanopolish_squiggle_read.cpp:321: void SquiggleRead::load_from_raw(fast5_file&, uint32_t): Assertion `rt.n > 0' failed.

There is no 'Aborted (core dumped)'

nanopolish version 0.11.0 Requirement already satisfied: biopython in /miniconda2/lib/python2.7/site-packages (1.73) Requirement already satisfied: numpy miniconda2/lib/python2.7/site-packages (from biopython) (1.16.0)

A lot of thanks, Valery

hasindu2008 commented 5 years ago

@valery-shap Once I got a similar error (not in miniconda though - in direct compilation). The problem was that the zlib + zlib dev library did not exist when building the HDF5 library, thus HDF5 was compiled with no support on deflate - compression. It was fixed when I installed zlib and zlib dev libraries and recompiled again. Not sure if this is applicable to your case.

valery-shap commented 5 years ago

Thank you very much for the answer! I remember that i couldn.t make nanopolish with error about zlib. I install this library and make again with it and all was fine. Yesterday i installed the old version v.9.02 and run the variants process. It doesn't show the error but the process is still running and i have no information like log file. Just all cpu% is for nanopolish. i have 24 threads and 4.6 Mbp genome. Do you know is it normal time for counting?