mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

Error in assemble binary on ONP data #17

Closed ghost closed 7 years ago

ghost commented 7 years ago

I have:

I get the following error:

    [2017-09-19 15:37:47] INFO: Extending reads
    0% 10% 20%
    [2017-09-19 17:48:35] ERROR: Error: Error in assemble binary: Command '['abruijn-assemble', '-k', '15', '-l', '/home/user/Documents/Abruijn_ko_fz/out/abruijn.log', '-t', '11', '-v', '5000', '/home/user/bioinf_archive/32_scmi_storage/onp/ko_onp_FZ1/extracted/twoBestMin30.fasta', '/home/user/Documents/Abruijn_ko_fz/out/draft_assembly.fasta', '150']' returned non-zero exit status -9

I don't think I ran out of disk space or memory.

What went wrong?

Here is the end of the log file:


    With 11 reads
    Start read: -2be4669b-93c2-4154-aa4b-625728fa7d06_runid=2b076ac8f6a448e848698ae57d8581ac75fc0637_read=7487_ch=298_start_time=2017-09-13T02:49:06Z_.poretools_tmp/20170912_1617_qc/fast5/pass/36/fz_i_177_20170912_fah18372_MN15037_sequencing_run_qc_40637_read_7487_ch_298_strand.fast5
    At position: 10
    leftTip: 0 rightTip: 0
    Suspicios: 0
    Mean overlaps: 256
    Inner reads: 10
[2017-09-19 15:40:10] DEBUG: Inner: 30804 covered: 42884 total: 55124
[2017-09-19 15:40:10] DEBUG: Discarded contig with 17 reads and 16 inner overlaps
[2017-09-19 15:40:10] DEBUG: Discarded contig with 13 reads and 12 inner overlaps
[2017-09-19 17:48:35] root: ERROR: Error: Error in assemble binary: Command '['abruijn-assemble', '-k', '15', '-l', '/home/user/Documents/Abruijn_ko_fz/out/abruijn.log', '-t', '11', '-v', '5000', '/home/user/archive/storage/onp/ko_onp_FZ1/extracted/twoBestMin30.fasta', '/home/user/Documents/Abruijn_ko_fz/out/draft_assembly.fasta', '150']' returned non-zero exit status -9

mikolmogorov commented 7 years ago

Hi,

This is most likely a bug that led to a segfault. I've prepared a new version that should write a stack backtrace to the log file. Can you try running the latest version from the "devel" branch? This would help locate the error.

ghost commented 7 years ago

This is the end of the log file for the same data, where it crashed again (using the most recent "devel" branch):

    Mean overlaps: 322
    Inner reads: 7
[2017-09-20 09:18:19] DEBUG: Inner: 30540 covered: 42726 total: 55124
[2017-09-20 09:18:19] DEBUG: Discarded contig with 13 reads and 11 inner overlaps
[2017-09-20 09:18:19] DEBUG: Assembled contig
    With 12 reads
    Start read: +779ea45c-f3c9-4e45-aacb-c1d377af2df4_runid=2b076ac8f6a448e848698ae57d8581ac75fc0637_read=6710_ch=452_start_time=2017-09-12T23:43:17Z_.poretools_tmp/20170912_1617_qc/fast5/pass/28/fgcz_i_177_20170912_fah18372_MN15037_sequencing_run_qc_40637_read_6710_ch_452_strand.fast5
    At position: 11
    leftTip: 0 rightTip: 0
    Suspicios: 0
    Mean overlaps: 328
    Inner reads: 10
[2017-09-20 09:18:20] DEBUG: Inner: 30656 covered: 42782 total: 55124
[2017-09-20 09:18:20] DEBUG: Discarded contig with 13 reads and 13 inner overlaps
[2017-09-20 13:04:59] root: ERROR: Error: Error in assemble binary: Command '['abruijn-assemble', '-k', '15', '-l', '/home/user/Documents/Abruijn_kr_fz/outDevel/abruijn.log', '-t', '12', '-v', '5000', '/home/user/bioinf_archive/storage/onp/kr_onp_FZ1/extracted/twoBestMin30.fasta', '/home/user/Documents/Abruijn_kr_fz/outDevel/draft_assembly.fasta', '150']' returned non-zero exit status -9

Here is the header of the log file, in case it helps:

[2017-09-20 09:06:57] root: INFO: Running ABruijn
[2017-09-20 09:06:58] root: DEBUG: Estimated genome size: 8179615
[2017-09-20 09:06:58] root: DEBUG: Chosen k-mer size: 15
[2017-09-20 09:06:58] root: INFO: Assembling reads
[2017-09-20 09:06:58] root: DEBUG: -----Begin assembly log------
[2017-09-20 09:06:58] DEBUG: Build date: Sep 20 2017 09:05:51
[2017-09-20 09:06:58] INFO: Reading FASTA
[2017-09-20 09:07:29] DEBUG: Mean read length: 43697
[2017-09-20 09:07:29] INFO: Generating solid k-mer index
[2017-09-20 09:07:29] DEBUG: Hard threshold set to 15
[2017-09-20 09:07:29] DEBUG: Started kmer counting
[2017-09-20 09:07:30] INFO: Counting kmers (1/2):
[2017-09-20 09:11:28] INFO: Counting kmers (2/2):
[2017-09-20 09:12:08] DEBUG: Genome size estimate: 8001846
[2017-09-20 09:12:08] DEBUG: Filtered 308 repetitive kmers
[2017-09-20 09:12:08] DEBUG: Estimated minimum kmer coverage: 19, 7900903 unique kmers selected
[2017-09-20 09:12:08] INFO: Filling index table
[2017-09-20 09:12:09] DEBUG: Kmer index size: 269974833
[2017-09-20 09:14:43] INFO: Extending reads
[2017-09-20 09:15:47] DEBUG: Mean read coverage: 87
[2017-09-20 09:17:53] DEBUG: Assembled contig
    With 264 reads
    Start read: -376f3b3c-ce00-4a28-bd80-f1b8b7698c57_runid=2b076ac8f6a448e848698ae57d8581ac75fc0637_read=1689_ch=20_start_time=2017-09-12T18:37:26Z_.poretools_tmp/20170912_1617_qc/fast5/pass/9/fgcz_i_177_20170912_fah18372_MN15037_sequencing_run_qc_40637_read_1689_ch_20_strand.fast5
    At position: 247
    leftTip: 0 rightTip: 0
    Suspicios: 0
    Mean overlaps: 160
    Inner reads: 0

mikolmogorov commented 7 years ago

Thank you. Unfortunately, I don't see any segfault backtrace in these logs. Are you sure the system did not run out of memory? (Exit status -9 most likely means that the system killed the executable with signal 9, SIGKILL.)
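
For readers wondering how "exit status -9" maps to a signal: on POSIX systems, Python's `subprocess` module reports a child terminated by signal N as return code -N. The following is a minimal sketch of that decoding only; the helper name `describe_returncode` is hypothetical and not part of the assembler's code.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Explain a return code as reported by Python's subprocess module on POSIX."""
    if returncode == 0:
        return "exited normally"
    if returncode > 0:
        return f"exited with status {returncode}"
    # A negative value -N means the child was terminated by signal N.
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        # Signal number not known on this platform.
        name = f"signal {-returncode}"
    return f"killed by {name}"


if __name__ == "__main__":
    for code in (0, 1, -9, -11):
        print(code, "->", describe_returncode(code))
```

A SIGKILL (9) cannot be caught by the process, which is consistent with no backtrace appearing in the log, whereas a segfault would show up as SIGSEGV (11).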

Can you give some more details about the system ('uname -a', 'gcc --version')? How much memory does it have? Could you also try running 'dmesg | grep killed', 'dmesg | grep memory' and 'cat /var/log/kernel.log | grep segfault' and see if there is any output?

ghost commented 7 years ago

Actually you might be right.

dmesg | grep memory gives:

    [2417644.147499] [<ffffffff81140c63>] ? out_of_memory+0x473/0x4b0
    [2417644.147825] Out of memory: Kill process 9434 (abruijn-assembl) score 844 or sacrifice child

I did not expect this, since I have 100 GB of memory available. On the other hand, another assembler was running at the same time. So I'll give it another try without sharing resources, let's see...
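
As an illustration of the check above, here is a small sketch that scans saved kernel log text for OOM-killer lines like the one quoted. The function name `find_oom_kills` is hypothetical, and the pattern is an assumption based on the message shown in this thread; the wording differs between kernel versions (some print "Killed process"), so treat it as a starting point rather than a complete parser.

```python
import re

# Assumed message shape, taken from the dmesg line quoted in this thread.
# "Kill(?:ed)?" also accepts the "Killed process" wording of other kernels.
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log: str) -> list[tuple[int, str]]:
    """Return (pid, process name) for each OOM-killer line found in the text."""
    return [(int(pid), name) for pid, name in OOM_PATTERN.findall(kernel_log)]


if __name__ == "__main__":
    sample = (
        "[2417644.147499] [<ffffffff81140c63>] ? out_of_memory+0x473/0x4b0\n"
        "[2417644.147825] Out of memory: Kill process 9434 (abruijn-assembl) "
        "score 844 or sacrifice child\n"
    )
    for pid, name in find_oom_kills(sample):
        print(f"OOM killer terminated pid {pid} ({name})")
```

Note that the kernel truncates process names, which is why the log shows "abruijn-assembl" rather than the full binary name.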

mikolmogorov commented 7 years ago

Interesting. 1 GB of data should only require 4-8 GB of memory, but maybe there are some unforeseen consequences of having 100 kbp+ reads. Let me know when you have a chance to take a closer look - I will be happy to help.

ghost commented 7 years ago

I am watching the "Extending reads" step right now, and it is eating up almost all of my 90 GB of RAM. Not crashed yet... EDIT: OK, it crashed during the "Extending reads" step. I ran it with 20 cores. Trying with 10 cores now.
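
For anyone who wants a number rather than watching a system monitor, the sketch below runs a command and reports the peak resident memory of child processes afterwards, using the standard `resource` module. The helper name `run_and_report_peak` is hypothetical, and the unit is an assumption that holds on Linux (`ru_maxrss` is in kilobytes there; macOS reports bytes). It only reports after the command ends and does nothing to prevent an OOM kill.

```python
import resource
import subprocess
import sys


def run_and_report_peak(cmd: list[str]) -> tuple[int, int]:
    """Run cmd, wait for it, and return (return code, peak child RSS).

    The peak is the largest resident set size among all children this
    process has waited for so far, in kilobytes on Linux.
    """
    proc = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, peak


if __name__ == "__main__":
    # Stand-in workload: a child Python process that allocates about 50 MB.
    demo = [sys.executable, "-c", "data = bytearray(50 * 1024 * 1024)"]
    code, peak_kb = run_and_report_peak(demo)
    print(f"return code {code}, peak RSS about {peak_kb / 1024:.0f} MB")
```

In practice the stand-in command would be replaced by the assembler invocation, with thread count varied between runs to compare peak memory.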

mikolmogorov commented 7 years ago

In the meantime, you can try the latest devel version. In theory, it should fix the high memory consumption problem for very long reads...

ghost commented 7 years ago

OK, the latest devel version finished without any errors. I ran the same 1.1 Gbp input with 18 cores and about 90 GB of RAM. Thanks!