lh3 / miniasm

Ultrafast de novo assembly for long noisy reads (though having no consensus step)
MIT License

Segmentation fault #36

Open wangzhennan14 opened 6 years ago

wangzhennan14 commented 6 years ago

Hi Liheng, when I use miniasm to assemble a 100x PacBio genome, it fails with a segmentation fault. What is the problem? I followed the manual, and the log was:

 [M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::40487.179*0.93] read 11945248497 hits; stored 12916879108 hits and 10398788 sequences (101800136514 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::45821.251*0.94] 9975919 query sequences remain after sub
[M::ma_hit_cut::48542.296*0.94] 11845662951 hits remain after cut
[M::ma_hit_flt::50820.915*0.94] 7422102329 hits remain after filtering; crude coverage after filtering: 476.06
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::51346.118*0.94] 9907831 query sequences remain after sub
[M::ma_hit_cut::52976.403*0.95] 7118498960 hits remain after cut
[M::ma_hit_chimeric::53791.988*0.95] identified 474580 chimeric reads

Can you give me some advice on how to solve this problem? Thank you very much!

lh3 commented 6 years ago

Could you check if you have enough memory?

pbfrandsen commented 6 years ago

Hello,

Thank you for the great tool. I am also getting a seg fault.

[M::main] ===> Step 1: reading read mappings <===
/opt/gridengine/default/spool/compute-9-3/job_scripts/1347083: line 19: 163580 Segmentation fault      (core dumped) ../miniasm/miniasm -f /scratch/genomics02/frandsenp/concat_reads/all_reads.fasta.gz reads.paf.gz > reads.gfa

At first I thought it was a memory problem, so I allocated 250 GB, but it still segfaults when it reaches ~130 GB (the resulting core dump is about 129 GB). I am now running it with 500 GB of RAM just in case, but if you have any insight into what I might adjust in the meantime, I am very open to it.

Thank you,

Paul

sjackman commented 6 years ago

I'm also seeing a segmentation fault in Step 3. It's using 250 GB of RAM, and the machine has 2.5 TB of RAM, so memory usage should be okay. This assembly succeeded with miniasm -c2, but failed with miniasm -c3. I'll try it once more.

❯❯❯ miniasm -c3 -f Q903_11.fq.gz Q903_11.minimap2.paf.gz >Q903_11.minimap2.c3.miniasm.gfa
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::33696.481*0.98] read 6043539395 hits; stored 8042486383 hits and 4275859 sequences (53639474019 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::51174.764*0.99] 3888964 query sequences remain after sub
[M::ma_hit_cut::59938.663*0.99] 6138438897 hits remain after cut
[M::ma_hit_flt::60340.037*0.99] 2636615608 hits remain after filtering; crude coverage after filtering: 445.30
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::60693.790*0.99] 3681650 query sequences remain after sub
[M::ma_hit_cut::60946.122*0.99] 2314586291 hits remain after cut
While building Q903_11.minimap2.c3.miniasm.gfa: Error 139 executing command time -v -o Q903_11.minimap2.c3.miniasm.gfa.time miniasm -c3 -f Q903_11.fq.gz Q903_11.minimap2.paf.gz >Q903_11.minimap2.c3.miniasm.gfa
Deleting Q903_11.minimap2.c3.miniasm.gfa
Command exited with non-zero status 2
11527.35user 49308.91system 17:29:40elapsed 96%CPU (0avgtext+0avgdata 251763444maxresident)k
253003824inputs+88outputs (0major+1440637832minor)pagefaults 0swaps

Command terminated by signal 11
    User time (seconds): 11524.88
    System time (seconds): 49308.86
    Percent of CPU this job got: 96%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 17:29:37
    Maximum resident set size (kbytes): 251763444

lh3 commented 6 years ago

Do you have the log file for -c2? BTW, -Rc2 usually uses less memory at the cost of performance. Sometimes -Rc2 may give a better assembly than -c3.

sjackman commented 6 years ago

I ran miniasm -c3 a second time, and I saw the same log and segmentation fault, so it seems repeatable.

Here's the log for miniasm -c2

❯❯❯ miniasm -c2 -f Q903_11.fq.gz Q903_11.minimap2.paf.gz >Q903_11.minimap2.c2.miniasm.gfa
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::24348.043*0.99] read 6043539395 hits; stored 8042486383 hits and 4275859 sequences (53639474019 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::41732.276*0.99] 4042833 query sequences remain after sub
[M::ma_hit_cut::44159.173*0.99] 6725880000 hits remain after cut
[M::ma_hit_flt::45356.645*0.99] 2252966414 hits remain after filtering; crude coverage after filtering: 357.92
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::45694.031*0.99] 3926323 query sequences remain after sub
[M::ma_hit_cut::46010.082*0.99] 1976194002 hits remain after cut
[M::ma_hit_contained::47574.346*0.99] 679121 sequences and 30233329 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 14964141 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 2690476 arcs
[M::asg_arc_del_multi] removed 39354 multi-arcs
[M::asg_arc_del_asymm] removed 208039 asymmetric arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 342548 tips
[M::asg_pop_bubble] popped 702 bubbles and trimmed 410 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 751999 asymmetric arcs
[M::asg_arc_del_short] removed 1752535 short overlaps
[M::asg_cut_tip] cut 61021 tips
[M::asg_pop_bubble] popped 701 bubbles and trimmed 326 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 139551 asymmetric arcs
[M::asg_arc_del_short] removed 179583 short overlaps
[M::asg_cut_tip] cut 15799 tips
[M::asg_pop_bubble] popped 338 bubbles and trimmed 169 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 83251 asymmetric arcs
[M::asg_arc_del_short] removed 105729 short overlaps
[M::asg_cut_tip] cut 9249 tips
[M::asg_pop_bubble] popped 280 bubbles and trimmed 182 tips
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 2743 internal sequences
[M::asg_cut_biloop] cut 15947 small bi-loops
[M::asg_cut_tip] cut 1398 tips
[M::asg_pop_bubble] popped 31 bubbles and trimmed 20 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 45288 asymmetric arcs
[M::asg_arc_del_short] removed 57436 short overlaps
[M::asg_cut_tip] cut 5713 tips
[M::asg_pop_bubble] popped 195 bubbles and trimmed 161 tips
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.2-r128
[M::main] CMD: miniasm -c2 -f Q903_11.fq.gz Q903_11.minimap2.paf.gz
[M::main] Real time: 48352.940 sec; CPU: 48032.413 sec
    Command being timed: "miniasm -c2 -f Q903_11.fq.gz Q903_11.minimap2.paf.gz"
    User time (seconds): 12570.57
    System time (seconds): 35462.22
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 13:25:55
    Maximum resident set size (kbytes): 251763388

Thanks for the tip about miniasm -Rc2. I'll try it out.

lh3 commented 6 years ago

With -c2:

[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::45694.031*0.99] 3926323 query sequences remain after sub
[M::ma_hit_cut::46010.082*0.99] 1976194002 hits remain after cut

With -c3:

[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::60693.790*0.99] 3681650 query sequences remain after sub
[M::ma_hit_cut::60946.122*0.99] 2314586291 hits remain after cut

Note the difference between 1976194002 and 2314586291. I haven't checked the source code, but I guess -c3 failed because the containment removal step might be using 31-bit integers somewhere.

sjackman commented 6 years ago

Those pesky signed 31-bit integers. =) I tend to use size_t for unsigned counters, or ssize_t if you want them to be signed for some reason. On 64-bit platforms, that gives you 64-bit counters without having to resort to uint64_t.

bioteksampath commented 6 years ago

Me too https://github.com/lh3/miniasm/issues/36.