lh3 / miniasm

Ultrafast de novo assembly for long noisy reads (though having no consensus step)
MIT License

Segmentation fault happened at step3 #7

Closed wen-biao closed 7 years ago

wen-biao commented 8 years ago

Hi Heng,

I have a problem running miniasm to assemble a plant genome. The genome is nearly 400 Mb and was sequenced with PacBio to a depth of 85x. Before running miniasm, I used minimap to compute the overlaps. The run ends with a segmentation fault at step 3 (log below). Do you have any suggestions for solving this problem?

Thanks, Wen-Biao

[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::10096.666*1.00] read 2876324180 hits; stored 4112957381 hits and 3335749 sequences (31132741397 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::11325.809*1.00] 3223555 query sequences remain after sub
[M::ma_hit_cut::11845.499*1.00] 3940583267 hits remain after cut
[M::ma_hit_flt::12142.079*1.00] 2353809445 hits remain after filtering; crude coverage after filtering: 465.37
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::12272.969*1.00] 3203496 query sequences remain after sub
[M::ma_hit_cut::12411.309*1.00] 2326599127 hits remain after cut
Segmentation fault

lh3 commented 8 years ago

Could you check if you have enough RAM? Thanks.

wen-biao commented 8 years ago

We have 512 GB of RAM, but I submitted my job with bsub requesting only 48 GB. Here is some information from the log:

Exited with exit code 139.
Resource usage summary:
CPU time : 725882.19 sec.
Max Memory : 126026 MB
Max Swap : 131639 MB
Max Processes : 5
Max Threads : 48
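As a side note on the exit code 139 reported above: shells commonly report a process killed by a signal as 128 plus the signal number. The sketch below decodes a code under that convention; it assumes the scheduler passes the shell-style code through unchanged, and the helper name `describe_exit_code` is made up for illustration.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit code using the common 128 + signal convention."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name for this signal number.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(139))
```

Under this reading, 139 corresponds to signal 11 (SIGSEGV), which matches the "Segmentation fault" line in the log but does not by itself say whether memory exhaustion or a bug triggered it.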

I will rerun it with more RAM.

Thanks.

wen-biao commented 8 years ago

The segmentation fault happened again even though I allocated 256 GB of memory. Could there be another cause?

lh3 commented 8 years ago

I don't know. I have to reproduce the segfault to fix it. Is this a public data set or can I debug on it?

wen-biao commented 8 years ago

Hi Heng,

Thanks. Unfortunately, the data is not public yet. I tested miniasm on the public PacBio reads from Arabidopsis thaliana (Ler). It performed well and was very fast. I have already assembled my genome successfully with the PBcR pipeline, so I am curious whether miniasm can produce a better assembly.

StefanoLonardi commented 7 years ago

I have encountered a similar problem. My genome is 620 Mb and I have 84x coverage. I ran

./minimap/minimap -Sw5 -L100 -m0 -t32 reads.fasta.gz reads.fasta.gz | gzip -1 > reads.paf.gz

and obtained a file reads.paf.gz of 148,091,787,711 bytes. Then I ran

./miniasm/miniasm -f reads.fasta.gz reads.paf.gz > reads.gfa

and it fails with the following message:

[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::13678.392*0.99] read 4118197117 hits; stored 6246339858 hits and 4359024 sequences (49751590514 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::20164.461*0.99] 3981562 query sequences remain after sub
[M::ma_hit_cut::21492.442*0.99] 5457262256 hits remain after cut
[M::ma_hit_flt::23199.391*0.99] 2589838094 hits remain after filtering; crude coverage after filtering: 410.91
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::23309.968*0.99] 3910410 query sequences remain after sub
[M::ma_hit_cut::23765.825*0.99] 2285338707 hits remain after cut
[M::ma_hit_chimeric::24052.557*0.99] identified 390841 chimeric reads
Segmentation fault (core dumped)

My server has 512 GB of RAM, all of which I can use. I also checked for duplicate reads with the command

zcat reads.fasta.gz | perl -ne 'print "$1\n" if />(\S+)/' | sort | uniq -d

but there are none. Any suggestions?
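For readers who prefer not to use the shell pipeline, the same duplicate-name check can be sketched in Python. This is only an illustrative equivalent: the function names are hypothetical, and it keeps every read name in memory, which is assumed to be acceptable for a few million reads.

```python
import gzip
import re
from collections import Counter

# Same pattern as the perl one-liner: the first run of non-space
# characters following a ">" on the line.
HEADER = re.compile(r">(\S+)")


def duplicate_read_names(lines):
    """Return the sorted read names that occur more than once."""
    counts = Counter()
    for line in lines:
        match = HEADER.search(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(name for name, seen in counts.items() if seen > 1)


def duplicate_read_names_in_gz(path):
    """Run the check on a gzip-compressed FASTA file."""
    with gzip.open(path, "rt") as handle:
        return duplicate_read_names(handle)
```

Calling `duplicate_read_names_in_gz("reads.fasta.gz")` would return an empty list when, as reported above, no read name is repeated.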

nickloman commented 7 years ago

Hi Heng

Hope you are doing well.

We are trying to assemble our latest nanopore human genome set with miniasm, but we are encountering a core dump at the point of chimeric read detection.

It's about 30X coverage (~100Gb FASTA), but some of the reads are very long. The server has 1TB of RAM. The gdb output is:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/mnt/human/bin/miniasm/miniasm -f rel4a.fastq rel4a.paf.gz'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 ma_hit_mark_unused (d=d@entry=0x1a43010, n=n@entry=-1524641225, a=a@entry=0x7f5b928b5010) at hit.c:30
30 d->seq[a[i].qns>>32].aux = d->seq[a[i].tn].aux = 1;
(gdb) bt
#0 ma_hit_mark_unused (d=d@entry=0x1a43010, n=n@entry=-1524641225, a=a@entry=0x7f5b928b5010) at hit.c:30
#1 0x00000000004081ac in ma_hit_contained (opt=opt@entry=0x7fff0d1567d0, d=d@entry=0x1a43010, sub=sub@entry=0x7fdb8da20010, n=, a=a@entry=0x7f5b928b5010) at hit.c:292
#2 0x00000000004018bc in main (argc=4, argv=0x7fff0d156928) at main.c:138
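One detail in the backtrace stands out: the argument n is negative (-1524641225), which is unusual if it holds a count. A possible reading, not confirmed anywhere in this thread, is that a count larger than 2^31 - 1 was stored in a 32-bit signed integer and wrapped around. The sketch below only illustrates that arithmetic; that n is a 32-bit int in hit.c is an assumption, and the helper names are made up.

```python
INT32_MAX = 2**31 - 1  # largest value a 32-bit signed integer can hold


def as_int32(value: int) -> int:
    """Reinterpret a non-negative count as a wrapped 32-bit signed integer."""
    return (value + 2**31) % 2**32 - 2**31


def as_uint32(value: int) -> int:
    """Recover the unsigned 32-bit value behind a wrapped signed one."""
    return value % 2**32


# "hits remain after cut" in step 3 of the two logs posted in this thread.
reported_hits = {"first report": 2326599127, "second report": 2285338707}

for label, hits in reported_hits.items():
    print(label, hits, "exceeds INT32_MAX:", hits > INT32_MAX,
          "wraps to:", as_int32(hits))

# The negative n from the gdb backtrace, read back as an unsigned count
# (assumption: n is a wrapped 32-bit hit count).
print("n in backtrace as unsigned:", as_uint32(-1524641225))
```

Under this assumption, the n in the backtrace would correspond to about 2.77 billion hits, and both earlier logs also report more than 2^31 - 1 hits just before the crash. That would be consistent with the segfaults persisting regardless of available RAM, but the thread itself does not state what the patch changes.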

Let me know if you would like the specific input files uploaded.

Best Nick

nickloman commented 7 years ago

Can confirm that proposed patch https://github.com/voutcn/miniasm/commit/b39d7577d30bca8f368370945995e576c00c358d solves this issue. Thanks!

lh3 commented 7 years ago

Thanks a million, @nickloman and @voutcn. Fix committed.