lh3 / miniasm

Ultrafast de novo assembly for long noisy reads (though having no consensus step)
MIT License

miniasm segmentation fault #48

Open imneuro opened 6 years ago

imneuro commented 6 years ago

Hi LiHeng,

I am using miniasm and running into a segmentation fault, which I was able to reproduce. Both times it had used only 52% of the total 239G of memory before it failed. The ovlp_corn.paf.gz file is 511GB and reads.fa is 43GB of raw PacBio long reads, about 20X coverage of a genome of roughly 2.2Gb. Please note that the second time, when I reproduced the error, I ran only the miniasm step. Is there anything I missed? What could be going wrong?

$ tail log-0605.out
[M::worker_pipeline::39280.938*13.59] mapped 64199 sequences
[M::worker_pipeline::39324.009*13.58] mapped 69836 sequences
[M::worker_pipeline::39356.459*13.58] mapped 70597 sequences
[M::worker_pipeline::39368.777*13.58] mapped 73223 sequences
[M::worker_pipeline::39369.057*13.58] mapped 10406 sequences
[M::main] Version: 2.10-r761
[M::main] CMD: /opt/conda/bin/minimap2 -x ava-pb -t 20 reads.fa reads.fa
[M::main] Real time: 39370.033 sec; CPU: 534467.581 sec
[M::main] ===> Step 1: reading read mappings <===
miniasm_cluster_corn.sh: line 20: 22168 Segmentation fault (core dumped) /opt/conda/bin/miniasm -f reads.fa ovlp_corn.paf.gz > reads_corn.gfa

The log from the repeated run is as follows:
Wed Jun 6 15:42:21 UTC 2018
/opt/conda/bin/miniasm -f reads.fa ovlp_corn.paf.gz > reads_corn.gfa
Wed Jun 6 18:50:23 UTC 2018

The error from the repeated run is as follows:
==> log_0606_2.out <==
[M::main] ===> Step 1: reading read mappings <===
miniasm_cluster_corn.sh: line 20: 27 Segmentation fault (core dumped) /opt/conda/bin/miniasm -f reads.fa ovlp_corn.paf.gz > reads_corn.gfa

I hope to hear from you soon. Best, Xin

imneuro commented 6 years ago

I forgot to mention that the miniasm version is 0.2-r128.

imneuro commented 6 years ago

Update:

I tried the new version, 0.3-r179. It seems that when the data needs only a small amount of memory, such as 80GB, it works fine. However, when the data needs a lot of memory, it still fails with a segmentation fault. I was able to reproduce the error and noticed that both times it happened when memory usage reached about 120GB, even though I have 240GB of memory in total.

......
[M::main] CMD: minimap2 -x ava-pb -t 28 /efs/fasta/merged_BAC_PACBIO.filtered.fasta /efs/fasta/merged_BAC_PACBIO.filtered.fasta
[M::main] Real time: 30832.901 sec; CPU: 674449.293 sec
31699 Segmentation fault (core dumped) /home/biodocker/miniasm//miniasm -f /efs/fasta/merged_BAC_PACBIO.filtered.fasta /efs/fasta//miniasm/BAC_corn.paf.gz -Rc2 > /efs/fasta/miniasm/BAC_corn.gfa 2> /efs/fasta//miniasm/step2.err

Is there any other way to decrease the memory usage?

Best, Xin
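One possible way to reduce what miniasm has to load is to thin the existing PAF before the miniasm step, rather than rerunning the overlap. A minimal sketch, not from this thread, reusing the file names from the first comment; the cutoff of 1000 matching bases (PAF column 10 is the number of residue matches) is only an example value:

$ zcat ovlp_corn.paf.gz | awk '$10 >= 1000' | gzip > ovlp_corn.filt.paf.gz   # keep only overlaps with >= 1000 matching bases
$ miniasm -f reads.fa ovlp_corn.filt.paf.gz > reads_corn.gfa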

ManavalanG commented 5 years ago

I run into the same error when using miniasm with large datasets. In my case, beyond a certain input file size, it consistently fails at around 256GB of memory.

dcopetti commented 5 years ago

Same here: I get the segmentation fault when the process is using about half of the total memory (in my case ~65 GB out of 126 GB). My version is 0.3-r179 as well.

fbemm commented 5 years ago

@lh3 does miniasm have an upper limit on how much input it can actually read? I am trying to assemble from a gzipped 4 TB PAF file, and it looks like there is a hard cap at around 2 TB of memory usage.

mr--white commented 5 years ago

We are experiencing this as well. We have 1 TB of RAM and, like others here, the segfault consistently occurs around the halfway point, which for us is around 550 GB used. We also use a gzipped PAF file. We were using 0.2-r168-dirty, updated to the current release, and the issue still occurs.

fbemm commented 5 years ago

@mr--white set -m to something between 1000 and 2000 during your minimap2 overlap step. The resulting PAF file should be much smaller, and miniasm should then run through.
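For illustration, a hedged sketch of that overlap step (the read file name and thread count are reused from the first comment; -m 1500 is only an example value within the suggested range):

$ minimap2 -x ava-pb -m 1500 -t 20 reads.fa reads.fa | gzip > ovlp_corn.paf.gz   # a higher -m discards weaker, lower-scoring overlaps, so the PAF shrinks
$ miniasm -f reads.fa ovlp_corn.paf.gz > reads_corn.gfa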

bnwaweru commented 5 years ago

I'm having the same problem. Has anyone tried out what was suggested to @mr--white above to see if it worked? I'm working with a 734 GB .paf.gz file and wouldn't want to repeat this step if it doesn't work.