ocxtal / minialign

[IMPORTANT: not for real data analysis, only for algorithm evaluation] fast and accurate alignment tool for PacBio and Nanopore long reads
MIT License

Minialign using large volumes of memory and crashing out due to a segmentation fault #12

Open Jacob-Ferrier opened 5 years ago

Jacob-Ferrier commented 5 years ago

Hello,

I am attempting to use minialign to align subreads against a reference fasta. I am having two issues:

Issue 1: I have generated the index using the following command (where REFERENCE points to the reference fasta):

/users/jferrier/software/minialign/minialign -t 64 -d index.mai ${REFERENCE}

The process was given 4TB of memory as well as a walltime of 256 hours. The process succeeded; however, it used approximately 3.7TB of memory. The reference is approximately 415GB. The resultant index is approximately 438GB. Do you know what is using so much memory?

Issue 2: After successfully generating the index I have attempted to align some reads against the index using the following command (where READS points to the reads fastq):

/users/jferrier/software/minialign/minialign -t 64 -v 2 index.mai ${READS} > minialign_test_alignment.sam

This process was given 2TB of memory as well as a walltime of 256 hours. This process failed and all that was written out was the following:

[M::main] Version: 0.6.0-44-g5fd40a5, Build: AVX2
[M::main_align::1288.965*7.19] loaded/built index for 41818118 target sequence(s).
/var/spool/torque/mom_priv/jobs/1183618.cph-m1.uncc.edu.SC: line 44: 136454 Segmentation fault (core dumped) /users/jferrier/software/minialign/minialign -t 64 -v 2 index.mai ${READS} > minialign_test_alignment.sam

The process used approximately 1.16TB of memory before crashing out after approximately 2.5 hours. As I mentioned before, the index.mai file is approximately 438GB. The reads file is approximately 795MB. The process does generate minialign_test_alignment.sam, which is approximately 3.0GB; however, the process does not finish. Do you know why the process is crashing out?

This very well could be user error, please let me know if you have any insight or if I could provide any other information to help diagnose. Thank you very much!

ocxtal commented 5 years ago

It seems natural that minialign consumes ~4TB of memory for a 415GB reference (if it is not compressed, i.e. ~415Gbp). It depends on k and w, but the average index footprint is around 11-13 times the total reference sequence length in bytes. A larger w will help you decrease the footprint.
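As a rough illustration of the rule of thumb above, the following sketch multiplies the reference length by the quoted 11-13x factor. The function name and the use of decimal terabytes are assumptions made for this example; the real footprint depends on k and w, so this is only a back-of-the-envelope estimate, not how minialign itself accounts for memory.

```python
def estimate_index_memory(reference_bases, low_factor=11, high_factor=13):
    """Return (low, high) estimated index footprint in bytes.

    The 11-13x factors are the averages quoted in this thread;
    the actual value varies with the k and w parameters.
    """
    return reference_bases * low_factor, reference_bases * high_factor


TB = 10**12  # decimal terabyte, assumed here for readability

# ~415 Gbp uncompressed reference, as reported in this issue
low, high = estimate_index_memory(415 * 10**9)
print(f"estimated footprint: {low / TB:.2f} TB to {high / TB:.2f} TB")
```

The reported peak of about 3.7TB corresponds to roughly 9 times the reference size, which is somewhat below this range but of the same order of magnitude.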

If it still doesn't run when the reference is halved, it might be a bug. In that case, please send me the log printed on your terminal, and the input sequences if possible.
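One possible way to halve the reference for this test is sketched below. It assumes the reference is a plain, uncompressed FASTA file and keeps whole records together; the function name and the file names in the usage note are hypothetical, and any other tool that splits FASTA records would serve equally well.

```python
def split_fasta_in_half(src_path, first_path, second_path):
    """Split a FASTA file into two files of roughly equal total bases.

    Records are never cut: the switch to the second file happens at the
    first header line seen after at least half of the bases were written.
    Returns the total number of bases counted.
    """
    # Pass 1: count sequence characters (header lines start with '>').
    total = 0
    with open(src_path) as src:
        for line in src:
            if not line.startswith(">"):
                total += len(line.strip())

    # Pass 2: stream records into the first file, then into the second.
    written = 0
    with open(src_path) as src, \
            open(first_path, "w") as first, \
            open(second_path, "w") as second:
        out = first
        for line in src:
            if line.startswith(">"):
                if written >= total / 2:
                    out = second
            else:
                written += len(line.strip())
            out.write(line)
    return total
```

For example, `split_fasta_in_half("reference.fa", "reference.half1.fa", "reference.half2.fa")` would produce two files, either of which could then be indexed and aligned against separately. The file is read twice, which keeps memory use small at the cost of extra I/O on a reference of this size.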

Thanks,