vpc-ccg / haslr

A fast tool for hybrid genome assembly of long and short reads
GNU General Public License v3.0

Possible to lower memory usage for haslr_assemble #11

Status: Open. jelber2 opened this issue 4 years ago.

jelber2 commented 4 years ago

Hi,

I have a single-node Ubuntu 16.04 system with 378 GB RAM and 40 cores (80 threads). During the haslr assemble stage, memory usage climbs to 100% and haslr_assemble starts using swap, so I lowered --cov-lr from 25 to 20, then 15, and now 10. The genome is 450 Mbp, with ~30x PacBio CLR reads (simulated) and ~60x Illumina short reads (simulated). I will see whether --cov-lr 10 works on my system (i.e., finishes without using too much RAM), but is there some way to reduce RAM usage during this step? Any ideas?

Update: --cov-lr 10 also ran out of memory, so I am now experimenting with the --aln-block and --aln-sim settings (I had been using the defaults).
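For scale, the long-read volume implied by each --cov-lr value follows from genome size × coverage. The Python sketch below assumes that --cov-lr caps the long reads HASLR keeps at roughly that many bases (check the HASLR README for exactly how reads are selected); parse_genome_size and long_read_budget are hypothetical helpers, not part of HASLR.

```python
# Hypothetical helpers (not part of HASLR) for the coverage arithmetic.
SUFFIXES = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_genome_size(text: str) -> int:
    """Convert a size string such as '450m', '3g' or '5000' to base pairs."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def long_read_budget(genome: str, cov_lr: float) -> int:
    """Bases of long reads for a target coverage: genome size x coverage.

    Assumes --cov-lr caps the retained long reads at roughly this many bases.
    """
    return int(parse_genome_size(genome) * cov_lr)


if __name__ == "__main__":
    input_bases = long_read_budget("450m", 30)  # ~30x simulated PacBio CLR
    for cov in (25, 20, 15, 10):
        kept = long_read_budget("450m", cov)
        print(f"--cov-lr {cov}: {kept / 1e9:.2f} Gbp ({kept / input_bases:.0%} of input)")
```

At --genome 450m, --cov-lr 10 still corresponds to about 4.5 Gbp of long reads, one third of the ~30x (13.5 Gbp) input.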

jelber2 commented 4 years ago

I got the farthest with this command:

/genetics/elbers/haslr/bin/haslr.py --aln-block 1000 --aln-sim 0.90 --genome 450m --long pacbio.fastq.gz --type pacbio --short small_insert_trim1.fq.gz small_insert_trim2.fq.gz --threads 75 --cov-lr 10 --out haslr-pacbio-clr > haslr.pacbio-clr.log 2>&1 &

But after a while I got the same error as #5, even though peak memory usage was only about 75% of RAM.
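If the 75% figure came from a sampled monitor such as top, it can miss short spikes. One way to check is to read the kernel's own peak resident-set size once the run finishes. The sketch below is a generic Python wrapper, not part of HASLR; run_and_report_peak_rss is a hypothetical name, and it assumes haslr.py waits for the tools it launches (such as haslr_assemble) so that their memory is counted.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd: list[str]) -> tuple[int, float]:
    """Run cmd to completion and return (exit code, peak child RSS in GiB).

    ru_maxrss under RUSAGE_CHILDREN is the resident set size of the largest
    terminated, waited-for child (grandchildren count only if their parents
    waited for them). Units are kilobytes on Linux and bytes on macOS.
    """
    result = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    return result.returncode, peak * bytes_per_unit / 1024**3
```

For the run above, cmd would be [sys.executable, "/genetics/elbers/haslr/bin/haslr.py", "--aln-block", "1000", ...] with the remaining flags unchanged. Because RSS counts only memory resident in RAM, a run that is already swapping will report close to physical RAM rather than its full demand.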