hzi-bifo / Haploflow


Possible to restrict RAM use with haploflow? #18


chrisgab commented 1 year ago

Hi, I have recently tested haploflow on a complex metagenomics dataset, and it seems to be performing very well compared to other tools in producing correctly assembled viral contigs. For testing, I have used half of my dataset (i.e. only forward reads), with an uncompressed file size of 8 GB, and this has run without issues. I have used the conda installation, and am running this on a Linux system with 250 GB RAM. However, when I try to use the full dataset (16 GB), the RAM usage increases until it is maxed out and the program eventually crashes. Is there any way to control the memory use to avoid these issues?
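
(One generic workaround on Linux, independent of Haploflow itself, is to cap the address space of the process before launching it, so a runaway run fails at a chosen limit instead of exhausting the whole machine. The sketch below is only a hedged illustration using Python's `resource` and `subprocess` modules; the memory limit and the `haploflow` command-line arguments are placeholders and would need adjusting to the real invocation.)

```python
import resource
import subprocess

# Cap the virtual address space of the child process (Linux only).
# If the assembler exceeds the cap, its allocations fail and it exits,
# instead of driving the whole machine into swap or the OOM killer.
MEM_LIMIT_BYTES = 200 * 1024**3  # e.g. 200 GB on a 250 GB node (placeholder)

def _limit_memory():
    resource.setrlimit(resource.RLIMIT_AS, (MEM_LIMIT_BYTES, MEM_LIMIT_BYTES))

# Placeholder arguments; replace with the actual Haploflow call.
cmd = ["haploflow", "--read-file", "reads_full.fastq", "--out", "asm_out"]

result = subprocess.run(cmd, preexec_fn=_limit_memory)
print("haploflow exited with code", result.returncode)
```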

AlphaSquad commented 1 year ago

Hi, unfortunately there is currently no way to control memory manually, and the de Bruijn graph implementation is not optimised for large metagenomic datasets (its memory use scales with the number of distinct k-mers). Improving the memory behaviour is on my list, though, and there may well be a bug or memory leak somewhere, since I wouldn't expect the reverse reads to add that many new k-mers; I will investigate.
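
(To illustrate the scaling argument: a de Bruijn graph stores one node per distinct k-mer, so reads that introduce new sequence grow the graph, while reads that only re-cover known sequence do not. A rough way to check whether the reverse reads really contribute many new k-mers is to count distinct k-mers per file, e.g. with a small sketch like the one below. A plain Python set is itself memory-hungry, so run it on a subsample; the file names, k, and subsample size are placeholders.)

```python
from itertools import islice

def distinct_kmers(fastq_path, k=41, max_reads=1_000_000):
    """Count distinct k-mers in (a subsample of) an uncompressed FASTQ file."""
    kmers = set()
    with open(fastq_path) as fh:
        # FASTQ records are 4 lines; the sequence is the 2nd line of each record.
        for i, line in enumerate(islice(fh, 4 * max_reads)):
            if i % 4 != 1:
                continue
            seq = line.strip().upper()
            for j in range(len(seq) - k + 1):
                kmers.add(seq[j:j + k])
    return len(kmers)

# Placeholder file names: compare forward-only vs reverse-only k-mer counts.
print("forward reads:", distinct_kmers("reads_R1.fastq"))
print("reverse reads:", distinct_kmers("reads_R2.fastq"))
```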

chrisgab commented 1 year ago

Thank you for your quick response. I'm very interested to hear what you find, and whether a future version will offer more control over memory usage.