soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0

Running hhblits for 7000 sequences - memory leak? #314

Open ksteczk opened 2 years ago

ksteczk commented 2 years ago

I'm running hhblits on more than 7000 sequences against the UniRef database. With hhblits_omp on 128 threads, the process gets killed after about 600 sequences, once it has consumed all of the machine's RAM (256 GB). From what I observe, RAM consumption increases slowly over time, as if some processes did not free their memory after writing the alignment to the output a3m file (a bug?).

The command:

hhblits_omp -i indb -d UniRef30_2020_06 -oa3m out_a3m -n 5 -cpu 128

I can split the input into chunks of ~500 sequences, but that is not efficient, as hhblits_omp would not balance the load optimally. mpirun also seems to consume a lot of RAM, as the threads do not seem to share the common database files in memory.

Any suggestions on how to make profiles for thousands of sequences at once?
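
For readers who want to try the chunking workaround mentioned above, here is a minimal sketch in Python. It assumes the query sequences are available as plain FASTA text, which is not stated in the post (the command uses an input database named indb). The function names parse_fasta, chunk_records and format_fasta are hypothetical helpers, not part of HH-suite. The sketch only splits the records. Converting each chunk into the input format hhblits_omp expects and running the search are left out.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Record] = []
    header = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header = line[1:]
            seq_lines = []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def chunk_records(records: List[Record], chunk_size: int = 500) -> Iterator[List[Record]]:
    """Yield consecutive chunks of at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def format_fasta(records: List[Record]) -> str:
    """Serialize records back to FASTA text, one sequence per line."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

Each chunk can then be written to its own file and processed in a separate run, so that whatever memory a run accumulates is returned to the system when that process exits. This trades away some load balancing, as noted in the post.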