soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0

memory usage of hhblits_omp #238

Open rujpeng opened 3 years ago

rujpeng commented 3 years ago

Dear HHsuite developers and users,

I am using hhblits_omp to search many sequences (above 10,000) against a custom database. The search went well at the beginning, but memory usage grew higher and higher, and the program eventually ended with a segmentation fault. I am wondering whether you have seen the same situation, or whether I have not compiled HH-suite properly?

A typical output in my case looked like the following:

```
slurm_script: line 8: 123689 Segmentation fault      hhblits_omp -cpu 4 -id 100 -maxfilt 30000 -diff 3000 -e 0.01 -cov 15 -qid 15 -i sleb -d ../searchMsa
```
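For reference, the job was submitted through a Slurm batch script along these lines (a reconstruction for illustration only; the resource requests are assumptions, not the actual values):

```
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=64G   # assumed request; the run segfaults as memory use keeps growing

hhblits_omp -cpu 4 -id 100 -maxfilt 30000 -diff 3000 -e 0.01 -cov 15 -qid 15 \
    -i sleb -d ../searchMsa
```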

Best, Junhui

milot-mirdita commented 3 years ago

Would it be possible to upload the query and database somewhere?

rujpeng commented 3 years ago

Thanks! The target database is not very large, but it would still be quite difficult to upload.

However, I tried searching my queries against UniRef30_2020_06 using my query database, and it was the same: the program ended with a segmentation fault.

The query database can be downloaded here: https://doi.org/10.6084/m9.figshare.13540895

ksteczk commented 3 years ago

I upvote this issue. I (or rather my server) experienced the same problem. I'm computing profiles for the standard COG/KOG database (starting from single sequences). My command is:

```
hhblits_omp -i cdd -d /db/hh/UniRef30_2020_06 -oa3m cdd_a3m -n 3 -cpu 120 -v 0
```

This database has 9,696 sequences.

It starts with several gigabytes of RAM used, but usage increases with time and with the number of processed entries, until all of my 256 GB of RAM are consumed at around 4,000 entries. It looks like a memory leak: maybe hhblits_omp doesn't purge data structures from memory after writing the results into the output ff{data,index} files? When I Ctrl+C the computation, make a new index file with the unprocessed entries, and continue with those, the situation is the same: it starts with a few GB of RAM used and grows as entries are processed.
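For reference, a minimal sketch of that "new index file" step, assuming the entry name is the first column of both .ffindex files (file names taken from my command above):

```
# keep only input entries whose name is absent from the finished output index
awk 'NR==FNR { done[$1] = 1; next } !($1 in done)' \
    cdd_a3m.ffindex cdd.ffindex > cdd_remaining.ffindex
# then point hhblits_omp -i at a prefix whose .ffindex is this remaining list
# (the .ffdata can stay the same file, e.g. via a symlink)
```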

My machine has 128 threads (AMD EPYC 7702P) and 256 GB of RAM, and runs Debian GNU/Linux 10 (buster) very stably.

ksteczk commented 3 years ago

> Thanks! The target database is not very large, but it would still be quite difficult to upload. However, I tried searching my queries against UniRef30_2020_06 using my query database, and it was the same: the program ended with a segmentation fault. The query database can be downloaded here: https://doi.org/10.6084/m9.figshare.13540895

Junhui,

Making over 30,000 searches against UniRef30 using 4 CPUs will take ages. Did you manage to run that search successfully?

Kamil

rujpeng commented 3 years ago

> Junhui, making over 30,000 searches against UniRef30 using 4 CPUs will take ages. Did you manage to run that search successfully? Kamil

Thanks Kamil,

Not yet with hhblits. I used another HMM-based method, jackhmmer, instead; it may be slow, but at least it runs. I think the hhblits developers may be working on this now.

As for UniRef30, I was using it only in case people cannot download my database and therefore cannot reproduce my results. My database contains several thousand sequences, and I guess hhblits can handle it very quickly.

Junhui

milot-mirdita commented 3 years ago

I've also noticed issues with hhblits_omp but didn't have time to investigate what's going wrong. As a workaround, you can use a script similar to this one, which repeatedly calls hhblits on the entries of an input database and produces an output database again: https://github.com/soedinglab/hhdatabase_cif70/blob/master/pdb70_hhblits_lock.sh
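For illustration, a stripped-down sketch of that approach (this is not the linked script; the database prefixes and tmp naming are placeholders, and ffindex_get/ffindex_build are the ffindex tools bundled with HH-suite):

```
#!/bin/bash
set -e
IN=queries                   # input ffindex database prefix (placeholder)
DB=/db/hh/UniRef30_2020_06   # target database prefix (placeholder)
OUT=queries_a3m              # output ffindex database prefix (placeholder)

mkdir -p tmp
cut -f1 "${IN}.ffindex" | while read -r name; do
    ffindex_get "${IN}.ffdata" "${IN}.ffindex" "$name" > "tmp/$name"
    # a fresh hhblits process per entry, so any leaked memory dies with it
    hhblits -i "tmp/$name" -d "$DB" -oa3m "tmp/$name.a3m" -n 3 -cpu 4 -v 0
    ffindex_build -a "${OUT}.ffdata" "${OUT}.ffindex" "tmp/$name.a3m"
    rm -f "tmp/$name" "tmp/$name.a3m"
done
# note: entry names in OUT come from the temporary file names, and the
# appended index may still need sorting (see ffindex_build's -s flag)
```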

milot-mirdita commented 3 years ago

I think I fixed one performance issue in hhblits_omp that shows up with a lot of cores: https://github.com/soedinglab/hh-suite/commit/e1bd3a124ba9896dfccc6d774bb47fa1ad3ba2f3

elevywis commented 7 months ago

Hi, I'm running hhblits_omp on UniRef30 and seeing the same issue. When running a batch of 100 sequences with 50 threads, the first ~30 complete quickly, then it slows down progressively as all 128 GB of RAM and upwards of 200 GB of swap are gradually used up. It's as if memory were not freed after each sequence completed.
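Until the leak is fixed, one mitigation in the spirit of the lock script mentioned above would be to split the batch into small chunks and restart hhblits_omp between them, so the OS reclaims any leaked memory when each process exits. A rough sketch, with all file names assumed:

```
# 10 entries per chunk; "queries" is the input database prefix (assumption)
split -l 10 -d queries.ffindex part.
ln -sf queries.ffdata chunk.ffdata     # the chunks share the same data file
for p in part.*; do
    cp "$p" chunk.ffindex
    hhblits_omp -cpu 50 -i chunk -d /db/hh/UniRef30_2020_06 \
        -oa3m "out_${p}" -n 3 -v 0
done
```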