Open rujpeng opened 3 years ago
Would it be possible to upload the query and database somewhere?
Thanks! The target database is not very large, but it is still difficult to upload.
However, I tried searching my query database against Uniref30_2020_06. It was the same case: the program ended with a segmentation fault.
The query database can be downloaded here: https://doi.org/10.6084/m9.figshare.13540895
I upvote this issue. I (or rather my server) experienced the same problem. I am computing profiles for the standard COG/KOG database (starting from single sequences). My command is:
hhblits_omp -i cdd -d /db/hh/UniRef30_2020_06 -oa3m cdd_a3m -n 3 -cpu 120 -v 0
This database has 9696 sequences.
It starts with several gigabytes of RAM used, but usage grows with time and with the number of processed entries, until all of my 256 GB of RAM are consumed at around 4000 entries. It looks like a memory leak: perhaps hhblits_omp does not purge its data structures from memory after writing the results into the output ff{data,index} files? When I Ctrl+C the computation, make a new index file containing only the unprocessed entries, and continue with those, the situation repeats: it starts with a few GB of RAM used and grows as entries are processed.
My machine has 128 threads (AMD EPYC 7702P) and 256 GB of RAM, and runs Debian GNU/Linux 10 (buster) very stably.
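The restart trick described above (building a new index file that contains only the unprocessed entries) can be sketched with a small awk filter. This is a hedged example with toy stand-in file names (cdd.ffindex, cdd_a3m.ffindex); in a real run these would be the index built by ffindex_build and the results index written by the interrupted hhblits_omp run.

```shell
# Toy stand-ins for the real index files (name<TAB>offset<TAB>length per line).
# Three entries exist and one of them (seq2) already has a result.
printf 'seq1\t0\t100\nseq2\t100\t80\nseq3\t180\t120\n' > cdd.ffindex
printf 'seq2\t0\t500\n' > cdd_a3m.ffindex

# Keep only the entries whose name does not appear in the finished-results
# index; hhblits_omp can then be restarted on cdd_remaining.ffindex.
awk 'NR==FNR { done[$1] = 1; next } !($1 in done)' \
    cdd_a3m.ffindex cdd.ffindex > cdd_remaining.ffindex
```

The first file fills the done[] table and the second is filtered against it, so entries may finish in any order; no sorted input is required (unlike comm).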
Junhui, running over 30,000 searches against uniref30 using 4 CPUs will take ages. Did you manage to run that search successfully? Kamil
Thanks Kamil,
Not yet with hhblits. I used another HMM-based method, jackhmmer, instead; it may be slow, but at least it runs. I think the hhblits developers are working on this now.
As for uniref30, I only used it in case people cannot download my database and reproduce my results. My own database contains several thousand sequences, and I expect hhblits could process it very quickly.
Junhui
I've also noticed issues with hhblits_omp but haven't had time to investigate what's going wrong. As a workaround, you can use a script similar to this one: https://github.com/soedinglab/hhdatabase_cif70/blob/master/pdb70_hhblits_lock.sh to repeatedly call hhblits on an input database and produce an output database.
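The same workaround can be written as a plain loop: run a fresh hhblits process per query so each search's memory is returned to the OS when the process exits, and skip entries that already have results so the loop is restartable after an interruption. This is only a sketch with assumed paths (queries/*.fasta, results/, and the UniRef30 location); HHBLITS defaults to echo so the sketch dry-runs and just prints the commands it would execute.

```shell
# Assumed layout: one FASTA per query in queries/, outputs in results/.
# Set HHBLITS=hhblits in the environment to run for real.
HHBLITS=${HHBLITS:-echo hhblits}

run_queries() {
    mkdir -p results
    for f in queries/*.fasta; do
        [ -e "$f" ] || continue                  # no queries present yet
        name=$(basename "$f" .fasta)
        [ -e "results/$name.a3m" ] && continue   # already done: restartable
        $HHBLITS -cpu 4 -i "$f" -d /db/hh/UniRef30_2020_06 \
            -oa3m "results/$name.a3m" -n 3 -v 0
    done
}

run_queries   # dry run: prints the hhblits invocations

# Afterwards the per-query a3m files can be packed into an ffindex database:
#   ffindex_build -s results_a3m.ffdata results_a3m.ffindex results/
```

Because each query runs in its own short-lived process, any memory the search leaks is reclaimed by the OS at process exit, which sidesteps the accumulation seen with one long hhblits_omp run.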
I think I fixed one performance issue with a lot of cores in hhblits_omp here: https://github.com/soedinglab/hh-suite/commit/e1bd3a124ba9896dfccc6d774bb47fa1ad3ba2f3
Hi, I am running hhblits_omp on UniRef30 and seeing the same issue. When running a batch of 100 sequences with 50 threads, the first 30 complete quickly, then it slows down progressively: all 128 GB of RAM are gradually used up, swap grows past 200 GB, and everything gets very slow. It is as if memory were not freed after a sequence completes.
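One way to bound memory until the leak is fixed (an assumed workaround, not an official fix) is to split the input ffindex into fixed-size batches and run hhblits_omp on each batch in a fresh process, so whatever the previous run leaked is reclaimed on exit. Batch indices can share the original .ffdata file via symlinks, since index offsets point into that data file. File names and the batch size below are made up, GNU split is assumed for --additional-suffix, and HHBLITS_OMP defaults to echo so the sketch only prints the commands:

```shell
# Toy input index (a real one comes from ffindex_build); batch size 2 for demo.
printf 'seq%s\t%s\t50\n' 1 0 2 50 3 100 4 150 5 200 > cdd.ffindex
: > cdd.ffdata                                  # placeholder for the real data file
HHBLITS_OMP=${HHBLITS_OMP:-echo hhblits_omp}    # set to hhblits_omp to run for real

# Split the index into batches of 2 entries: batch_00.ffindex, batch_01.ffindex, ...
split -l 2 -d --additional-suffix=.ffindex cdd.ffindex batch_

for idx in batch_*.ffindex; do
    base=${idx%.ffindex}
    ln -sf cdd.ffdata "$base.ffdata"   # every batch index reuses the same data file
    $HHBLITS_OMP -cpu 4 -i "$base" -d /db/hh/UniRef30_2020_06 \
        -oa3m "${base}_a3m" -n 3 -v 0
done
```

Each batch runs in its own hhblits_omp process, so the peak memory is bounded by what one batch accumulates rather than by the whole input set.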
Dear HHsuite developers and users,
I am using hhblits_omp to search many sequences (over 10,000) against a custom database. The search went well at the beginning, but memory usage grew higher and higher, and the program eventually ended with a segmentation fault. I am wondering whether you have seen the same behaviour, or whether I have not compiled HH-suite properly.
A typical output from my case was like the following:
slurm_script: line 8: 123689 Segmentation fault hhblits_omp -cpu 4 -id 100 -maxfilt 30000 -diff 3000 -e 0.01 -cov 15 -qid 15 -i sleb -d ../searchMsa
Best, Junhui