Open zhanxw opened 2 years ago
The issue is probably related to the file system. I will close this for now.
I changed `--db-load-mode` from 2 to 3, and the performance improved a lot.
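For anyone who wants to try the same change, here is a minimal sketch that assembles the search call quoted in the issue body with only `--db-load-mode` swapped. The paths and flags are copied from that command; the helper name `build_search_command` is made up for illustration, and it is an assumption that nothing else in the invocation needs to change.

```python
import shlex


def build_search_command(db_load_mode=3, split=8):
    """Return the argv list for the search call from the issue.

    All paths and flags are taken from the command quoted in the issue body;
    only --db-load-mode (and optionally --split) is made configurable.
    """
    cmd = [
        "mmseqs", "search",
        "search_results/qdb",
        "db/uniref30_2103_db",
        "search_results/res",
        "search_results/tmp",
        "--num-iterations", "3",
        "--db-load-mode", str(db_load_mode),
        "-a",
        "-s", "8",
        "-e", "0.1",
        "--max-seqs", "10000",
    ]
    # Pass split=None to drop --split, as tried in the original report.
    if split is not None:
        cmd += ["--split", str(split)]
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line; nothing is executed here.
    print(shlex.join(build_search_command(db_load_mode=3)))
```

The list form can be handed to `subprocess.run` as-is if you prefer to launch it from a script rather than paste it into a shell.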
Where can I find the documentation on the option `--db-load-mode`? Just want to understand this better.
Here you can read more about MMseqs2: https://github.com/soedinglab/MMseqs2/wiki
I read the wiki and User Guide. Although there are examples about `--db-load-mode 2`, none mentions or explains `--db-load-mode 3`.
I think I ran into the same problem as you, and my HPC node is similar to yours: it kept running for almost 17 hours with no progress. When you set `--db-load-mode 3` and reran it, how long did it take before you saw any output? Any answer would be helpful! Thanks!
> I read the wiki and User Guide. Although there are examples about `--db-load-mode 2`, none mentions or explains `--db-load-mode 3`.
This code explains: https://github.com/soedinglab/MMseqs2/blob/87e7103d289029dc3345f85ea9a4c4c6d6416e46/src/prefiltering/PrefilteringIndexReader.cpp#L385
Basically, `--db-load-mode 3` is the combination of `--db-load-mode 2` and `vmtouch`, meaning `mmseqs` will mmap the database and pull the necessary data into memory.
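To make the "mmap plus touch" idea concrete, here is a rough Python sketch of the general technique. It is not the MMseqs2 implementation (see the linked C++ for that); the function name `touch_all_pages` is hypothetical. It maps a file read-only and then reads one byte from every page, which forces the operating system to fault each page in up front instead of lazily at first use.

```python
import mmap
import os
import tempfile


def touch_all_pages(path):
    """Map a file read-only and read one byte per page.

    Returns the number of pages touched. Reading a byte from each page makes
    the OS load that page, so later accesses do not stall on disk reads.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0  # an empty file cannot be mapped
    touched = 0
    checksum = 0
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, size, mmap.PAGESIZE):
                checksum ^= mm[offset]  # the read is what triggers the page-in
                touched += 1
    return touched


if __name__ == "__main__":
    # Demo on a throwaway file instead of a real database.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (10 * mmap.PAGESIZE))
        demo_path = tmp.name
    print("pages touched:", touch_all_pages(demo_path))
    os.remove(demo_path)
```

With a plain mmap, pages are only read from disk when they are first accessed, which can be very slow on a networked or contended file system; touching everything first trades a longer startup for predictable access afterwards.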
> I think I ran into the same problem as you, and my HPC node is similar to yours: it kept running for almost 17 hours with no progress. When you set `--db-load-mode 3` and reran it, how long did it take before you saw any output? Any answer would be helpful! Thanks!
Hard to give a number. `--db-load-mode 2` will halt indefinitely. `--db-load-mode 3` at least can give results.
Expected Behavior
The analysis finished in minutes on the MMseqs2 MSA server using ColabFold.
Current Behavior
Local mmseqs always stalls for hours without generating any output.
Steps to Reproduce (for bugs)
Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.
I am using `colab_search`, which calls `mmseqs` like `search search_results/qdb db/uniref30_2103_db search_results/res search_results/tmp --num-iterations 3 --db-load-mode 2 -a -s 8 -e 0.1 --max-seqs 10000 --split 8`. The query contains 4 amino acid sequences, each 493 amino acids long.
NOTE: when I removed `--split 8`, I also observed that mmseqs halts at a certain stage.
MMseqs Output (for bugs)
I had to stop it as mmseqs took hours without progress.
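When a run looks stuck like this, one cheap check on Linux is the kernel's one-letter process state, which is what `htop` and `ps` display. The sketch below reads it from `/proc/<pid>/stat`; the helper names are made up for illustration, and it assumes a Linux `/proc` file system. A state of `D` means uninterruptible sleep, which usually indicates the process is waiting on I/O.

```python
import os


def parse_state(stat_text):
    """Extract the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the last ')' rather than on whitespace.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Return the current state letter (e.g. R, S, D) of a running process."""
    with open(f"/proc/{pid}/stat") as fh:
        return parse_state(fh.read())


if __name__ == "__main__":
    # Demo on this interpreter's own PID; pass the mmseqs PID in practice.
    print(process_state(os.getpid()))
```

Sampling the state a few times over a minute gives a better picture than a single reading, since a healthy process also passes through `D` briefly during normal disk access.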
Context
I am quite puzzled about what I should do to figure this out. The machine is on our cluster, so there is enough disk space and memory. I checked the process status, and it is always in the `D` state with 100-200% CPU usage (based on `htop` output). Not sure how I can speed things up at this stage.
Your Environment
Include as many relevant details about the environment you experienced the bug in.
free -g
)