Open sarahet opened 10 months ago
This is a bit of a common footgun with MMseqs2: https://github.com/soedinglab/MMseqs2/wiki#how-does-mmseqs2-compute-the-sequence-identity
MMseqs2 by default only determines accurate bit-scores and E-values and does not compute the full alignment, instead of just estimates sequence identity from the bit-score.
You have to either pass -a
or --alignment-mode 3
to compute full backtraces and therefore accurate sequence-identiteis and gap-opens etc.
Thanks a lot for your reply. I did not set this option because according to the help message, I thought the --alignment-mode 3
is set by default:
--alignment-mode INT How to compute the alignment:
0: automatic
1: only score and end_pos
2: also start_pos and cov
3: also seq.id
4: only ungapped alignment [3]
When I run the command as I did (without explicitly setting it), I also get the following output:
MMseqs Version: 6f45232ac8daca14e354ae320a4359056ec524c2
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 3
Alignment mode 0
This is the same when I manually set --alignment-mode 3
. Can you maybe help me understand why there are 2 alignment mode parameters in the output that differ (i.e. are there 2 different parameters and I forgot to set one)? I also get the same results and bit scores (such as the example above) when I run the command as I reported above, when I manually set --alignment-mode 3
or when I additionally set -a
. I would appreciate any help!
I think the important part is to set -a
. Me mentioning --alignment-mode
was a bit misleading.
The gap open count is computed based on the presence of a backtrace, which we don't compute by default.
I see, thanks. When I run the command with -a
I get the following verbose output:
MMseqs Version: 6f45232ac8daca14e354ae320a4359056ec524c2
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace true
Alignment mode 3
Alignment mode 0
so the backtrace seems to be enabled. However, I still get exactly the same results and bit scores as before unfortunately. Am I missing another parameter here?
Hi - I ran the easy-search command (newest release, 6f452, self-compiled with cmake 3.25.2 and gcc 12.2.0) on a Linux system with the example query and database file from the MMseqs2 repository and encountered an unexpected behaviour. The BLAST tab output format never includes gap openings (even though many alignments should contain them) and the bit score deviates from what e.g. BLAST reports for the same alignments. To give an example, the pretty html output contains the following alignment for query
C0QTH6
and subjectA0A059IV01
:However, when running the same search but producing a BLAST tab output, the gap is not present (i.e. 0 gap openings):
BLAST produces basically the same alignment (the gap is moved by one but it should create the same score due to the pairing):
Here, the BLAST tab format also includes the gap and it leads to a different bit score compared to MMseqs2.
My exact calls (for the BLAST tab output) are: