h836472 closed this issue 4 years ago
MMseqs2 approximates the sequence identity by default (https://github.com/soedinglab/MMseqs2/wiki#how-does-mmseqs2-compute-the-sequence-identity). You'll have to pass the -a or --alignment-mode 3 parameter to search to compute the full alignments instead of only the faster-to-compute alignment scores.
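As a rough sketch of that advice applied to the reproduction script in this issue, the search could be rerun with -a before exporting. The variable names and the new result name (selfDB_a) are assumptions made for illustration, and --alignment-mode 3 could be passed in place of -a as described above.

```shell
#!/bin/bash
# Sketch only: DB, RESULT and TMP_DIR are placeholders modelled on the reproduction script.
DB="GCF_008245085.1"
RESULT="${DB}.selfDB_a"   # hypothetical new result name, so the earlier result is left alone
TMP_DIR="/tmp"

# -a asks search to compute the full alignments instead of only the alignment scores
search_cmd=(mmseqs search "$DB" "$DB" "$RESULT" "$TMP_DIR" -a)
convert_cmd=(mmseqs convertalis "$DB" "$DB" "$RESULT" "${DB}.self.txt"
             --format-output "query,target,pident,nident")

# Show the commands; run them only when mmseqs is installed and the database exists.
printf '%s\n' "${search_cmd[*]}" "${convert_cmd[*]}"
if command -v mmseqs >/dev/null 2>&1 && [ -e "$DB" ]; then
  "${search_cmd[@]}" && "${convert_cmd[@]}"
fi
```

Writing to a fresh result name is only a convenience here; reusing the original name after removing the old result should work just as well.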
Thank you for the prompt answer!
Indeed, adding the -a and --alignment-mode 3 switches resolves the issue.
Thank you. Balazs
Expected Behavior
MMSeqs search followed by MMSeqs convertalis --format-output "query,target,pident,nident" should export the number of identical matches between query and target sequences
Current Behavior
MMSeqs always reports the "nident" (number of identical residues) value to be 0.
Steps to Reproduce (for bugs)
Please run the bash script below to reproduce the error:
#!/bin/bash
# download protein sequences from Pyrococcus furiosus
wget -c https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/008/245/085/GCF_008245085.1_ASM824508v1/GCF_008245085.1_ASM824508v1_protein.faa.gz
# uncompress protein sequences
gunzip GCF_008245085.1_ASM824508v1_protein.faa.gz
# create MMSeqs database
mmseqs createdb GCF_008245085.1_ASM824508v1_protein.faa GCF_008245085.1 >createdb.log
# perform all_vs_all search on proteins of the genome
mmseqs search GCF_008245085.1 GCF_008245085.1 GCF_008245085.1.selfDB /tmp >search.log
# export results to a custom text file: query, target, pident, nident
mmseqs convertalis GCF_008245085.1 GCF_008245085.1 GCF_008245085.1.selfDB GCF_008245085.1.self.txt --format-output "query,target,pident,nident" >convertalis.log
# check output file
head GCF_008245085.1.self.txt
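Beyond eyeballing the first lines with head, one way to confirm the symptom across the whole file is to count the rows whose nident column is 0. This is a minimal sketch: the helper name count_zero_nident is made up for illustration, and it assumes the tab-separated, four-column layout produced by the --format-output string used in the script.

```shell
#!/bin/bash
# Hypothetical helper: count rows whose 4th column (nident) equals 0.
# Assumes columns are query, target, pident, nident, separated by tabs.
count_zero_nident() {
  awk -F'\t' '$4 == 0 { n++ } END { print n + 0 }' "$1"
}

# Example call, once the output file exists:
#   count_zero_nident GCF_008245085.1.self.txt
```

If the count equals the total number of lines (wc -l on the same file), every hit was reported with nident 0, which matches the behavior described in this issue.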
Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.
MMseqs Output (for bugs)
Please make sure to also post the complete output of MMseqs. You can use gist.github.com for large output.
MMSeqs log files are available upon request.
Context
Providing context helps us come up with a solution and improve our documentation for the future.
Your Environment
Include as many relevant details about the environment you experienced the bug in.
Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters):
Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.): MD5sum for MMseqs2-master.zip: 1fe18027245969de6cea579b5f31a0df (Latest version downloaded from GitHub on 8th Sept 2020)
For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12), cmake version 3.5.1. Commands to compile:
mkdir build
cd build/
cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=/home/balintb ..
make install
Server specifications (especially CPU support for AVX2/SSE and amount of system memory): Compiled and tested on a Lenovo T430 with 16 GB RAM and an i5-3320M CPU with sse3, sse4_1, sse4_2 and avx supported.
Operating system and version: Ubuntu 16.04.5 LTS
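For readers filling in the CPU support field on a Linux x86 machine, the flags can be read from /proc/cpuinfo. The sketch below uses a made-up helper name, list_simd_flags, and assumes the usual "flags" line format. Note that Linux lists SSE3 under the name pni.

```shell
#!/bin/bash
# Hypothetical helper: print the SIMD-related CPU flags, one per line.
# Takes an optional path so it can be pointed at a saved copy of cpuinfo.
list_simd_flags() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  grep -m1 '^flags' "$cpuinfo" \
    | tr ' \t' '\n\n' \
    | grep -E -x 'pni|ssse3|sse4_1|sse4_2|avx|avx2' \
    | sort -u
}

# Example call:
#   list_simd_flags
```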