soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

result2msa output change #846

Open Beigephage opened 4 months ago

Beigephage commented 4 months ago

Running result2msa as described in the docs

mmseqs result2msa DB DB DB_clu DB_clu_msa

with older versions of MMseqs2 (e.g. from 2019) produced a single merged 'DB_clu_msa' output file containing the MSA. Newer versions of MMseqs2 (13.45111), run with the same command, produce multiple '.ffdata' files, namely 'DB_clu_msa_sequence.ffdata', 'DB_clu_msa_header.ffdata' and 'DB_clu_msa_ca3m.ffdata', with no MSA and no file similar to the old 'DB_clu_msa' file.

How can one retrieve the merged 'DB_clu_msa' file in the old format?

Thank you

milot-mirdita commented 4 months ago

You are somehow passing --msa-format-mode 0 to result2msa. Please don't use mode 0.

Beigephage commented 4 months ago

Unfortunately, I had not added a --msa-format-mode flag at all; I only ran mmseqs result2msa DB DB DB_clu DB_clu_msa. I then tried running the command as mmseqs result2msa DB DB DB_clu DB_clu_msa --msa-format-mode 1 and still got the same 3 files, not 1 consolidated MSA. --msa-format-mode 2 gave a file closer to what I was looking for, but some sequence headers seemed to be missing; on inspection, those lines begin with ^@ when viewed in a terminal. Mode 4 gave Stockholm format. Is there a different option that provides the standard alignment?

Beigephage commented 4 months ago

Removing the hidden characters from the MSA does not seem to solve the problem: the file still cannot be used with reformat.pl or addss.pl (addss.pl produces no secondary structure prediction).
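
For readers hitting the same symptom: `^@` is how a terminal shows a NUL byte, and MMseqs2 database files separate their entries with NUL bytes. The mode 2 output may therefore hold one alignment per cluster, back to back, rather than a single MSA. If so, deleting the NUL bytes would leave several alignments joined together, which could explain why reformat.pl and addss.pl still fail. The sketch below is not an MMseqs2 tool, only an illustration under that assumption. The function names and the `cluster_<n>.fasta` naming are hypothetical.

```python
from pathlib import Path
from typing import List


def split_entries(data: bytes) -> List[str]:
    """Split a NUL-separated blob into its non-empty text entries."""
    return [
        chunk.decode("utf-8", errors="replace")
        for chunk in data.split(b"\x00")
        if chunk.strip()  # skip the empty piece after the final NUL
    ]


def write_entries(db_path: str, out_dir: str) -> int:
    """Write each entry of db_path to its own file; return the entry count.

    The cluster_<n>.fasta naming is an arbitrary choice for this sketch.
    """
    entries = split_entries(Path(db_path).read_bytes())
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, entry in enumerate(entries):
        (out / f"cluster_{i}.fasta").write_text(entry)
    return len(entries)
```

Calling `write_entries("DB_clu_msa", "msa_split")` would then give one file per cluster, each of which can be passed to downstream tools on its own. Whether those tools accept the result depends on the header format, which this sketch does not change.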

milot-mirdita commented 4 months ago

I am still confused about what's going on. I can't reproduce the issue locally; everything works as expected.

Please provide the full command line calls, terminal output and directory listing of the folder with the result files. Maybe that will help me understand what's going on.
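
One way to produce the requested directory listing in a form that is easy to paste into the issue is sketched below. It lists every file whose name starts with the output prefix, along with its size in bytes. The helper name `list_outputs` is hypothetical, and the prefix `DB_clu_msa` is taken from the command earlier in the thread.

```python
from pathlib import Path
from typing import List, Tuple


def list_outputs(directory: str, prefix: str) -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for files starting with prefix."""
    rows = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.name.startswith(prefix):
            rows.append((path.name, path.stat().st_size))
    return rows


if __name__ == "__main__":
    # Print one line per result file in the current directory.
    for name, size in list_outputs(".", "DB_clu_msa"):
        print(f"{size:>12}  {name}")
```

Including the file sizes shows at a glance whether any of the result files are empty.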