soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
MIT License

result2msa replaces X with a gap in a3m output #497

Closed: konstin closed this issue 3 years ago

konstin commented 3 years ago

The colabdb_search.sh script calls mmseqs result2msa with --msa-format-mode 6, which replaces X with - in the query. The exact call is:

"${MMSEQS}" result2msa "${BASE}/qdb" "${DBBASE}/${DB1}.idx" "${BASE}/res_exp_realign_filter" "${BASE}/uniref.a3m" --msa-format-mode 6 --db-load-mode 2 ${FILTER_PARAM}

Q58725.fasta according to UniProt:

>sp|Q58725|MAP2_METJA Methionine aminopeptidase OS=Methanocaldococcus jannaschii (strain ATCC 43067 / DSM 2661 / JAL-1 / JCM 10045 / NBRC 100440) OX=243232 GN=map PE=3 SV=1
MEIEGYEKIIEAGKIASKVREEAVKLIXPGVKLLEVAEFVENRIRELGGEPAFPCNISIN
EIAAHYTPKLNDNLEFKDDDVVKLDLGAHVDGYIADTAITVDLSNSYKDLVKASEDALYT
VIKEINPPMNIGEMGKIIQEVIESYGYKPISNLSGHVMHRYELHTGISIPNVYERTNQYI
DVGDLVAIEPFATDGFGMVKDGNLGNIYKFLAKRPIRLPQARKLLDVISKNYPYLPFAER
WVLKNESERLALNSLIRASCIYGYPILKERENGIVGQAEHTILITENGVEITTK

a3m file from result2msa (note the - at position 28 of the query sequence on the second line, where the input has X):

>Q58725
MEIEGYEKIIEAGKIASKVREEAVKLI-PGVKLLEVAEFVENRIRELGGEPAFPCNISINEIAAHYTPKLNDNLEFKDDDVVKLDLGAHVDGYIADTAITVDLSNSYKDLVKASEDALYTVIKEINPPMNIGEMGKIIQEVIESYGYKPISNLSGHVMHRYELHTGISIPNVYERTNQYIDVGDLVAIEPFATDGFGMVKDGNLGNIYKFLAKRPIRLPQARKLLDVISKNYPYLPFAERWVLKNESERLALNSLIRASCIYGYPILKERENGIVGQAEHTILITENGVEITTK
>Q6GNF9 341 0.334   1.229E-101  0   293 294 159 475 480
MDqaSEEIwTDFRQAAEAHRQVRKYVMSWIKPGMTMIEICEKLEDcsrkLIKENGLYagLAFPTGCSLNNCAAHYTPNAGDPTVLQYDDVCKIDFGTHINGRIIDCAFTVTFNPKYDKLLEAVKDATNTGIRCSGIDVRLCDVGEAIQEVMESYeveidGktyqVKPIRNLNGHSIGPYRIHAGKTVPIVKGGEATRMEEGEVYAIETFGSTGKGVVHDDMECSHYMknFDvGHVPIRLPRAKHLLNVINEKFGTLAFCRRWLDrlGESKYLMALKNLCDLGIVDPYPPLCDMKGSYTAQFEHTILLRPNCKEVVSR
>A0A3B3D0Y5 336 0.331   1.005E-99   0   293 294 180 496 501
MDkaNEEmWNDFRQAAEAHRQVRKHVRSFLKPGMTMIEICERLEDcsrkLIKENGlnAGLAFPTGCSLNHCAAHYTPNAGDTTVLQYDDVCKIDFGTHINGRIIDCAFTVTFNPKYDKLLEAVRDATNTGIKNAGIDVRLCDVGEAIQEVMESYeveldGktyqVKPIRNLNGHSIGQYRIHAGKTVPIVKGGEATRMEEGEVYAIETFGSTGKGVVHDDMECSHYMknFDvGHVPIRLPRAKHLLNVVNENFGTLAFCRRWLDrlGESKYLMALKNLCDLGIVDPYPPLCDTKGCYTAQFEHTILLRPTCKEVVSR
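For context on why a - in the query row is suspicious: in the a3m convention used by HH-suite, uppercase letters and - occupy the match columns defined by the query, while lowercase letters (like the "qa" and "w" in the Q6GNF9 row) are insertions relative to it. A minimal Python sketch, with hypothetical helper names that are not part of MMseqs2, strips insertions and checks that every row covers the same number of match columns as the query:

```python
def match_states(row: str) -> str:
    # a3m convention (as in HH-suite): uppercase letters and '-' are match
    # columns aligned to the query; lowercase letters are insertions.
    return "".join(c for c in row if not c.islower())


def rows_with_wrong_width(query_row: str, rows: list[str]) -> list[int]:
    # Hypothetical check: each row must contribute exactly one match-state
    # character per query column; return indices of rows that do not.
    width = len(match_states(query_row))
    return [i for i, row in enumerate(rows) if len(match_states(row)) != width]


print(match_states("MDqaSEEIwTD"))                      # MDSEEITD
print(rows_with_wrong_width("ACDEF", ["AcxCD-F", "ACD"]))  # [1]
```

Because the bug substitutes - for X instead of deleting it, the query row keeps its full length, so a width check like this one passes; catching the problem needs a comparison against the input sequence.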

Expected Behavior

X remains X in a3m files

Current Behavior

X is replaced by - in the a3m files.
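One way to confirm the substitution independently of ColabFold is to compare the a3m query row with the input FASTA position by position. The sketch below uses only the first 60 residues of Q58725 shown above; read_fasta_sequence and query_mismatches are illustrative names, not functions from MMseqs2 or ColabFold:

```python
def read_fasta_sequence(text: str) -> str:
    # Hypothetical helper: join the sequence lines of a single-record FASTA
    # string, skipping the '>' header line.
    return "".join(
        line.strip() for line in text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def query_mismatches(fasta_seq: str, a3m_query_row: str) -> list[tuple[int, str, str]]:
    # The a3m query row should reproduce the input sequence exactly, so report
    # every difference as (1-based position, expected, found).
    if len(fasta_seq) != len(a3m_query_row):
        raise ValueError(f"length mismatch: {len(fasta_seq)} vs {len(a3m_query_row)}")
    return [
        (pos, want, got)
        for pos, (want, got) in enumerate(zip(fasta_seq, a3m_query_row), start=1)
        if want != got
    ]


fasta = """>sp|Q58725|MAP2_METJA Methionine aminopeptidase
MEIEGYEKIIEAGKIASKVREEAVKLIXPGVKLLEVAEFVENRIRELGGEPAFPCNISIN
"""
a3m_row = "MEIEGYEKIIEAGKIASKVREEAVKLI-PGVKLLEVAEFVENRIRELGGEPAFPCNISIN"
print(query_mismatches(read_fasta_sequence(fasta), a3m_row))  # [(28, 'X', '-')]
```

With the fix applied, the same comparison on the full query should return an empty list.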

Steps to Reproduce (for bugs)

Run colabfold_search.sh with Q58725.fasta as input.

MMseqs Output (for bugs)

n/a


milot-mirdita commented 3 years ago

Should be fixed in https://github.com/soedinglab/MMseqs2/commit/a8c30da56d73cdd7395811496a5dc07fa0c7e23a

konstin commented 3 years ago

That fixes it, thank you!