knuser opened this issue 2 years ago
Update: I have just found a 13 aa example that also causes a segfault: TDPPIHIASLXRS
Observation: after changing X to, for example, G (TDPPIHIASLGRS), MMseqs2 processes the sequence correctly.
EDIT: another segfault example, this time much longer: DPLVFFKXXFXXGGGGGAGCGGGGMKRT (observation: the extended version, DPLVFFKXXFXXGGGGGAGCGGGGMKRTRRALPAN, is processed correctly).
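To compare the examples quoted in this report side by side, a small helper can tabulate each sequence's length and how many X residues it contains, next to the outcome reported here. This is only a bookkeeping sketch: the names REPORTED and summarize are hypothetical, and the table does not establish what actually triggers the crash.

```python
# Sequences quoted in this issue, paired with the outcome reported for each.
REPORTED = [
    ("SEGGQDFWL", "crash"),
    ("GSSGLISMPRV", "crash"),
    ("TDPPIHIASLXRS", "crash"),
    ("TDPPIHIASLGRS", "ok"),
    ("DPLVFFKXXFXXGGGGGAGCGGGGMKRT", "crash"),
    ("DPLVFFKXXFXXGGGGGAGCGGGGMKRTRRALPAN", "ok"),
]


def summarize(seq: str) -> tuple[int, int]:
    """Return (length, number of 'X' residues) for a protein sequence."""
    return len(seq), seq.upper().count("X")


if __name__ == "__main__":
    # Print one row per sequence: sequence, length, X count, reported outcome.
    for seq, outcome in REPORTED:
        length, n_x = summarize(seq)
        print(f"{seq:<36} len={length:>2} X={n_x} {outcome}")
```

The output makes it easy to see that the crashing examples are not all shorter than 12 aa, which is why the length observation below is only a partial description.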
Expected Behavior
MMseqs2 should not crash on the envdb when a sequence is shorter than 12 aa (for example, SEGGQDFWL or GSSGLISMPRV).
Current Behavior
The MMseqs2 process crashes every time while aligning against the ColabFold envdb if the input .fasta file contains a short sequence (this also happens when the .fasta file contains more than one sequence). The UniRef database is always processed without issue; the crash happens only during envdb processing.
Steps to Reproduce (for bugs)
1. Put one of these examples anywhere in input_sequences.fasta (this affects both single-entry and multi-entry FASTA files): SEGGQDFWL or GSSGLISMPRV
2. Set up the ColabFold databases using https://github.com/sokrypton/ColabFold/blob/main/setup_databases.sh
3. Run colabfold_search input_sequences.fasta /path/to/db_folder search_results
4. You will see the crash described above.

Alternatively, go to https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb and try to fold one of the examples; the failure occurs there as well.
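Reproducing this locally needs an input FASTA containing one of the short examples, either alone or alongside other sequences. The sketch below writes both variants; the record names (example_1, example_2, longer_ok) and the second file name, input_sequences_multi.fasta, are hypothetical choices for illustration. The longer companion sequence is the extended example reported above as processed correctly.

```python
from pathlib import Path

# Short sequences reported in this issue to crash the envdb search.
CRASHING = {
    "example_1": "SEGGQDFWL",
    "example_2": "GSSGLISMPRV",
}
# Longer sequence reported in this issue as processed correctly.
LONGER_OK = "DPLVFFKXXFXXGGGGGAGCGGGGMKRTRRALPAN"


def format_fasta(records: dict[str, str]) -> str:
    """Render records as FASTA text: one '>name' header and one sequence line each."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


def write_fasta(path: Path, records: dict[str, str]) -> None:
    """Write records to a FASTA file at the given path."""
    path.write_text(format_fasta(records))


if __name__ == "__main__":
    # Single-entry input: just one short sequence.
    write_fasta(Path("input_sequences.fasta"),
                {"example_1": CRASHING["example_1"]})
    # Multi-entry input: the short sequence placed after a longer one,
    # which is reported to crash as well.
    write_fasta(Path("input_sequences_multi.fasta"),
                {"longer_ok": LONGER_OK, "example_2": CRASHING["example_2"]})
```

Either file can then be passed to colabfold_search in place of input_sequences.fasta, using the same database folder and output directory as in step 3.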
MMseqs Output (for bugs)
Context
If the crashing examples are extended to 12 aa, MMseqs2 works correctly. It seems that 12 aa is some kind of magic threshold for the examples I found.
Your Environment
MMseqs2 versions: fcf52600801a73e95fd74068e1bb1afb437d719d and edb8223d1ea07385ffe63d4f103af0eb12b2058e
fcf52600801a73e95fd74068e1bb1afb437d719d: compiled from source
edb8223d1ea07385ffe63d4f103af0eb12b2058e: downloaded from https://mmseqs.com/archive/edb8223d1ea07385ffe63d4f103af0eb12b2058e/mmseqs-linux-avx2.tar.gz