sokrypton / ColabFold

Making Protein folding accessible to all!
MIT License

Invalid database read error in colabfold_search #276

Open aaronkollasch opened 2 years ago

aaronkollasch commented 2 years ago

Expected Behavior

Hello, I am trying to run batch searches against ColabFoldDB on a SLURM cluster, following the MSA instructions in the README.

Current Behavior

colabfold_search fails at the expandaln step with the error:

Invalid database read for database data file=[db_folder]/uniref30_2103_db.idx, database index=[db_folder]/uniref30_2103_db.idx.index
getData: local id (4294967295) >= db size (22)

Full log file: colabfold_search_output.txt
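For what it's worth, the local id in that message looks like a sentinel rather than a real sequence id: 4294967295 is 2^32 - 1, i.e. UINT32_MAX, the usual "not found" value for an unsigned 32-bit lookup (interpreting it as a failed index lookup in MMseqs2 is my assumption, but it would fit a corrupt or truncated .idx). A quick arithmetic check of the value:

```shell
# 4294967295 == 2^32 - 1 (UINT32_MAX), the typical unsigned-32-bit
# "lookup failed" sentinel -- so the id itself is not a real entry id.
printf 'UINT32_MAX = %s\n' "$(( (1 << 32) - 1 ))"
# prints: UINT32_MAX = 4294967295
```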

Steps to Reproduce (for bugs)

  1. bash setup_databases.sh [db_folder]
     Note: mmseqs createindex was run with --split-memory-limit 128G, as mmseqs doesn't otherwise detect the SLURM job's memory limit.
  2. colabfold_search --db-load-mode 0 --mmseqs mmseqs_5185d3c/bin/mmseqs batch_1/input_sequences.fa [db_folder] batch_1/result_s8
     Input sequences: input_sequences.fa
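As a sketch, step 2 can be wrapped in a SLURM job script along these lines. The job name, the --mem value, and the DB path are placeholders, not values from the actual run; RUN=echo makes it a dry run so the command is only printed:

```shell
#!/bin/bash
#SBATCH --job-name=colabfold_search   # placeholder job name
#SBATCH --mem=250G                    # placeholder; match your cluster's limit

# Dry run by default: RUN=echo prints the command instead of executing it.
# Set RUN= (empty) to actually run the search.
RUN=${RUN:-echo}
DB=db_folder                          # placeholder for the database path

$RUN colabfold_search --db-load-mode 0 \
    --mmseqs mmseqs_5185d3c/bin/mmseqs \
    batch_1/input_sequences.fa "$DB" batch_1/result_s8
```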

It looks like colabfold_search uses --split-memory-limit 0 in the prefilter steps, and possibly in later steps as well. I don't think this caused the issue, since the job only reached 53 GB of memory usage before it errored, but it would be nice to be able to set this option to keep the job from being killed.
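If colabfold_search does hard-code --split-memory-limit 0 there, one workaround (a sketch, untested) would be to run the memory-hungry mmseqs stage by hand with an explicit cap. --split-memory-limit and --db-load-mode are real mmseqs options, but the query/result/tmp database names below are placeholders:

```shell
# Dry run by default: RUN=echo only prints the command; clear RUN to execute.
RUN=${RUN:-echo}
MMSEQS=mmseqs_5185d3c/bin/mmseqs

# Placeholder DB names; cap split memory safely below the job's limit.
$RUN "$MMSEQS" search query_db uniref30_2103_db result_db tmp \
    --split-memory-limit 200G --db-load-mode 0
```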

Context

I'm looking to perform a batch search, and the cluster jobs have a 250 GiB memory limit, so I'm using --db-load-mode 0; let me know if that isn't the best option.

Your Environment

@thomashopf

aaronkollasch commented 2 years ago

I recreated the index on a different machine without --split-memory-limit 128G, and the error went away. Perhaps it was a one-off corruption of the index, an issue with specifying --split-memory-limit, or something specific to the cluster.
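For anyone hitting the same error, the rebuild that made it go away amounts to re-running createindex without a split limit (a sketch; whether the split limit itself caused the corruption is still a guess, and the tmp directory name is a placeholder):

```shell
# Dry run by default: RUN=echo only prints the command; clear RUN to execute.
RUN=${RUN:-echo}
MMSEQS=mmseqs_5185d3c/bin/mmseqs

# Rebuild the index without --split-memory-limit, as on the second machine.
$RUN "$MMSEQS" createindex uniref30_2103_db tmp
```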