Open clami66 opened 13 hours ago
Still working on it; we'll likely release the changes to run ColabFold with MMseqs2-GPU this weekend. `colabfold_search` doesn't actually require any direct changes. The new protocol can be activated with environment variables alone, after building the GPU databases.
Thanks for responding so quickly. I will keep an eye out for the updates.
I am trying to integrate the new GPU-accelerated search into `colabfold_search`. From what I can see, only `search` and `easy-search` are GPU-accelerated. However, the `colabfold_search` alignment protocol also includes an `expandaln` step (among others).

Unfortunately, it seems that `expandaln` is incompatible with the padded sequence DB generated and indexed for GPU: running `mmseqs expandaln` on this database causes it to crash. I think this is because the database's `.idx.index` file lacks rows 24-25, i.e. `ALNINDEX` and `ALNDATA`, as defined here: https://github.com/soedinglab/MMseqs2/blob/266c894c117a9bd650450974747424ce51124bf5/src/prefiltering/PrefilteringIndexReader.cpp#L33C1-L34C52

I thought that this was due to using the `--index-subset 2` flag when running `mmseqs createindex`, as recommended in the guide, but even using `--index-subset 0` doesn't fix the issue for me.

Now I am wondering whether the whole alignment protocol should change (e.g. by removing `expandaln` altogether), or whether I am doing something incorrectly when setting up the database. Thanks for any help on this!

Steps to Reproduce (for bugs)
Generate the padded DB:
mmseqs makepaddedseqdb uniref30_2302_db uniref30_2302_db_gpu
Generate the index (either with `--index-subset 0` or `--index-subset 2`):
$ tail uniref30_2302_db_gpu.idx.index
...
21 10770190336 105711065
22 20480 41
23 16384 1
mmseqs expandaln ./example/qdb colabfold_databases/uniref30_2302_db_gpu.idx ./example/res colabfold_databases/uniref30_2302_db_gpu.idx ./res_exp
MMseqs Version: dc7395810db17ec7de8adf32599562452b0c4d78
Expansion mode 0
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Max sequence length 65535
Score bias 0
Compositional bias 1
Compositional bias 1
E-value threshold 0.001
Seq. id. threshold 0
Coverage threshold 0
Coverage mode 0
Pseudo count mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Expand filter clusters 0
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Preload mode 0
Compressed 0
Threads 128
Verbosity 3

Index version: 16
Generated by: dc7395810db17ec7de8adf32599562452b0c4d78
ScoreMatrix: VTML80.out
Index version: 16
Generated by: dc7395810db17ec7de8adf32599562452b0c4d78
ScoreMatrix: VTML80.out
Invalid database read for database data file=colabfold_databases/uniref30_2302_db_gpu.idx, database index=colabfold_databases/uniref30_2302_db_gpu.idx.index
getData: local id (4294967295) >= db size (22)
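
For anyone who wants to check their own database the same way, here is a minimal sketch that lists which entry keys are absent from an `.idx.index` file. It rests on two assumptions not confirmed in this thread: that each line of the index is a whitespace-separated `key offset length` triple, and that "rows 24-25" correspond to key values 24 and 25. The helper names `read_index_keys` and `missing_keys` are made up for illustration and are not part of MMseqs2. As a side note, 4294967295 in the error equals 2^32 - 1, which would be consistent with a key lookup that found nothing, though that reading is also an assumption.

```python
# Sketch: report which entry keys are absent from an MMseqs2-style .index file.
# Assumption: each line is "key offset length", separated by whitespace.

def read_index_keys(lines):
    """Return the set of integer keys found in the first column."""
    keys = set()
    for line in lines:
        fields = line.split()
        # Skip blank lines and anything whose first field is not a number.
        if fields and fields[0].isdigit():
            keys.add(int(fields[0]))
    return keys


def missing_keys(lines, required):
    """Return the required keys that do not appear in the index, sorted."""
    return sorted(set(required) - read_index_keys(lines))


# Keys the issue names as ALNINDEX and ALNDATA (assumed to be key values).
REQUIRED = {24: "ALNINDEX", 25: "ALNDATA"}

# Tail of the index as quoted in the issue; earlier rows are elided there.
sample = """21 10770190336 105711065
22 20480 41
23 16384 1""".splitlines()

absent = missing_keys(sample, REQUIRED)
for key in absent:
    print(f"missing key {key} ({REQUIRED[key]})")
```

To run it on a real database, pass an open file handle for the `.idx.index` file instead of `sample`, e.g. `missing_keys(open(path), REQUIRED)` with `path` pointing at your own index.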