Closed: dm-kuba closed this issue 1 year ago
Could you please explain your use case so that we can recommend a solution?
Hi Martin,

We would like to copy the whole database into RAM once and run multiple queries against it. However, the database files are too large to fully fit in memory, and mmap-ing is not an option. So, ideally, we would like to search against a compressed version of the database; is that possible?

I'm aware of the `--compressed` flag for `createdb`, but that still leaves us with the same very large `.idx` files, which take up most of the space. Is there anything we're missing on the compression side?

Thanks, Kuba
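For reference, the compression flag mentioned above would be used roughly as below; `target.fasta` and `targetDB` are placeholder paths, and the `--compressed 1` syntax is my reading of the option rather than something confirmed in this thread, so verify with `mmseqs createdb -h`:

```sh
# Write the sequence/data files compressed; this does not affect the
# .idx files that createindex produces later.
mmseqs createdb target.fasta targetDB --compressed 1
mmseqs createindex targetDB tmp
```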
The index cannot be shrunk if you want to allow for real-time searches. Depending on the size of your database, you could implement the same clustered MMseqs2 search workflow that is used in ColabFold. This will reduce memory requirements massively. We plan to eventually offer this workflow directly in MMseqs2.
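A rough sketch of what such a clustered workflow could look like follows. The module sequence and arguments are my reconstruction of the idea, not the exact ColabFold pipeline; treat the DB names (`targetClu`, `targetRepDB`, `res_exp`, `res_final`) and the clustering threshold as placeholders, and compare against the ColabFold search script for the real invocation:

```sh
# Cluster the target once, offline (the identity threshold is illustrative).
mmseqs cluster targetDB targetClu tmp --min-seq-id 0.3
# Keep only the cluster representatives; the expensive .idx is then built
# over this much smaller DB.
mmseqs createsubdb targetClu targetDB targetRepDB
mmseqs createindex targetRepDB tmp
# Search against the representatives only, with a far smaller index in memory.
mmseqs search queryDB targetRepDB res tmp --db-load-mode 2
# Expand hits from representatives to all cluster members, then realign.
mmseqs expandaln queryDB targetDB res targetClu res_exp
mmseqs align queryDB targetDB res_exp res_final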
Thank you!
Hi,

When I run `createdb; createindex` on a FASTA file, I generally observe that the end result (all generated output files together) is roughly ~10x bigger than the input FASTA file. Most of it is the `.idx` files generated by `createindex`.

The only way I got MMseqs2 to run fast is by using `--db-load-mode 2`, keeping the entire target DB in memory at the same time. Running `mmseqs search` efficiently against a large DB thus imposes very large memory requirements. Is there any way around this (either currently or planned), e.g. searching against a compressed version of the DB?

Thank you! Kuba
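For completeness, the workflow described above, as I read it (paths are illustrative):

```sh
mmseqs createdb target.fasta targetDB   # DB files, roughly input-sized
mmseqs createindex targetDB tmp         # the .idx files dominate the ~10x footprint
# --db-load-mode 2 avoids re-reading the index for every query, at the cost
# of keeping it resident in memory, which drives the RAM requirement.
mmseqs search queryDB targetDB res tmp --db-load-mode 2
```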