tnn111 closed this issue 1 year ago
You can set the MMSEQS_FORCE_MERGE environment variable (e.g. export MMSEQS_FORCE_MERGE=1). The split databases are, however, an IO optimization and not related to memory. Merging after every module invocation can slow MMseqs2 down considerably.
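For anyone landing here later, a minimal sketch of how that could look in a shell session. The wrapper function and the database names (queryDB, targetDB, taxonomyResult, tmp) are placeholders I made up for illustration, not anything from this thread, so adjust them to your own setup.

```shell
# Ask MMseqs2 to merge split result databases after each module.
# As noted above, this can slow runs down considerably.
export MMSEQS_FORCE_MERGE=1

# Hypothetical wrapper around the taxonomy module; the four arguments are
# the query DB, target DB, result DB and temporary directory.
run_taxonomy() {
    mmseqs taxonomy "$1" "$2" "$3" "$4"
}

# Usage (placeholder names):
#   run_taxonomy queryDB targetDB taxonomyResult tmp
```

Exporting the variable affects every MMseqs2 call in that shell. If you only want it for a single run, prefixing one command (MMSEQS_FORCE_MERGE=1 mmseqs ...) should keep the slowdown limited to that invocation.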
Is there a way of merging them after the run is done? It’s not a big deal; it’s just a little less cluttered.
I really appreciate the software. I’ve been using the taxonomy module extensively with impressive results. Thank you!
On Dec 3, 2022, at 20:39, Milot Mirdita wrote:
You can set the MMSEQS_FORCE_MERGE environment variable (e.g. export MMSEQS_FORCE_MERGE=1). The split databases are, however, an IO optimization and not related to memory. Merging after every module invocation can slow MMseqs2 down considerably.
I'm having the same problem with the linclust command. I get many DB files, perhaps because the original dataset that I am clustering is huge (16 million SARS sequences). Is there a way to merge them after the alignment/linclust step?
easy-linclust will merge the results into easily processable .tsv files. You should use the linclust workflow only if you want to process the MMseqs2 internal database formats with other MMseqs2 modules.
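A rough sketch of the two routes, in case it helps. The function names and all file/database names (sequences.fasta, clusterRes, seqDB, clusterDB, tmp) are placeholders chosen for illustration; only the mmseqs subcommands themselves are real.

```shell
# Route 1: easy-linclust works directly on a FASTA file and writes flat
# output files using the given prefix (e.g. clusterRes_cluster.tsv).
cluster_fasta() {
    # $1 = input FASTA, $2 = output prefix, $3 = temporary directory
    mmseqs easy-linclust "$1" "$2" "$3"
}

# Route 2: if you already ran linclust on an MMseqs2 database, createtsv
# can turn the cluster result database into a single TSV file.
cluster_db_to_tsv() {
    # $1 = sequence DB, $2 = cluster result DB, $3 = output TSV
    mmseqs createtsv "$1" "$1" "$2" "$3"
}

# Usage (placeholder names):
#   cluster_fasta sequences.fasta clusterRes tmp
#   cluster_db_to_tsv seqDB clusterDB clusterDB.tsv
```

The second route assumes the split result database is still readable by MMseqs2 as one database, which is the normal case since the splits are only an on-disk layout.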
Hi,
I'm trying to use the taxonomy feature, and when I do, my output DB seems to be split into many smaller DBs. Is there any way to control this split? I'd like to just turn it off. I have 1 TB of memory, so I shouldn't have problems.
Other than that, this works great!