Open ucabuk opened 1 year ago
I don't think there is much left to do to speed up NR searches. The NR is just extremely large.
We were thinking of implementing clustered searches, similar to our ColabFold search, as a more general search strategy in MMseqs2. But that's a longer-term project. These would speed up searches against the NR significantly.
The estimated memory use is not very accurate, and it also doesn't take database chunking into account. If you use a machine with less RAM, MMseqs2 will just split the target database into smaller chunks (at a small runtime cost).
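To make the chunking point concrete, here is a rough back-of-the-envelope sketch. It assumes the target database can be divided evenly by memory footprint, which is a simplification and not how MMseqs2 computes its splits internally; `estimate_chunks` is a hypothetical helper written for illustration, not part of MMseqs2.

```python
def estimate_chunks(required_bytes: int, available_bytes: int) -> int:
    """Rough number of target-database chunks so that each chunk fits in RAM.

    Assumption: memory scales evenly with chunk size (a simplification).
    """
    if required_bytes <= 0 or available_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division avoids floating-point rounding on large values.
    return max(1, -(-required_bytes // available_bytes))


GIB = 1024 ** 3
TIB = 1024 ** 4

# Example: a 2 TiB estimate on a machine with 256 GiB of usable RAM.
print(estimate_chunks(2 * TIB, 256 * GIB))
```

Under this simplified model, a 2 TiB estimate on a 256 GiB machine corresponds to roughly 8 passes over the target database. As far as I know, the memory budget can also be capped explicitly with the `--split-memory-limit` option; check `mmseqs search -h` for the exact semantics in your version.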
Thank you for your answer. I understand, and I agree it would be good to see clustered searches in MMseqs2. Is there any benchmark against the DIAMOND tool? Maybe I just missed it.
Best, Ugur
Hi,
I am using MMseqs2 for taxonomy assignment against the NR database. However, the estimated memory consumption is 2T. Is that normal? My input is already protein. My other question is about speed: is there any way to speed it up?
Thank you. Best,