Open bbuchfink opened 3 years ago
Hi Benjamin, we currently have no option to turn off Linclust or to set user-defined sensitivity levels for each step. Having flags for both might be useful. Could you please explain your use case a bit more? Maybe we already have some mechanism that might solve some of the issues.
Hi Martin, my use case is a bachelor student who wants to compare clustering with Diamond and MMSeqs2. We already did runs with Linclust enabled, but since Diamond unfortunately does not have a Linclust-like feature, we also want to run a comparison with Linclust disabled (and sensitivity levels that match those of Diamond).
You could try to compare it with the single step clustering (--single-step-clustering). But the regular Linclust + cascaded clustering workflow is much faster.
For benchmarking you could do these two things:
(1) Just hardcode your sensitivity levels in src/workflows/Cluster.cpp line 195 for now.
(2) Remove the linclust call in data/cascaded_clustering.sh.
But we might add this feature in the next few days.
Thanks, I tried to hack your script and it looks like it's working. Let me know in case you add the feature.
I'll second the idea that being able to scan identity levels is useful. Logarithmic steps in (1-identity) are generally the right step spacing. Log-log plots of the deltas in cluster sizes are very informative, with peaks at any genome duplication events.
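To make the spacing suggestion concrete, here is a minimal Python sketch that generates identity thresholds whose distances (1-identity) are evenly spaced on a log scale, plus a helper for the deltas between consecutive cluster counts. The function names and the example range are hypothetical and not part of MMseqs2 or Diamond; the cluster counts would have to come from your own runs.

```python
import math


def log_spaced_identities(min_identity, max_identity, steps):
    """Identity thresholds with log-spaced distances (1 - identity).

    Returned from the highest identity down to the lowest.
    Hypothetical helper for illustration only.
    """
    if not (0.0 <= min_identity < max_identity < 1.0):
        raise ValueError("need 0 <= min_identity < max_identity < 1")
    if steps < 2:
        raise ValueError("need at least two steps")
    lo = math.log10(1.0 - max_identity)  # log of the smallest distance
    hi = math.log10(1.0 - min_identity)  # log of the largest distance
    width = (hi - lo) / (steps - 1)
    return [1.0 - 10 ** (lo + i * width) for i in range(steps)]


def cluster_count_deltas(counts):
    """Differences between consecutive cluster counts, one per threshold step."""
    return [b - a for a, b in zip(counts, counts[1:])]


if __name__ == "__main__":
    # Example range only; pick bounds that suit your data.
    for identity in log_spaced_identities(0.5, 0.99, 8):
        print(f"{identity:.4f}")
```

Each threshold could then be used for one clustering run (for MMseqs2, presumably through its minimum sequence identity option), and the resulting cluster counts fed to `cluster_count_deltas` before plotting on log-log axes.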
I would like to be able to run cascaded clustering with explicitly defined sensitivity levels, and also disable the first Linclust step. Can this be done somehow?