Closed LanderDC closed 3 months ago
Hi,
Thanks for reaching out! While the aniclust.py documentation mentions UCLUST-like clustering, it actually performs CD-HIT clustering. In MMseqs2, this corresponds to --cluster-mode 2, which you are using. It would be interesting to compare the tools using the same clustering algorithm (in Vclust, that's --algorithm cd-hit).
Regarding permuted circular genomes, Vclust should work fine. Like BLAST, it identifies local alignments between two genomes (similar to HSPs in BLAST) and calculates ANI from those local alignments. In the worst case, ANI might be slightly underestimated due to short alignment discontinuities at the breakpoints of circular genomes.
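To illustrate why a breakpoint should only have a small effect, here is a minimal sketch of ANI computed from a set of local alignments. It assumes ANI is the number of identical bases summed over all local alignments, divided by the total aligned length. The function name and the (aligned_length, identical_bases) tuple layout are hypothetical and are not Vclust's actual API or output format.

```python
def ani_from_local_alignments(alignments):
    """Return ANI as a fraction in [0, 1].

    `alignments` is an iterable of (aligned_length, identical_bases) tuples,
    one per local alignment. This layout is an assumption for illustration.
    """
    alignments = list(alignments)
    total_length = sum(length for length, _ in alignments)
    if total_length == 0:
        # No aligned bases at all: report 0 rather than dividing by zero.
        return 0.0
    total_identical = sum(identical for _, identical in alignments)
    return total_identical / total_length


# Same pair of genomes, once aligned end to end and once with one genome
# permuted, so the alignment is split in two and a few bases are lost
# around the breakpoint. The numbers are made up for illustration.
collinear = [(10_000, 9_800)]
permuted = [(6_000, 5_880), (3_980, 3_900)]

print(ani_from_local_alignments(collinear))
print(ani_from_local_alignments(permuted))
```

Under these assumptions the two values differ only marginally, because the split alignments still cover almost the entire genome.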
Andrzej
Thanks! You are right, Vclust with the CD-HIT algorithm is much more similar to aniclust.py:
Hi,
I'm trying out your tool because I want to replace the workflow in our lab, which currently uses megaBLAST + CheckV's anicalc/aniclust, with something faster. Based on your tweet, Vclust would be perfect for this, as it can use the same clustering algorithms as anicalc/aniclust.
However, when I compare the uclust, leiden and mmseqs linclust clusterings to the original anicalc/aniclust, it seems that the latter performs better (in terms of being more similar to the original clustering) based on the adjusted Rand index, which is the opposite of what I expected. Do you have any idea why that might be?
In addition, how does Vclust handle permuted circular genomes?
Thanks in advance!
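For reference, the adjusted Rand index comparison described above can be sketched as follows. This is a minimal pure-Python version of the standard pair-counting formula, not the exact script used for the comparison. It assumes each clustering is given as a list of cluster labels in the same genome order; the function name and example labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same items.

    labels_a[i] and labels_b[i] are the cluster labels assigned to item i
    by the two clusterings. Label values themselves do not need to match.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same items")
    n = len(labels_a)
    if n < 2:
        # With fewer than two items there are no pairs to disagree on.
        return 1.0

    # Pairs of items placed together in both, in A only, and in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. both clusterings are all singletons.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Hypothetical labels for six genomes from a reference and a test clustering.
reference = [0, 0, 1, 1, 2, 2]
candidate = [5, 5, 7, 7, 7, 9]
print(adjusted_rand_index(reference, candidate))
```

A value of 1 means the two clusterings are identical up to relabeling, and values near 0 mean the agreement is about what random assignment would give.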
The commands I used: