Closed pol-delta closed 1 year ago
The spheres clustering algorithm sorts the sequences from highest to lowest occurrence. The sequence with highest occurrence is made a centroid and all the sequences within a distance dist
of it are added to the cluster. All the sequences of the cluster are removed from the list and the process is repeated starting from the sequence with highest occurrence. This mode is called "spheres" because the clusters have radius dist
.
The message-passing algorithm sorts the sequences from lowest to highest occurrence. Every sequence is initially labelled as centroid. The sequence with lowest occurrence loses the centroid label and is attached to its closest neighbor if 1) they are within distance dist
and 2) the occurrence of the neighbor is at least r
times higher. In the case that conditions are not met for any neighbor, the sequence simply keeps its centroid label. When there are multiple closest neighbors, the sequence loses the centroid label and is attached to all the closest neighbors. The process is repeated by processing sequences in order of occurrence. In this mode, the clusters can have a radius larger than dist
. It is called "message-passing" because it is inspired from an algorithm with the same name.
The connected-component algorithm merges all sequences that are within a distance dist
of each other. This is independent of the order in which sequences are processed, or of their occurrence. The centroid is the sequence with highest occurrence. This mode is called "connected components" because the clusters are the connected components of a graph where edges exist between sequences within a distance dist
of each other. It was added later than spheres and message-passing cluster.
Thank you, that helps tremendously!
Apologies if I missed this somewhere, but can you please provide a brief description of the "connected components" clustering algorithm and how it differs from message passing and sphere clustering? I couldn't find anything on it in the documentation or published paper.
By the way, Starcode is great, so thank you very much for all your work on it!