SUMO generates plots of the cophenetic correlation coefficient and the proportion of ambiguously clustered pairs to help determine the optimal number of clusters. The following metrics can also be helpful in certain scenarios and should be generated as well:
Jaccard index: In some cases, going from k to k+1 clusters assigns only a tiny number of samples to the new cluster. In that scenario, k+1 clusters may offer little additional information for classification compared to k clusters. Let a be the number of pairs of samples that are in the same subgroup for both k and k+1 clusters, and let b be the number of pairs that are in the same subgroup for exactly one of the two (same in k but different in k+1, or same in k+1 but different in k). The index is then a / (a+b); a value close to 1 means the extra cluster changes very few pairings.
Silhouette score: can be calculated from the H obtained in each run of the solver, and the final score can then be derived from those per-run scores.
Agreement score: the number of pairs of samples in each run of the solver whose assigned labels agree with the consensus labels.
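
The pair-based metrics above could be sketched roughly as follows. This is only an illustration, not SUMO code: the function names `pair_jaccard` and `pair_agreement` are hypothetical, labels are assumed to be plain sequences with one cluster label per sample in the same sample order, and the agreement score is interpreted here as the fraction of pairs whose same/different status matches the consensus, which is one possible reading of the description.

```python
from itertools import combinations


def pair_jaccard(labels_k, labels_k1):
    """Pairwise Jaccard index a / (a + b) between two labelings.

    a: pairs in the same cluster in both labelings
    b: pairs in the same cluster in exactly one labeling
    """
    if len(labels_k) != len(labels_k1):
        raise ValueError("both labelings must cover the same samples")
    a = 0
    b = 0
    for i, j in combinations(range(len(labels_k)), 2):
        same_k = labels_k[i] == labels_k[j]
        same_k1 = labels_k1[i] == labels_k1[j]
        if same_k and same_k1:
            a += 1
        elif same_k != same_k1:
            b += 1
    # No pair is co-clustered in either labeling: treat as identical.
    return a / (a + b) if (a + b) else 1.0


def pair_agreement(run_labels, consensus_labels):
    """Fraction of sample pairs whose same/different status in one
    solver run matches the consensus labels (assumed interpretation)."""
    if len(run_labels) != len(consensus_labels):
        raise ValueError("both labelings must cover the same samples")
    agree = 0
    total = 0
    for i, j in combinations(range(len(run_labels)), 2):
        same_run = run_labels[i] == run_labels[j]
        same_consensus = consensus_labels[i] == consensus_labels[j]
        agree += same_run == same_consensus
        total += 1
    return agree / total if total else 1.0


# Toy example: going from k=2 to k+1=3 moves a single sample
# into the new cluster.
labels_k = [0, 0, 0, 1, 1, 1]
labels_k1 = [0, 0, 0, 1, 1, 2]
print(pair_jaccard(labels_k, labels_k1))
print(pair_agreement(labels_k1, labels_k))
```

In the toy example, four pairs stay together under both labelings and two pairs are split by the new cluster, so a = 4 and b = 2. Both functions compare pair co-membership rather than raw label values, so they are unaffected by cluster labels being permuted between runs.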