Lackluster clustering performance

lmcinnes / umap

Uniform Manifold Approximation and Projection

BSD 3-Clause "New" or "Revised" License

I think the short answer is that, unfortunately, this approach isn't magic, and in this case it seems it can't match the labelling. It should also be noted that clusters are never guaranteed to match labels. Passing the data through a ResNet to generate features should mean that the cluster structure at least somewhat resembles the class/label structure, but that doesn't seem to be the case here: none of the techniques are doing that well. I would suggest looking at a visualization of a 2D or 3D UMAP embedding coloured by the labels to see what sort of correlation there actually is between the qualitative clusters and the labels. That may give you some ideas about how things could be improved.
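For what it's worth, a rough sketch of that kind of plot might look like the following. It assumes the ResNet features are already in an `(n_samples, n_features)` NumPy array with a matching array of labels; the helper names `embed_2d` and `plot_by_label` are made up for illustration and are not part of umap-learn, and the UMAP parameters are just defaults plus a fixed seed.

```python
import matplotlib.pyplot as plt
import numpy as np


def embed_2d(features, random_state=42):
    """Reduce an (n_samples, n_features) array to 2D with UMAP."""
    import umap  # provided by the umap-learn package

    reducer = umap.UMAP(n_components=2, random_state=random_state)
    return reducer.fit_transform(features)


def plot_by_label(embedding, labels, ax=None):
    """Scatter a 2D embedding, drawing one colour per distinct label."""
    embedding = np.asarray(embedding)
    labels = np.asarray(labels)
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 6))
    # One scatter call per label so each class gets its own legend entry.
    for label in np.unique(labels):
        mask = labels == label
        ax.scatter(embedding[mask, 0], embedding[mask, 1], s=5, label=str(label))
    ax.set_xlabel("UMAP 1")
    ax.set_ylabel("UMAP 2")
    ax.legend(markerscale=3, title="label")
    return ax
```

With your own `features` and `labels` arrays, something like `plot_by_label(embed_2d(features), labels)` followed by `plt.show()` should produce the picture. If points of the same colour are smeared across the plot rather than grouped together, the features themselves are not separating the classes, and no clustering algorithm run on top of them is likely to recover the labels.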

Lackluster clustering performance #706