lucianolorenti / SpectralClustering.jl

Spectral clustering algorithms written in Julia
Other
49 stars 9 forks source link

Eigen vector Clustering for Spectral Clustering is inconsistent (and possibly wrong) #10

Closed thazhemadam closed 3 years ago

thazhemadam commented 3 years ago

The example provided for Eigenvector Clustering also provides a plot.

However, upon running and re-running this example locally, the results seem to be inconsistent every time, and more importantly, wrong.

Please find below 3 such examples: image image image

Shouldn't the Eigenvector clustering in all 3 cases be such that one of the circles is identified as one cluster, and the other circle is identified as another cluster?

Furthermore, isn't the plot obtained in the example also wrong (at least for the K-Means Clustering option). Shouldn't the K-Means clustering look like something in the below image*?

image

*Image obtained from a Python implementation of Spectral Clustering which can be found here

thazhemadam commented 3 years ago

@lucianolorenti sorry for the ping, but could you please take a look at this, and see if I'm going wrong somewhere?

lucianolorenti commented 3 years ago

Hi! I've been working a little bit on the devel branch. I think this problem is now solved. I changed the tolerance of the eigensolver and modify how the eigenvectors corresponding to the same eigenvalues are handled

Regarding the plots, keep in mind that in sk-learn kmeans is being applied to the raw data points, while on the example of this library both clustering methods are applied to the embedding. The idea is to show two different methods to discretize the eigenvector embedding.

Before merging to master I want to create a proper test for #9 I thought I did it but, the test pass on the github CI ( and it shouldn't ) but not in my pc and I want to know what is going on.

thazhemadam commented 3 years ago

Regarding the plots, keep in mind that in sk-learn kmeans is being applied to the raw data points, while on the example of this library both clustering methods are applied to the embedding. The idea is to show two different methods to discretize the eigenvector embedding.

I understand that. Kindly correct me if I'm wrong, but regardless of being raw-data points or embedding, the way k-Means is applied must comply with the general nature of the k-Means algorithm itself, right? That is to say, if I were to apply k-means clustering to a system resembling the one provided in the example, wouldn't I still get two clusters, each on one different halves of the embedding (essentially similar to the image obtained from the python implementation)?

thazhemadam commented 3 years ago

Bump. @lucianolorenti has this been issue been solved yet (sorry for the ping again 😅 )? If not, is there anything I can do to try and get this working properly? I need to use SpectralClustering for a few graphs that I have, but don't want to just reinvent the wheel 😅

lucianolorenti commented 3 years ago

Hi! I merged the changes I did the other time. I think now it's working. I added an example of clustering using MNIST and apparently, the embeddings and the clusterization are working ok

Unfortunately, I am not longer have the time nor the interest to keep updating the library so I will archive the project.