rinuboney / clatern

Machine Learning in Clojure
Eclipse Public License 1.0

Add support to Kmeans for handling empty clusters #3

Closed by rnowling 9 years ago

rnowling commented 9 years ago

The scikit-learn implementation of K-means never leaves a cluster empty: data points are reassigned to any cluster that ends up with no members. The reassigned points are those farthest from their currently assigned cluster centers.

This issue was discovered while writing the K-means unit tests in pull request #1.
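
The reassignment strategy described above can be sketched roughly as follows. This is a plain-Python illustration, not scikit-learn's code and not a patch for clatern: the function name `reassign_empty_clusters`, its arguments, and the rule of never taking the last point from a cluster are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reassign_empty_clusters(points, labels, centroids):
    """Give each empty cluster the point farthest from its own centroid.

    Hypothetical helper: returns new (labels, centroids) lists and leaves
    the inputs untouched. A donor cluster's centroid is not recomputed
    here; the next K-means update step is expected to do that.
    """
    labels = list(labels)
    centroids = [list(c) for c in centroids]
    k = len(centroids)

    # Count how many points each cluster currently owns.
    counts = [0] * k
    for label in labels:
        counts[label] += 1

    empty = [c for c in range(k) if counts[c] == 0]
    if not empty:
        return labels, centroids

    # Rank points by distance to their assigned centroid, farthest first.
    order = sorted(
        range(len(points)),
        key=lambda i: euclidean(points[i], centroids[labels[i]]),
        reverse=True,
    )

    donors = iter(order)
    for cluster in empty:
        for i in donors:
            # Assumption: skip points that are the only member of their
            # cluster, so the fix does not create a new empty cluster.
            if counts[labels[i]] > 1:
                counts[labels[i]] -= 1
                labels[i] = cluster
                counts[cluster] = 1
                centroids[cluster] = list(points[i])
                break

    return labels, centroids
```

With three points all assigned to cluster 0 and an empty cluster 1, the point farthest from centroid 0 is moved into cluster 1 and becomes its centroid. If there are fewer usable points than empty clusters, some clusters stay empty in this sketch.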

rnowling commented 9 years ago

This bug is blocking work on adding new distance metrics, since a centroid can be set to an empty vector ([]). The distance function in core.matrix ignores differences in vector shapes (which is arguably bad behavior), while other functions may not.
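
To illustrate why a shape-tolerant distance function can hide this bug, here is a small Python analogy. It makes no claim about how core.matrix computes distances internally; `lenient_euclidean` and `strict_euclidean` are hypothetical names used only for this example.

```python
import math


def lenient_euclidean(a, b):
    """Silently truncates to the shorter input, because zip() does."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def strict_euclidean(a, b):
    """Refuses inputs whose lengths differ instead of guessing."""
    if len(a) != len(b):
        raise ValueError(
            "shape mismatch: {} vs {}".format(len(a), len(b))
        )
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# An empty centroid looks like a perfect match to the lenient version.
print(lenient_euclidean([1.0, 2.0], []))  # prints 0.0
```

A metric written like `strict_euclidean` would fail on the empty centroid, which is roughly the situation a new distance metric could run into until empty clusters are handled.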