Closed by rnowling 9 years ago
This bug is blocking work on adding new distance metrics since centroids can be set equal to an empty vector (`[]`). The `distance` function in `core.matrix` ignores differences in vector shapes (which is arguably bad behavior) while other functions may not.
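To illustrate the kind of shape check that would surface an empty centroid early, here is a minimal Python sketch. It is not the `core.matrix` API; `strict_distance` is a hypothetical helper name, and Euclidean distance is assumed only for the example.

```python
import math


def strict_distance(a, b):
    """Euclidean distance that refuses vectors of different lengths.

    Hypothetical helper for illustration only; not part of core.matrix.
    """
    if len(a) != len(b):
        # An empty centroid ([]) compared against a real data point lands here
        # instead of silently yielding a meaningless distance.
        raise ValueError(
            "shape mismatch: %d vs %d elements" % (len(a), len(b))
        )
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

With a check like this, comparing a data point to an empty centroid fails loudly rather than producing a value that later code may or may not accept.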
In the scikit-learn implementation of K-means, data points are re-assigned to empty clusters so that no cluster is ever left empty. The re-assigned points are the ones with the largest distances to their currently assigned cluster centers.
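The re-assignment strategy described above could be sketched roughly as follows. This is a plain-Python illustration under stated assumptions, not scikit-learn's actual code: `fill_empty_clusters` and `euclidean` are hypothetical names, Euclidean distance is assumed, and points are never taken from a cluster that has only one member so that the fix does not create a new empty cluster.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fill_empty_clusters(points, centroids, assignments):
    """Return a copy of `assignments` in which every cluster has a point.

    points      -- list of vectors
    centroids   -- list of k vectors (an empty cluster's centroid may be [])
    assignments -- assignments[i] is the cluster index of points[i]
    """
    k = len(centroids)
    assignments = list(assignments)

    # Count how many points each cluster currently holds.
    counts = [0] * k
    for cluster in assignments:
        counts[cluster] += 1

    # Rank points by distance to their own centroid, farthest first.
    # Every assigned cluster is non-empty, so an empty centroid is never used.
    farthest_first = sorted(
        range(len(points)),
        key=lambda i: euclidean(points[i], centroids[assignments[i]]),
        reverse=True,
    )

    for cluster in range(k):
        if counts[cluster] != 0:
            continue
        for i in farthest_first:
            donor = assignments[i]
            # Skip singleton clusters so the donor does not become empty.
            if counts[donor] > 1:
                assignments[i] = cluster
                counts[donor] -= 1
                counts[cluster] += 1
                break

    return assignments
```

After calling this, the centroids would need to be recomputed from the new assignments, which gives the previously empty cluster a centroid with the same shape as the data points.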
This issue was discovered while writing the K-means unit tests in pull request #1.