wearelumenai / distclus

Distance-based clustering
MIT License

Handle error when MaxK is greater than the number of distinct data values #2

Closed: b3j0f closed this issue 5 years ago

ydarma commented 5 years ago

The program would panic in the mcmc loop when trying to draw a new center if every observation coincided exactly with one of the centroids of the current configuration. The minimal distance from each observation to its nearest centroid would then be zero, so every kmeans++ sampling weight would be zero and the iteration would try to draw from an empty set.
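The following minimal Go sketch shows the failure. It assumes the standard kmeans++ rule, in which each observation is weighted by its squared distance to the nearest current centroid. That rule is not taken from the distclus source, and the helpers `minSqDist` and `kmeansPPWeights` are hypothetical, not part of the distclus API. When every observation sits on a centroid, every weight is zero, so the total weight is zero and there is nothing to draw from.

```go
package main

import (
	"fmt"
	"math"
)

// minSqDist returns the squared Euclidean distance from x to its nearest
// centroid (hypothetical helper, not the distclus API).
func minSqDist(x []float64, centroids [][]float64) float64 {
	best := math.Inf(1)
	for _, c := range centroids {
		d := 0.0
		for i := range x {
			diff := x[i] - c[i]
			d += diff * diff
		}
		if d < best {
			best = d
		}
	}
	return best
}

// kmeansPPWeights assigns each observation its kmeans++ sampling weight,
// assumed here to be the squared distance to the closest centroid, and
// also returns the sum of all weights.
func kmeansPPWeights(data, centroids [][]float64) ([]float64, float64) {
	weights := make([]float64, len(data))
	total := 0.0
	for i, x := range data {
		weights[i] = minSqDist(x, centroids)
		total += weights[i]
	}
	return weights, total
}

func main() {
	// Every observation coincides exactly with one of the two centroids.
	data := [][]float64{{0, 0}, {0, 0}, {1, 1}, {1, 1}}
	centroids := [][]float64{{0, 0}, {1, 1}}
	weights, total := kmeansPPWeights(data, centroids)
	fmt.Println(weights, total) // [0 0 0 0] 0
}
```

A draw proportional to these weights has no valid outcome. A naive cumulative-sum sampler would then either loop past the end of the slice or divide by zero.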

ydarma commented 5 years ago

An error is now returned when all weights are zero in the kmeans++ draw. This error bubbles up to the mcmc iteration, which then leaves the number of centroids unchanged.