huonw / cogset

Generic implementations of clustering algorithms.
http://huonw.github.io/cogset/cogset
Apache License 2.0

Zero assigned clusters leading to zero means #5

Open queenp opened 7 years ago

queenp commented 7 years ago

The current design chooses the first k points as starting values.

If any of these points are identical, the first of the duplicates is assigned every point closest to that value and the second is assigned none. The mean of the second cluster is then computed over 0 members, which yields NaN and derails the whole clustering algorithm.
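To make the failure concrete, here is a minimal standalone sketch of one k-means update on 1-D data. It is not cogset's actual code: `kmeans_step` is a hypothetical helper, and the tie-breaking rule (ties go to the lowest cluster index) is an assumption made for illustration.

```rust
/// Hypothetical helper, not part of cogset: one k-means update on 1-D data.
/// Assigns each point to its nearest mean, then recomputes every mean.
fn kmeans_step(data: &[f64], means: &[f64]) -> Vec<f64> {
    let k = means.len();
    let mut sums = vec![0.0_f64; k];
    let mut counts = vec![0_usize; k];

    for &x in data {
        // Find the nearest mean. The strict `<` means ties go to the
        // lowest index (an assumption of this sketch).
        let mut best = 0;
        let mut best_dist = f64::INFINITY;
        for (i, &m) in means.iter().enumerate() {
            let dist = (x - m).abs();
            if dist < best_dist {
                best_dist = dist;
                best = i;
            }
        }
        sums[best] += x;
        counts[best] += 1;
    }

    // An empty cluster computes 0.0 / 0.0 here, which is NaN for f64.
    sums.iter()
        .zip(&counts)
        .map(|(&s, &c)| s / c as f64)
        .collect()
}

fn main() {
    let data = [1.0, 1.0, 5.0, 6.0];
    // Take the first k = 2 points as starting values; they are identical.
    let init: Vec<f64> = data[..2].to_vec();

    let means = kmeans_step(&data, &init);
    println!("{:?}", means);

    // Cluster 0 took every point; cluster 1 is empty, so its mean is NaN.
    assert!(means[1].is_nan());
}
```

In this sketch the NaN also persists across iterations: every distance to a NaN mean is NaN, the comparison `dist < best_dist` is false for NaN, and so the empty cluster can never win a point back.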

There are 2 solutions I can think of to avoid this condition:

The first one seems simple and more predictably performant to start from.

Stunkymonkey commented 3 years ago

The same problem appeared for me today. I have not tested #6 yet, but @huonw, please look into it.