The current design chooses the first k points as starting values.
If any of these points are identical, the first of the duplicate centres is assigned all the points that would go to either of them and the second is assigned none. Its mean is then computed over 0 members, which produces NaN and derails the whole clustering algorithm.
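To make the failure concrete, here is a minimal sketch in Python. The names (`assign`, `sq_dist`) are hypothetical, and it assumes squared Euclidean distance with ties going to the lowest-indexed centre, which may not match the actual implementation.

```python
def assign(points, centres):
    """Return, for each point, the index of its nearest centre.

    Ties go to the lowest index because min() keeps the first minimum.
    """
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    return [
        min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
        for p in points
    ]


points = [(1.0, 1.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
centres = points[:2]  # first k = 2 points, which happen to be identical
labels = assign(points, centres)
sizes = [labels.count(j) for j in range(len(centres))]
# The second centre never wins a tie, so its cluster stays empty and
# its mean would be a 0 / 0 division.
```

With this data every point lands in cluster 0, so the update step has nothing to average for cluster 1.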
I can think of 2 ways to avoid this condition:
1. Select the first k distinct points as centres.
2. Move any centre that ends up with a cluster of size 0 to another, randomly chosen point.
The first one seems simpler and more predictably performant, so it looks like the better one to start from.
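A rough sketch of the first option, in Python. The function name `first_k_distinct` is hypothetical, and it assumes points are sequences of numbers compared by exact equality, so near-duplicates still count as distinct.

```python
def first_k_distinct(points, k):
    """Return the first k pairwise-distinct points, in input order."""
    if k < 1:
        raise ValueError("k must be at least 1")
    seen = set()
    centres = []
    for p in points:
        key = tuple(p)  # hashable copy used only for the duplicate check
        if key not in seen:
            seen.add(key)
            centres.append(p)
            if len(centres) == k:
                return centres
    # Fewer than k distinct points exist, so k non-empty clusters are impossible.
    raise ValueError(f"need {k} distinct points, found only {len(centres)}")


initial = first_k_distinct([(1.0, 1.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)], 2)
```

This is a single pass over the data that stops as soon as k centres are found, so the cost stays predictable. It also surfaces the case where the data holds fewer than k distinct points, which no choice of starting centres could fix.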