rluedde / ml_algorithms

implementations of k-means clustering and linear regression from scratch

Don't start with random points #2

Open rluedde opened 4 years ago

rluedde commented 4 years ago

The points that clusters are built from shouldn't be random. The randomness causes the clusters to be named inconsistently from one run to the next.

For example, on one run of the model (with 5 data points) you might get cluster ids of [1, 0, 0, 1, 2], but on another run the ids might differ even though the grouping pattern is the same: [0, 1, 1, 0, 2].
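To make the point concrete: those two outputs describe the same partition, and only the ids are permuted. A quick way to check this is to renumber ids in order of first appearance. This is a minimal sketch; `canonical_labels` is a hypothetical helper, not part of the repo, and it only normalizes the output after the fact rather than fixing the random starting points.

```python
def canonical_labels(labels):
    """Renumber cluster ids in order of first appearance (hypothetical helper)."""
    mapping = {}
    result = []
    for label in labels:
        # The first new id seen becomes 0, the next becomes 1, and so on.
        if label not in mapping:
            mapping[label] = len(mapping)
        result.append(mapping[label])
    return result


run_a = [1, 0, 0, 1, 2]
run_b = [0, 1, 1, 0, 2]
print(canonical_labels(run_a))  # [0, 1, 1, 0, 2]
print(canonical_labels(run_b))  # [0, 1, 1, 0, 2]
print(canonical_labels(run_a) == canonical_labels(run_b))  # True
```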

I think that starting with points that are non-random and far from each other is a better solution, even though it is more expensive to compute.
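One deterministic way to pick far-apart starting points is farthest-first (max-min) selection. The sketch below assumes points are tuples of numbers. `farthest_first_centroids` is a hypothetical name, and starting from `points[0]` is an arbitrary fixed rule; any deterministic first pick would work. It costs roughly n·k distance computations instead of k random draws, which matches the "more expensive" tradeoff above.

```python
import math


def farthest_first_centroids(points, k):
    """Pick k initial centroids deterministically (hypothetical helper).

    Starts from points[0], then repeatedly adds the point whose distance
    to its nearest already-chosen centroid is largest.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [0]
    # nearest[i] = distance from points[i] to its closest chosen centroid
    nearest = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        # max() returns the first maximal index, so ties go to the lowest index
        next_idx = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(next_idx)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[next_idx]))
    return [points[i] for i in chosen]


points = [(0, 0), (1, 0), (10, 10), (11, 10), (0, 10)]
print(farthest_first_centroids(points, 3))  # [(0, 0), (11, 10), (0, 10)]
```

Because the same input always yields the same starting centroids in the same order, cluster ids come out identical on every run. For comparison, k-means++ uses the same "spread the points out" idea but samples each new point at random with probability proportional to its squared distance, so it would still need a fixed seed to be reproducible.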

rluedde commented 4 years ago

Another problem that randomness causes is that separate runs of .classify() can give slightly different classifications for the same data.