Open vsha96 opened 2 years ago
UPD: The current version of kmeans runs only once (a single random initialization). Because the algorithm is randomized, we must run it with several different centroid initializations and pick the run that yielded the lowest sum of squared distances.
Started working on the issue. I would also like to propose adding an optional maximum number of iterations.
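A minimal sketch of the "run several times, keep the best" idea, not the library's actual code. The names `Point`, `sqDist`, `inertia`, and `bestOfRuns` are hypothetical, and `runOnce` stands in for whatever single-initialization kmeans the module exposes, assumed here to return a non-empty list of centroids. The generators could be obtained, for example, by repeatedly splitting `rGen`.

```haskell
import Data.List (minimumBy)
import Data.List.NonEmpty (NonEmpty (..))
import Data.Ord (comparing)

-- Hypothetical point representation: a list of coordinates.
type Point = [Double]

-- | Squared Euclidean distance between two points.
sqDist :: Point -> Point -> Double
sqDist a b = sum (zipWith (\x y -> (x - y) * (x - y)) a b)

-- | Sum of squared distances from every point to its nearest centroid.
-- Assumes the centroid list is non-empty.
inertia :: [Point] -> [Point] -> Double
inertia centroids points =
  sum [minimum (map (sqDist p) centroids) | p <- points]

-- | Run a single-initialization kmeans once per generator and keep
-- the centroids with the lowest sum of squared distances.
bestOfRuns :: (gen -> [Point]) -> [Point] -> NonEmpty gen -> [Point]
bestOfRuns runOnce points (g :| gs) =
  snd (minimumBy (comparing fst) scored)
  where
    -- Score each run once, then compare on the score only.
    scored = [(inertia cs points, cs) | cs <- map runOnce (g : gs)]
```

Taking a `NonEmpty` list of generators keeps `minimumBy` total, so the caller cannot accidentally ask for zero runs.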
-- TODO solve there a problem from numOfClust ???
Probably it was related to this: https://github.com/vsha96/mllib/blob/e44b90fae4de81fe7e71fbf722d2e13384e6fd89/src/Mllib/Cluster/KMeans.hs#L152
Maybe we expect the labels to be ints from 0 to n. I can't recall, so this needs testing. We also need to stop leaving TODOs like this without a matching issue in the issue tracker =)
Suggestion from (I cleaned it and left KNN related changes only)
Add max iter implementation
-- | KMeans parameters for setup
data KMeansParams = KMeansParams
  { rGen :: !StdGen -- ^ Random generator
  , clusterNumber :: !Int -- ^ Number of clusters
  , maxIter :: Maybe Int -- ^ Maximum number of iterations
  } -- TODO: add max iter implementation
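One way the optional cap could be honored is sketched below. It assumes a single kmeans update can be written as a pure step function, and that `Nothing` means "run until convergence". `iterateUntil` is a hypothetical helper, not something that exists in the module.

```haskell
-- | Apply a step function until it reaches a fixed point or the
-- optional iteration cap is hit, whichever comes first.
-- 'Nothing' means no cap (iterate until the state stops changing).
iterateUntil :: Eq a => Maybe Int -> (a -> a) -> a -> a
iterateUntil cap step = go 0
  where
    go :: Int -> a -> a
    go n x
      | maybe False (n >=) cap = x -- cap reached: stop with current state
      | x' == x = x                -- converged: the step changed nothing
      | otherwise = go (n + 1) x'
      where
        x' = step x
```

With this shape, something like `iterateUntil (maxIter params) updateCentroids initialCentroids` would work, where `updateCentroids` and `initialCentroids` are placeholders for the module's own step and starting state.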
See the TODOs in the file and break them into tickets