vsha96 / mllib

Machine Learning in Haskell
BSD 3-Clause "New" or "Revised" License

Improve the model src/Mllib/Cluster/KMeans.hs #6

Open vsha96 opened 2 years ago

vsha96 commented 2 years ago

See the TODOs in the file and break them into tickets

vsha96 commented 2 years ago

UPD: The current version of kmeans runs only once (a single random initialization). Because the result depends on the random initial centroids, we must run the algorithm several times with different centroid initializations and keep the result of the run that yielded the lowest sum of squared distances.
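A minimal sketch of the restart-and-select step described above, not the code in `KMeans.hs`. The names `Point`, `sqDist`, `sse`, and `bestOfRuns` are hypothetical, and `fit` stands in for whatever function performs one k-means run from given initial centroids. To keep the example free of `System.Random`, the caller passes the list of initializations explicitly.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical representation: a point is a list of coordinates.
type Point = [Double]

-- Squared Euclidean distance between two points of equal dimension.
sqDist :: Point -> Point -> Double
sqDist a b = sum [ (x - y) ^ (2 :: Int) | (x, y) <- zip a b ]

-- Sum of squared distances from every point to its nearest centroid.
-- Assumes the centroid list is non-empty.
sse :: [Point] -> [Point] -> Double
sse centroids pts = sum [ minimum (map (sqDist p) centroids) | p <- pts ]

-- Run `fit` once per initialization and keep the centroids with the
-- lowest SSE. Assumes `inits` is non-empty (minimumBy fails otherwise).
bestOfRuns
  :: ([Point] -> [Point] -> [Point])  -- ^ one k-means run: initial centroids -> data -> final centroids
  -> [[Point]]                        -- ^ candidate initializations
  -> [Point]                          -- ^ data points
  -> [Point]
bestOfRuns fit inits pts =
  minimumBy (comparing (`sse` pts)) [ fit c0 pts | c0 <- inits ]
```

In the library itself the initializations would presumably be drawn from the `StdGen` in the parameters, splitting the generator once per run.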

s1m0000n commented 1 year ago

Started working on the issue. I would also like to propose adding an optional maximum number of iterations.

s1m0000n commented 7 months ago

-- TODO solve there a problem from numOfClust ???

vsha96 commented 7 months ago

-- TODO solve there a problem from numOfClust ???

Probably it was related to this: https://github.com/vsha96/mllib/blob/e44b90fae4de81fe7e71fbf722d2e13384e6fd89/src/Mllib/Cluster/KMeans.hs#L152

Maybe we expect the labels to be ints from 0 to n; this needs testing. I can't recall. We need to deprecate such TODOs that have no issue in the issue tracker =)

vsha96 commented 7 months ago

Suggestion from (I cleaned it up and left only the KNN-related changes)

Add max iter implementation

-- | KMeans parameters for setup
data KMeansParams = KMeansParams
    { rGen            :: !StdGen    -- ^ Random generator
    , clusterNumber   :: !Int       -- ^ Number of clusters
    , maxIter         :: Maybe Int  -- ^ Maximum number of iterations
    } -- TODO: add max iter implementation
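
One possible way to honor `maxIter` is sketched below. This is an illustration under assumptions, not the implementation planned for the library: `iterateCapped` is a hypothetical helper, `step` stands in for one k-means update (reassign points, recompute centroids), and convergence is detected by plain equality of successive states.

```haskell
-- Apply `step` repeatedly until the state stops changing or the optional
-- iteration cap is exhausted. `Nothing` means no cap.
iterateCapped :: Eq a => Maybe Int -> (a -> a) -> a -> a
iterateCapped cap step = go cap
  where
    -- Cap reached: return the current state without another step.
    go (Just n) x | n <= 0 = x
    go remaining x
      | x' == x   = x                                   -- fixed point: converged
      | otherwise = go (fmap (subtract 1) remaining) x' -- spend one iteration
      where x' = step x
```

With this shape, the fitting function could call `iterateCapped (maxIter params) step initialCentroids`. For floating-point centroids, a tolerance check may be preferable to exact equality.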