Implementation test:
I created a table with two float-valued columns to test the algorithm.
Because the result of kmeans depends on the random initialization step, the resulting clusters can vary between runs. With n_init implemented, however, the resulting clusters are very likely to be the same.
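To illustrate why n_init stabilizes the result, here is a minimal sketch of the "run several times, keep the minimum inertia" selection. It is not the code from this PR: KMeansResult, bestOfNInit, runOnce and baseSeed are hypothetical names, and the way seeds are derived is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical result type: one cluster label per table row plus the inertia
// (sum of squared distances of all points to their assigned cluster center).
struct KMeansResult
{
    std::vector<int> labels;
    double inertia = std::numeric_limits<double>::infinity();
};

// Calls runOnce n_init times with the seeds baseSeed, baseSeed + 1, ...
// (assumed seeding scheme) and returns the run with the smallest inertia.
KMeansResult bestOfNInit(const std::function<KMeansResult(std::uint32_t)>& runOnce,
                         int n_init, std::uint32_t baseSeed = 0)
{
    assert(n_init > 0);
    KMeansResult best;

    for (int i = 0; i < n_init; ++i)
    {
        KMeansResult candidate = runOnce(baseSeed + static_cast<std::uint32_t>(i));

        // Keep the candidate only if it is strictly better than the best so far.
        if (candidate.inertia < best.inertia)
            best = candidate;
    }

    return best;
}
```

A single kmeans run would be passed in as runOnce, for example wrapped in a lambda that forwards the seed. A poor initialization then only affects the final result if all n_init runs end in a poor local minimum.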
SUMMARY
Implements necessary changes for #202
Reviewers: @numere-org/maintainers
IMPLEMENTATION
Implementation:
kmeansof()
tableMethod_kmeans in dataaccess: Parses the parameters and calls the kmeans implementation in memory.cpp. The parameter n_init is already implemented here. It specifies how many times kmeans is run with different start seeds; the result with the minimum inertia is returned.
getKMeans() in memory.cpp: The implementation of kmeans itself. The algorithm consists of three main steps. For initialization, two variants are available: random seeds or kmeans++. After initialization, the two steps "assign points to clusters" and "recalculate cluster centers" are iterated until maxIterations or a stop criterion is reached.
std::vector<int> getIndices(const std::vector<mu::value_type>& vec, mu::value_type value)
double calculateL2Norm(const std::vector<mu::value_type>& vec1, const std::vector<mu::value_type>& vec2)
The helper functions could be useful for other tasks, or may already be implemented somewhere where I did not find them. Tell me if I should refactor something here.
Also, there are some "todo" comments; please give me some input on these points.
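For reviewers who want the three steps in one place, the following is a compact sketch of kmeans with kmeans++ initialization. It is not the code in memory.cpp: it uses plain double instead of mu::value_type, all names (Point, Clustering, kmeansSketch and the helpers) are hypothetical, and the stop criterion "no label changed" is an assumption about what a stop criterion could look like.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean (L2) distance between two points of equal dimension.
double squaredDistance(const Point& a, const Point& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
        sum += (a[d] - b[d]) * (a[d] - b[d]);
    return sum;
}

// Index of the center closest to p (ties go to the lower index).
std::size_t nearestCenter(const Point& p, const std::vector<Point>& centers)
{
    std::size_t best = 0;
    double bestDist = squaredDistance(p, centers[0]);
    for (std::size_t c = 1; c < centers.size(); ++c)
    {
        double dist = squaredDistance(p, centers[c]);
        if (dist < bestDist)
        {
            best = c;
            bestDist = dist;
        }
    }
    return best;
}

// Step 1 (kmeans++ variant): the first center is drawn uniformly, each further
// center with probability proportional to the squared distance to the nearest
// center chosen so far. Assumes data contains at least k distinct points.
std::vector<Point> initKMeansPlusPlus(const std::vector<Point>& data, std::size_t k, std::mt19937& rng)
{
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    std::vector<Point> centers{data[pick(rng)]};
    while (centers.size() < k)
    {
        std::vector<double> weights;
        for (const Point& p : data)
            weights.push_back(squaredDistance(p, centers[nearestCenter(p, centers)]));
        std::discrete_distribution<std::size_t> next(weights.begin(), weights.end());
        centers.push_back(data[next(rng)]);
    }
    return centers;
}

struct Clustering
{
    std::vector<std::size_t> labels;
    std::vector<Point> centers;
    double inertia = 0.0;
};

Clustering kmeansSketch(const std::vector<Point>& data, std::size_t k, std::size_t maxIterations, unsigned seed)
{
    assert(k > 0 && data.size() >= k && maxIterations > 0);
    std::mt19937 rng(seed);
    Clustering result;
    result.centers = initKMeansPlusPlus(data, k, rng);
    result.labels.assign(data.size(), 0);

    for (std::size_t iter = 0; iter < maxIterations; ++iter)
    {
        // Step 2: assign every point to its nearest center.
        bool changed = (iter == 0); // always update the centers at least once
        for (std::size_t i = 0; i < data.size(); ++i)
        {
            std::size_t c = nearestCenter(data[i], result.centers);
            changed = changed || (c != result.labels[i]);
            result.labels[i] = c;
        }
        if (!changed) // assumed stop criterion: assignment is stable
            break;

        // Step 3: move every center to the mean of its assigned points.
        std::vector<Point> sums(k, Point(data[0].size(), 0.0));
        std::vector<std::size_t> counts(k, 0);
        for (std::size_t i = 0; i < data.size(); ++i)
        {
            ++counts[result.labels[i]];
            for (std::size_t d = 0; d < data[i].size(); ++d)
                sums[result.labels[i]][d] += data[i][d];
        }
        for (std::size_t c = 0; c < k; ++c)
        {
            if (counts[c] == 0)
                continue; // an empty cluster keeps its previous center
            for (std::size_t d = 0; d < sums[c].size(); ++d)
                result.centers[c][d] = sums[c][d] / static_cast<double>(counts[c]);
        }
    }

    // Inertia: sum of squared distances of all points to their assigned center.
    for (std::size_t i = 0; i < data.size(); ++i)
        result.inertia += squaredDistance(data[i], result.centers[result.labels[i]]);
    return result;
}
```

Calling kmeansSketch(data, 2, 100, seed) on a table with two float columns mirrors the test setup described in this PR. Running it with several seeds and keeping the result with the smallest inertia corresponds to what n_init does.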
DOCUMENTATION
TESTS BY REVIEWERS