DESCRIPTION
What does your feature request improve on? Please describe. Clustering of data is really common and should be supported out-of-the-box.
Describe the solution you'd like Create a table method for clustering data with some options like so:
The new method shall consider the columns as data dimensions (i.e. 3 columns make the data three-dimensional). The user supplies the number of target clusters via nClusters and can select the method (sMethod="k-means"). It should support at least k-means, but might also support others and can be extended in the future. The return value of the method is the assignment of each tuple to the target cluster id.
Additional context Add any other context or screenshots about the feature request here.
(Do not write below this line)
DEVS' SECTION
ANALYSIS
Instead of having a single method for all possible clustering algorithms, we'll have one method for each algorithm, starting with TAB().kmeansof({nCols},nClusters,nMaxIterations). The implementation shall be within memory.cpp with an interface in dataaccess.cpp, just like for static std::string tableMethod_anova(const std::string& sTableName, std::string sMethodArguments, const std::string& sResultVectorName). The method static std::string tableMethod_binsof(const std::string& sTableName, std::string sMethodArguments, const std::string& sResultVectorName) shows how to return pure numerical results.
K-Means can only run on numerical data, so the column data types must be checked before the algorithm runs.
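To make the intended behaviour concrete, here is a minimal standalone sketch of the k-means loop (Lloyd's algorithm) behind kmeansof. It is not the memory.cpp implementation. The names Point, squaredDistance and kmeansSketch are hypothetical, and so are three choices made here: seeding the centroids with the first nClusters rows, using the squared Euclidean distance, and returning 1-based cluster ids. The sketch assumes that the column type check has already happened, so it only ever sees numerical values, and that all rows have the same number of columns.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One table row; every selected column is one data dimension.
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
static double squaredDistance(const Point& a, const Point& b)
{
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
        sum += (a[d] - b[d]) * (a[d] - b[d]);
    return sum;
}

// Hypothetical helper: returns one cluster id (1-based, an assumption) per row.
// With nMaxIterations == 0 all ids stay 0, i.e. "unassigned".
std::vector<std::size_t> kmeansSketch(const std::vector<Point>& rows,
                                      std::size_t nClusters,
                                      std::size_t nMaxIterations)
{
    assert(nClusters > 0 && nClusters <= rows.size());

    // Assumption: deterministic seeding with the first nClusters rows.
    std::vector<Point> centroids(rows.begin(), rows.begin() + nClusters);
    std::vector<std::size_t> assignment(rows.size(), 0);

    for (std::size_t iter = 0; iter < nMaxIterations; ++iter)
    {
        bool changed = false;

        // Assignment step: attach every row to its nearest centroid.
        for (std::size_t i = 0; i < rows.size(); ++i)
        {
            std::size_t best = 0;
            double bestDist = std::numeric_limits<double>::max();

            for (std::size_t c = 0; c < nClusters; ++c)
            {
                const double dist = squaredDistance(rows[i], centroids[c]);
                if (dist < bestDist)
                {
                    bestDist = dist;
                    best = c;
                }
            }

            if (assignment[i] != best + 1)
            {
                assignment[i] = best + 1;
                changed = true;
            }
        }

        // Converged: no row switched its cluster in this iteration.
        if (!changed)
            break;

        // Update step: move every centroid to the mean of its rows.
        std::vector<Point> sums(nClusters, Point(rows[0].size(), 0.0));
        std::vector<std::size_t> counts(nClusters, 0);

        for (std::size_t i = 0; i < rows.size(); ++i)
        {
            const std::size_t c = assignment[i] - 1;
            ++counts[c];
            for (std::size_t d = 0; d < rows[i].size(); ++d)
                sums[c][d] += rows[i][d];
        }

        for (std::size_t c = 0; c < nClusters; ++c)
        {
            if (counts[c] == 0)
                continue; // an empty cluster keeps its previous centroid

            for (std::size_t d = 0; d < sums[c].size(); ++d)
                centroids[c][d] = sums[c][d] / static_cast<double>(counts[c]);
        }
    }

    return assignment;
}
```

For the four rows (0,0), (10,10), (0,1), (10,11) with nClusters = 2, this sketch assigns the ids 1, 2, 1, 2. A real implementation would probably want a better seeding strategy (random or k-means++ style), since the result of k-means depends on the initial centroids.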
IMPLEMENTATION STEPS
(see also our Wiki for implementation guidelines)
DOCUMENTATION STEPS
(see also our Wiki for further information)
(*.NHLP and *.NDB files, if needed)
(*.NLNG files, if needed)
PULL REQUEST