Table method for clustering data

DESCRIPTION

What does your feature request improve on? Please describe.

Clustering data is a very common task and should be supported out of the box.

Describe the solution you'd like

Create a table method for clustering data with some options, like so:

TAB().clustersof({nCols},nClusters,sMethod="k-means") -> {VAL}

The new method shall treat the columns as data dimensions (i.e. 3 columns make the data three-dimensional). The user supplies the number of target clusters via nClusters and selects the clustering method via sMethod (e.g. sMethod="k-means"). The method should support at least k-means; other algorithms may be added in the future. The return value is the ID of the target cluster assigned to each tuple.
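To make the requested behaviour concrete, here is a minimal sketch of k-means (Lloyd's algorithm) on row tuples. It is not NumeRe code: the names Row and kmeansSketch are hypothetical, the centroids are simply seeded from the first nClusters rows, and the 1-based cluster IDs are an assumption about the desired return format.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical type, not part of NumeRe: one table row, where each
// element corresponds to one selected column (one data dimension).
using Row = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
static double squaredDistance(const Row& a, const Row& b)
{
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
        sum += (a[d] - b[d]) * (a[d] - b[d]);
    return sum;
}

// Returns one cluster ID per row. IDs are 1-based (assumption).
// Assumption: centroids are seeded from the first nClusters rows.
std::vector<std::size_t> kmeansSketch(const std::vector<Row>& rows,
                                      std::size_t nClusters,
                                      std::size_t nMaxIterations)
{
    if (nClusters == 0 || rows.size() < nClusters)
        throw std::invalid_argument("need at least nClusters rows");
    for (const Row& row : rows)
        if (row.size() != rows[0].size())
            throw std::invalid_argument("all rows need the same dimension");

    std::vector<Row> centroids(rows.begin(),
                               rows.begin() + static_cast<std::ptrdiff_t>(nClusters));
    std::vector<std::size_t> assignment(rows.size(), 0); // 0 = not yet assigned

    for (std::size_t iter = 0; iter < nMaxIterations; ++iter)
    {
        bool changed = false;

        // Assignment step: attach every row to its nearest centroid.
        for (std::size_t i = 0; i < rows.size(); ++i)
        {
            std::size_t best = 0;
            double bestDist = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < nClusters; ++c)
            {
                const double dist = squaredDistance(rows[i], centroids[c]);
                if (dist < bestDist)
                {
                    bestDist = dist;
                    best = c;
                }
            }
            if (assignment[i] != best + 1)
            {
                assignment[i] = best + 1;
                changed = true;
            }
        }

        if (!changed)
            break; // converged before reaching nMaxIterations

        // Update step: move each centroid to the mean of its members.
        std::vector<Row> sums(nClusters, Row(rows[0].size(), 0.0));
        std::vector<std::size_t> counts(nClusters, 0);
        for (std::size_t i = 0; i < rows.size(); ++i)
        {
            const std::size_t c = assignment[i] - 1;
            ++counts[c];
            for (std::size_t d = 0; d < rows[i].size(); ++d)
                sums[c][d] += rows[i][d];
        }
        for (std::size_t c = 0; c < nClusters; ++c)
        {
            if (counts[c] == 0)
                continue; // keep the old centroid of an empty cluster
            for (std::size_t d = 0; d < centroids[c].size(); ++d)
                centroids[c][d] = sums[c][d] / static_cast<double>(counts[c]);
        }
    }
    return assignment;
}
```

Seeding from the first rows keeps the sketch deterministic; a real implementation would probably want a better initialisation (random rows or k-means++) because the result of k-means depends on the starting centroids.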


(Do not write below this line)

DEVS' SECTION

ANALYSIS

Instead of a single method covering all possible clustering algorithms, we'll have one method per algorithm, starting with TAB().kmeansof({nCols},nClusters,nMaxIterations). The implementation shall live in memory.cpp with an interface in dataaccess.cpp, just like static std::string tableMethod_anova(const std::string& sTableName, std::string sMethodArguments, const std::string& sResultVectorName). The method static std::string tableMethod_binsof(const std::string& sTableName, std::string sMethodArguments, const std::string& sResultVectorName) shows how to return purely numerical results.

K-means can only run on numerical data, so the column data types must be checked first.
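A minimal sketch of such a pre-check, assuming the column types are available as a simple list. ColumnKind and requireNumericalColumns are hypothetical stand-ins for illustration, not NumeRe's actual types or functions, and the 0-based column indices are likewise an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the table's per-column type information.
enum class ColumnKind { Numerical, String, Categorical };

// Throws if any selected column (0-based index, assumption) is missing
// or not numerical, so that k-means never sees non-numerical data.
void requireNumericalColumns(const std::vector<ColumnKind>& columnKinds,
                             const std::vector<std::size_t>& selectedCols)
{
    if (selectedCols.empty())
        throw std::invalid_argument("no columns selected");

    for (std::size_t col : selectedCols)
    {
        if (col >= columnKinds.size())
            throw std::out_of_range("column " + std::to_string(col) + " does not exist");
        if (columnKinds[col] != ColumnKind::Numerical)
            throw std::invalid_argument("column " + std::to_string(col) + " is not numerical");
    }
}
```

Running the check before any data is copied lets the method fail early with a clear message instead of producing meaningless clusters.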

IMPLEMENTATION STEPS

(see also our Wiki for implementation guidelines)

Implement the necessary changes in a new branch created here on GitHub
Test your implementation

DOCUMENTATION STEPS

(see also our Wiki for further information)

Update the changes log
Add comments to your implementation
Add Doxygen documentation comments
Create or update the documentation articles (*.NHLP and *.NDB files, if needed)
Update the language strings (*.NLNG files, if needed)

PULL REQUEST

Create a pull request for your changes
Fill out the template
Assign @numere-org/maintainers as reviewers
