Closed jchiquet closed 3 years ago
Hi @jchiquet, and thanks for reporting this error. It seems to be related to the initializer. I guess stats::kmeans() picks the initial centroids randomly:
mt = matrix(c(1, -1, -1, -1, -1, 1, 1, 1), 4, 2)
k = 2

# base R
clust1 = stats::kmeans(x = mt, centers = k)
clust1

# RcppArmadillo: try every available seeding mode
seed_mode = c('static_subset', 'random_subset', 'static_spread', 'random_spread')
clust2 = lapply(seq_along(seed_mode), function(x) {
  ClusterR::KMeans_arma(data = mt, clusters = k, seed_mode = seed_mode[x])
})

# Rcpp: the two initializers that run without error on this data
# (stored separately so the KMeans_arma results above are not overwritten)
inits = c('optimal_init', 'random')
clust3 = lapply(seq_along(inits), function(x) {
  ClusterR::KMeans_rcpp(data = mt, clusters = k, initializer = inits[x])
})
I receive an error when the initializer of ClusterR::KMeans_rcpp() is set to either 'kmeans++' (the default) or 'quantile_init' (experimental).
In my opinion your dataset has too few observations for the 'kmeans++' initializer to work. You can have a look at the Rcpp code here
The 'quantile_init' initializer fails, I guess, for the same reason (too few observations): it has to compute the quantiles first in order to arrive at candidate centroids.
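To illustrate why very small inputs are delicate for this kind of seeding, here is a minimal base R sketch of the textbook k-means++ procedure. It is not ClusterR's Rcpp code and does not claim to reproduce the reported error; the function name `kmeanspp_seed` is hypothetical. Each new center is sampled with probability proportional to the squared distance to the nearest center already chosen, so with few (or duplicated) rows almost all sampling weights are zero.

```r
# Textbook k-means++ seeding in base R (an illustration, NOT ClusterR's code).
kmeanspp_seed <- function(x, k) {
  n <- nrow(x)
  idx <- integer(k)
  idx[1] <- sample.int(n, 1)  # first center: uniform at random

  # squared distance from every row to its nearest chosen center
  sq_dist_to <- function(i) {
    rowSums((x - matrix(x[i, ], n, ncol(x), byrow = TRUE))^2)
  }
  d2 <- sq_dist_to(idx[1])

  for (j in seq_len(k - 1) + 1) {
    if (sum(d2) == 0) stop("fewer than k distinct points")
    # next center: sampled with probability proportional to d2
    idx[j] <- sample.int(n, 1, prob = d2 / sum(d2))
    d2 <- pmin(d2, sq_dist_to(idx[j]))
  }
  x[idx, , drop = FALSE]
}

mt <- matrix(c(1, -1, -1, -1, -1, 1, 1, 1), 4, 2)
set.seed(1)
centers <- kmeanspp_seed(mt, 2)
centers
```

On this matrix the second center is always the one row that differs from the first, because every other row has zero weight.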
Can you use one of the other initializers that work on your data ('optimal_init', 'random')?
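If switching initializers by hand is inconvenient, one option is to try them in order and keep the first that succeeds. This is only a sketch: `fit_with_fallback` is a hypothetical helper, not part of ClusterR, and it assumes the failure surfaces as a regular R error that tryCatch() can intercept. The ClusterR call is guarded so the helper itself runs without the package.

```r
# Hypothetical helper (not part of ClusterR): call fit_fun(init) for each
# initializer in turn and return the first result that does not error.
fit_with_fallback <- function(fit_fun, inits) {
  for (init in inits) {
    res <- tryCatch(fit_fun(init), error = function(e) NULL)
    if (!is.null(res)) {
      return(list(initializer = init, fit = res))
    }
  }
  stop("all initializers failed: ", paste(inits, collapse = ", "))
}

mt <- matrix(c(1, -1, -1, -1, -1, 1, 1, 1), 4, 2)

# Usage with ClusterR, only if the package is installed
if (requireNamespace("ClusterR", quietly = TRUE)) {
  out <- fit_with_fallback(
    function(init) {
      ClusterR::KMeans_rcpp(data = mt, clusters = 2, initializer = init)
    },
    inits = c("kmeans++", "optimal_init", "random")
  )
  print(out$initializer)  # which initializer ended up being used
}
```

The order of `inits` encodes the preference: 'kmeans++' is tried first and the others serve only as fallbacks.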
Indeed, I use kmeans in a split-and-merge strategy to avoid local minima in a more general model-based clustering method. Sometimes kmeans is run in 'extreme' situations just like this one. I shall add some additional tests on my side and/or change the initializer.
Anyway, many thanks for the explanation and the follow-up.
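A possible shape for such a test on the caller's side, as a sketch only: `check_kmeans_input` is a hypothetical name, and what counts as "too small" for a given initializer is not established in this thread. It reports the number of rows and of distinct rows; in the example matrix, rows 2 to 4 are identical, so there are 4 observations but only 2 distinct points.

```r
# Hypothetical pre-check before calling a k-means routine:
# report how many rows and how many distinct rows are available.
check_kmeans_input <- function(x, k) {
  n_distinct <- nrow(unique(x))  # unique() on a matrix drops duplicated rows
  list(
    n = nrow(x),
    n_distinct = n_distinct,
    enough_distinct = n_distinct >= k
  )
}

mt <- matrix(c(1, -1, -1, -1, -1, 1, 1, 1), 4, 2)
chk <- check_kmeans_input(mt, k = 2)
str(chk)
```

The result could then be used to pick a simpler initializer such as 'random', or to skip the split entirely, when the data are this degenerate.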
This simple example fails with the Rcpp version of the kmeans algorithm found in ClusterR
with the following error
Note that stats::kmeans and ClusterR::KMeans_arma both work.

OS: Ubuntu 20.04, R 4.0.5, ClusterR 1.2.4