mlampros / ClusterR

Gaussian mixture models, k-means, mini-batch-kmeans and k-medoids clustering
https://mlampros.github.io/ClusterR/

Why doesn't it change? #35

Closed: A-Pai closed this issue 2 years ago

A-Pai commented 2 years ago

Why doesn't it change?

library(ClusterR)

set.seed(11)
x <- rbind(
  matrix(rnorm(10000, sd = 0.3), ncol = 2),
  matrix(rnorm(10000, mean = 1, sd = 0.3), ncol = 2)
)

(cm <- Clara_Medoids(x, clusters = 3, samples = 10, sample_size = 0.1, verbose = T))

mlampros commented 2 years ago

@A-Pai let me have a look into this

mlampros commented 2 years ago

@A-Pai

There was indeed an issue with the 'seed' parameter (in the Rcpp file). I modified the code so that the seed is adjusted depending on the 'samples' parameter. The following code now works as expected:


require(ClusterR)

set.seed(11)
x <- rbind(matrix(rnorm(10000, sd = 0.3), ncol = 2),
           matrix(rnorm(10000, mean = 1, sd = 0.3), ncol = 2))

# same seed as cm1 below, so the two results should be identical
cm <- Clara_Medoids(x, clusters = 3, samples = 10, sample_size = 0.1, verbose = TRUE, seed = 1)
# str(cm)

cm1 <- Clara_Medoids(x, clusters = 3, samples = 10, sample_size = 0.1, verbose = TRUE, seed = 1)
# str(cm1)

# different seed, so the result should differ from cm
cm2 <- Clara_Medoids(x, clusters = 3, samples = 10, sample_size = 0.1, verbose = TRUE, seed = 2)
# str(cm2)

identical(cm, cm1)
# TRUE
identical(cm, cm2)
# FALSE

nams = names(cm)
# nams

for (item in nams) {
  cat(glue::glue("{item}: {identical(cm[[item]], cm1[[item]])}"), '\n')
}

# call: TRUE 
# medoids: TRUE 
# medoid_indices: TRUE 
# sample_indices: TRUE 
# best_dissimilarity: TRUE 
# clusters: TRUE 
# silhouette_matrix: TRUE 
# fuzzy_probs: TRUE 
# clustering_stats: TRUE 
# dissimilarity_matrix: TRUE 
# distance_metric: TRUE 

for (item in nams) {
  cat(glue::glue("{item}: {identical(cm[[item]], cm2[[item]])}"), '\n')
}

# call: FALSE 
# medoids: FALSE 
# medoid_indices: FALSE 
# sample_indices: FALSE 
# best_dissimilarity: FALSE 
# clusters: FALSE 
# silhouette_matrix: FALSE 
# fuzzy_probs: TRUE 
# clustering_stats: FALSE 
# dissimilarity_matrix: FALSE 
# distance_metric: TRUE 

Let me know if it works for you too once you have installed the updated version using:

remotes::install_github('mlampros/ClusterR', upgrade = 'always', dependencies = TRUE, repos = 'https://cloud.r-project.org/')

Feel free to re-open the issue if the code does not work as expected.
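For readers who want to see why adjusting the seed per sample matters, here is a minimal base-R sketch of the idea. It is not the actual Rcpp code in ClusterR: the helper `draw_subsamples()` is hypothetical, and the `seed + i` offset is only an assumption about how a seed could be made to depend on the sample number.

```r
# Hypothetical helper, not part of ClusterR: draws `samples` subsamples of
# row indices, offsetting the base seed by the sample number.
draw_subsamples <- function(n_rows, samples, sample_size, seed) {
  n_draw <- round(n_rows * sample_size)
  lapply(seq_len(samples), function(i) {
    set.seed(seed + i)  # assumed offset scheme; the real implementation may differ
    sort(sample.int(n_rows, n_draw))
  })
}

s1       <- draw_subsamples(10000, samples = 10, sample_size = 0.1, seed = 1)
s1_again <- draw_subsamples(10000, samples = 10, sample_size = 0.1, seed = 1)
s2       <- draw_subsamples(10000, samples = 10, sample_size = 0.1, seed = 2)

identical(s1, s1_again)      # same base seed reproduces the same subsamples
identical(s1[[1]], s1[[2]])  # compares two subsamples drawn within one run
identical(s1, s2)            # compares runs with different base seeds
```

If every iteration reused one fixed seed instead, each of the 10 subsamples would contain the same rows, and the per-sample results would not change from one sample to the next.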