@A-Pai, that's true, you are right. The only difference is in the "call" element of the output, which differs because one run uses seed = 1 and the other seed = 2.
```r
require(ClusterR)
require(glue)

data(dietary_survey_IBS)
dat = dietary_survey_IBS[, -ncol(dietary_survey_IBS)]   # drop the last column
dat = center_scale(dat)

# same settings, only the seed differs
cm = Cluster_Medoids(dat, clusters = 3, distance_metric = 'euclidean', swap_phase = FALSE, seed = 1)
cm2 = Cluster_Medoids(dat, clusters = 3, distance_metric = 'euclidean', swap_phase = FALSE, seed = 2)

if (!all(names(cm) == names(cm2))) stop("The sublist names differ!")
nams = names(cm)
nams

# compare the two results element by element
for (item in nams) {
  cat(glue::glue("{item}: {identical(cm[[item]], cm2[[item]])}"), '\n')
}
# call: FALSE
# medoids: TRUE
# medoid_indices: TRUE
# best_dissimilarity: TRUE
# dissimilarity_matrix: TRUE
# clusters: TRUE
# silhouette_matrix: TRUE
# fuzzy_probs: TRUE
# clustering_stats: TRUE
# distance_metric: TRUE

print(cm$call)
# Cluster_Medoids(data = dat, clusters = 3, distance_metric = "euclidean",
#     swap_phase = FALSE, seed = 1)
```
Cluster-medoids differs from the k-means algorithm in that it has no initialization of centroids (random etc.). The medoids are picked based on the dissimilarity matrix, so they depend only on the data and the selected distance metric (euclidean etc.), and this does not change from one run to another.
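To illustrate why a seed cannot matter when medoids are chosen purely from a dissimilarity matrix, here is a minimal base-R sketch. The helper `greedy_medoids` is hypothetical and is not ClusterR's internal code; it only mimics a greedy, PAM-style build step on the built-in `iris` data.

```r
# Hypothetical helper (illustration only, NOT ClusterR's implementation):
# greedily pick k medoids that minimise the total distance of every
# observation to its nearest medoid.
greedy_medoids <- function(x, k) {
  d <- as.matrix(dist(x, method = "euclidean"))   # dissimilarity matrix
  medoids <- integer(0)
  for (i in seq_len(k)) {
    candidates <- setdiff(seq_len(nrow(d)), medoids)
    # total cost if each candidate were added to the current medoids
    cost <- vapply(candidates, function(cand) {
      sel <- c(medoids, cand)
      sum(apply(d[, sel, drop = FALSE], 1, min))
    }, numeric(1))
    medoids <- c(medoids, candidates[which.min(cost)])
  }
  medoids
}

x <- as.matrix(iris[, 1:4])

set.seed(1); m1 <- greedy_medoids(x, 3)
set.seed(2); m2 <- greedy_medoids(x, 3)

identical(m1, m2)   # TRUE: no random numbers are drawn anywhere
```

Since every step is a deterministic function of the distance matrix, changing the RNG state between the two runs has no effect on the selected indices.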
Give me a few days to add a deprecation warning for the "seed" parameter. Thank you for making me aware of this issue.
I added a deprecation warning to the function for the 'seed' parameter, and I'll remove this parameter in version 1.3.0.
You can download the latest version using

```r
remotes::install_github('mlampros/ClusterR', upgrade = 'always', dependencies = TRUE, repos = 'https://cloud.r-project.org/')
```

Feel free to re-open the issue if the code does not work as expected.
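For readers curious what a deprecation warning for an argument such as 'seed' can look like, here is a minimal sketch. The function `medoids_demo` is hypothetical and the message text is an assumption; this is not the code that was added to ClusterR. It relies on base R's `missing()` to warn only when the caller actually passes the argument.

```r
# Hypothetical function (illustration only, NOT ClusterR's source):
# warn when the deprecated 'seed' argument is supplied explicitly.
medoids_demo <- function(data, clusters, seed = 1) {
  if (!missing(seed)) {
    warning("The 'seed' parameter is deprecated and will be removed in version 1.3.0",
            call. = FALSE)
  }
  invisible(NULL)   # the real clustering work would go here
}

# Capture the warning message instead of printing it
res <- tryCatch(medoids_demo(NULL, 3, seed = 2),
                warning = function(w) conditionMessage(w))
res
```

Calling `medoids_demo(NULL, 3)` without `seed` raises no warning, so existing code that never set the argument keeps running quietly.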
```r
library(ClusterR)

data(dietary_survey_IBS)
dat <- dietary_survey_IBS[, -ncol(dietary_survey_IBS)]
dat <- center_scale(dat)

cm  <- Cluster_Medoids(dat, clusters = 3, distance_metric = "euclidean", swap_phase = TRUE, seed = 1)
cm1 <- Cluster_Medoids(dat, clusters = 3, distance_metric = "euclidean", swap_phase = TRUE, seed = 1)
cm2 <- Cluster_Medoids(dat, clusters = 3, distance_metric = "euclidean", swap_phase = TRUE, seed = 2)

# same seed: cm vs cm1
identical(cm, cm1)
identical(cm$call, cm1$call)
identical(cm$medoids, cm1$medoids)
identical(cm$medoid_indices, cm1$medoid_indices)
identical(cm$best_dissimilarity, cm1$best_dissimilarity)
identical(cm$dissimilarity_matrix, cm1$dissimilarity_matrix)
identical(cm$clusters, cm1$clusters)
identical(cm$silhouette_matrix, cm1$silhouette_matrix)
identical(cm$fuzzy_probs, cm1$fuzzy_probs)
identical(cm$clustering_stats, cm1$clustering_stats)
identical(cm$distance_metric, cm1$distance_metric)

# different seed: cm vs cm2
identical(cm, cm2)
identical(cm$call, cm2$call)
identical(cm$medoids, cm2$medoids)
identical(cm$medoid_indices, cm2$medoid_indices)
identical(cm$best_dissimilarity, cm2$best_dissimilarity)
identical(cm$dissimilarity_matrix, cm2$dissimilarity_matrix)
identical(cm$clusters, cm2$clusters)
identical(cm$silhouette_matrix, cm2$silhouette_matrix)
identical(cm$fuzzy_probs, cm2$fuzzy_probs)
identical(cm$clustering_stats, cm2$clustering_stats)
identical(cm$distance_metric, cm2$distance_metric)
```
You will see that "cm" is not identical to "cm2" only because "cm$call" is not identical to "cm2$call"; the calling expression is the only thing that differs.
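A shorter way to make the same check is to list only the elements that differ between two results. The sketch below uses a hypothetical helper, `differing_elements`, and toy lists that stand in for two Cluster_Medoids results (they are not real output), so it runs with base R alone.

```r
# Hypothetical helper: names of the elements that differ between two lists
differing_elements <- function(a, b) {
  stopifnot(identical(names(a), names(b)))
  same <- vapply(names(a),
                 function(nm) identical(a[[nm]], b[[nm]]),
                 logical(1))
  names(a)[!same]
}

# Toy stand-ins for two results that differ only in the recorded call
res1 <- list(call = quote(f(seed = 1)), medoid_indices = c(3L, 7L), clusters = c(1, 2, 1))
res2 <- list(call = quote(f(seed = 2)), medoid_indices = c(3L, 7L), clusters = c(1, 2, 1))

differing_elements(res1, res2)
# [1] "call"
```

Applied to the real objects, `differing_elements(cm, cm2)` should report the same single name if, as described above, only the calling expression differs.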