mlampros / ClusterR

Gaussian mixture models, k-means, mini-batch-kmeans and k-medoids clustering
https://mlampros.github.io/ClusterR/

The solution of package ClusterR 1.3.0 is not best yet #47

Closed A-Pai closed 1 year ago

A-Pai commented 1 year ago

```r
n <- 100  # data size
set.seed(3)
x <- rbind(
  matrix(rnorm(n, sd = 0.25), ncol = 2),
  matrix(rnorm(n, mean = 1, sd = 0.25), ncol = 2)
)

library(ClusterR)
k <- 2
cm <- Cluster_Medoids(x, k, distance_metric = "euclidean")

print(packageVersion("ClusterR"))
cat("package ClusterR solutuion")
cat("\n")
cat("medoid_indices:", sort(cm$medoid_indices))
cat("\n")
cat("best_dissimilarity:", cm$best_dissimilarity)

library(cluster)
k <- 2
pm <- pam(x, k, metric = "euclidean")

cat("\n\n")
cat("package cluster solutuion")
cat("\n")
cat("medoid_indices:", sort(pm$id.med))
cat("\n")
# objective[2] is the value after the swap phase, scaled by the number of rows (n = 100)
cat("best_dissimilarity:", n * pm$objective[2])
```


A-Pai commented 1 year ago

[1] ‘1.3.0’
package ClusterR solutuion
medoid_indices: 25 57
best_dissimilarity: 31.97467

package cluster solutuion
medoid_indices: 20 57
best_dissimilarity: 31.86491
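
The two objective values above can be checked without either package by summing, for every point, the Euclidean distance to its nearest medoid. The following is a minimal sketch in base R. `total_dissimilarity` is a hypothetical helper introduced here for illustration, and the sketch assumes that both `best_dissimilarity` and `n * pm$objective[2]` are defined as this sum.

```r
# Hypothetical helper (not part of ClusterR or cluster):
# total Euclidean distance from each row of `x` to its nearest medoid.
total_dissimilarity <- function(x, medoid_indices) {
  d <- as.matrix(dist(x, method = "euclidean"))        # full n x n distance matrix
  nearest <- apply(d[, medoid_indices, drop = FALSE], 1, min)  # distance to closest medoid
  sum(nearest)
}

# Same data as in the report above
n <- 100
set.seed(3)
x <- rbind(
  matrix(rnorm(n, sd = 0.25), ncol = 2),
  matrix(rnorm(n, mean = 1, sd = 0.25), ncol = 2)
)

# Medoid pairs reported by the two packages
cat("medoids 25, 57:", total_dissimilarity(x, c(25, 57)), "\n")
cat("medoids 20, 57:", total_dissimilarity(x, c(20, 57)), "\n")
```

If the assumption about the objective holds, the two printed values should line up with the numbers each package reported, which makes it easy to see which medoid pair has the lower cost.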

mlampros commented 1 year ago

@A-Pai, I'm sorry for the late reply. I included details about the differences in the previous issue that you opened. I also tested the current implementation of the ClusterR::Cluster_Medoids() function on many datasets and included other existing algorithms (R packages) in the comparison. You can read more in this blog post; towards the end there are also bar plots showing the differences between the various algorithms for all the datasets.

I'll close the issue. Feel free to re-open it in case the code does not work as expected.

A-Pai commented 1 year ago

I verified the results in MATLAB, and the solution from package ClusterR 1.3.0 is still not the best one.

R code:

```r
n <- 100
set.seed(3)
x <- rbind(
  matrix(rnorm(n, sd = 0.25), ncol = 2),
  matrix(rnorm(n, mean = 1, sd = 0.25), ncol = 2)
)

# Export the data so the same matrix can be read in MATLAB
write.csv(x, "x2.csv", row.names = FALSE)

library(ClusterR)
packageVersion("ClusterR")

k <- 2
cm <- Cluster_Medoids(x, k, distance_metric = "euclidean")

cat("package ClusterR solutuion")
cat("\n")
cat("medoid_indices:", sort(cm$medoid_indices))
cat("\n")
cat("best_dissimilarity:", cm$best_dissimilarity)
```

MATLAB code:

```matlab
x = readmatrix("x2.csv");
[idx, C, sumd, d, midx, info] = kmedoids(x, 2, 'Distance', 'euclidean');
sum(sumd)
display(midx);
```
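
With k = 2 and 100 points there are only 4950 candidate medoid pairs, so the globally best pair can also be found by exhaustive search, independently of ClusterR, cluster, or MATLAB. The following is a minimal sketch in base R. `best_pair` is a hypothetical helper, and the cost is assumed to be the sum of Euclidean distances from each point to its nearest medoid.

```r
# Hypothetical helper: brute-force search over all pairs of rows of `x`
# for the pair of medoids with the smallest total dissimilarity.
best_pair <- function(x) {
  d <- as.matrix(dist(x, method = "euclidean"))   # full n x n distance matrix
  pairs <- combn(nrow(x), 2)                      # every candidate pair, one per column
  cost <- apply(pairs, 2, function(p) {
    sum(pmin(d[, p[1]], d[, p[2]]))               # each point goes to its nearer medoid
  })
  best <- which.min(cost)
  list(medoid_indices = pairs[, best], dissimilarity = cost[best])
}

# Same data as in the R code above
n <- 100
set.seed(3)
x <- rbind(
  matrix(rnorm(n, sd = 0.25), ncol = 2),
  matrix(rnorm(n, mean = 1, sd = 0.25), ncol = 2)
)

res <- best_pair(x)
cat("medoid_indices:", res$medoid_indices, "\n")
cat("best_dissimilarity:", res$dissimilarity, "\n")
```

Any implementation that returns a higher total than this search has stopped at a local optimum. Exhaustive search is only practical for small k and n, since the number of candidate sets grows combinatorially.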