Closed EngrStudent closed 4 years ago
Hi @EngrStudent,
The following example, which uses the same seed for every run, returns the 'centroids', 'covariance_matrices' and 'weights' in the same order each time:
data(dietary_survey_IBS, package = 'ClusterR')
dat = as.matrix(dietary_survey_IBS[, -ncol(dietary_survey_IBS)])   # drop the last column
dat = ClusterR::center_scale(dat)

seed_ibs = 1   # the same seed is used for every run

for (dist_meth in c('eucl_dist', 'maha_dist')) {

  # fit the same model 20 times
  gmm_ibs = list()
  for (i in 1:20) {
    gmm_ibs[[i]] = ClusterR::GMM(data = dat,
                                 gaussian_comps = 2,
                                 dist_mode = dist_meth,
                                 seed_mode = "random_subset",
                                 km_iter = 10,
                                 em_iter = 10,
                                 seed = seed_ibs)
  }

  cent_ibs = lapply(gmm_ibs, function(x) x$centroids)
  cov_ibs = lapply(gmm_ibs, function(x) x$covariance_matrices)
  weigh_ibs = lapply(gmm_ibs, function(x) x$weights)

  # compare runs 2 to 20 element-wise against the first run
  cat("are all centroids equal for ", dist_meth, " method: ", all(unlist(lapply(cent_ibs[-1], function(y) all(unlist(cent_ibs[[1]] == y))))), '\n')
  cat("are all covariance matrices equal for ", dist_meth, " method: ", all(unlist(lapply(cov_ibs[-1], function(y) all(unlist(cov_ibs[[1]] == y))))), '\n')
  cat("are all weights equal for ", dist_meth, " method: ", all(unlist(lapply(weigh_ibs[-1], function(y) all(unlist(weigh_ibs[[1]] == y))))), '\n')
}
If this is not the case for your data set, would you mind adding a reproducible example so that we can find out whether there is a bug in the function? Thanks.
This is Robo-lampros because the Human-lampros is lazy. This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs. Feel free to re-open a closed issue and the Human-lampros will respond.
If I ran this 20 times over the same data, I could get the same components each time but in a different order, so it would look as though the GMM had found different components.
Here is my problem:
The means are not in descending order, so I could get permutations of centroids, associated covariances, and associated weights.
Therefore I suggest sorting the components by mean location and reordering the covariances and weights to match. I'm dealing with 1-D data right now, but you would have to make this work with multidimensional data.
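The suggested reordering could be sketched roughly as follows. This is only an illustration, not ClusterR code: `sort_gmm_components` is a hypothetical helper, and it assumes the fitted object is a list whose `centroids` and `covariance_matrices` are matrices with one row per component and whose `weights` is a vector with one entry per component. For multidimensional data it sorts the centroids lexicographically (first coordinate, then second, and so on), which is just one possible convention; for 1-D data it reduces to sorting by the mean.

```r
# Hypothetical helper (not part of ClusterR): put the mixture components into
# a canonical order so that repeated fits can be compared element-wise.
# Assumed layout of `fit`:
#   centroids            matrix, one row per component
#   covariance_matrices  matrix, one row per component
#   weights              numeric vector, one entry per component
sort_gmm_components <- function(fit) {
  cent <- as.matrix(fit$centroids)
  # Lexicographic order over the centroid coordinates: sort on column 1,
  # break ties with column 2, and so on.
  cols <- lapply(seq_len(ncol(cent)), function(j) cent[, j])
  ord <- do.call(order, cols)
  # Apply the same permutation to all three pieces so they stay aligned.
  fit$centroids <- cent[ord, , drop = FALSE]
  fit$covariance_matrices <- as.matrix(fit$covariance_matrices)[ord, , drop = FALSE]
  fit$weights <- as.vector(fit$weights)[ord]
  fit
}

# Toy 1-D mixture, built by hand for illustration (not real GMM output).
fit_a <- list(
  centroids = matrix(c(2.0, -1.0, 0.5), ncol = 1),
  covariance_matrices = matrix(c(0.3, 0.1, 0.2), ncol = 1),
  weights = c(0.5, 0.2, 0.3)
)

# The same mixture with its components listed in a different order.
perm <- c(3, 1, 2)
fit_b <- list(
  centroids = fit_a$centroids[perm, , drop = FALSE],
  covariance_matrices = fit_a$covariance_matrices[perm, , drop = FALSE],
  weights = fit_a$weights[perm]
)

# After sorting, the two permuted fits can be compared directly.
same_after_sorting <- identical(sort_gmm_components(fit_a),
                                sort_gmm_components(fit_b))
print(same_after_sorting)
```

One caveat with this convention: if two components have nearly equal first coordinates, small numerical differences between runs could still swap them, so a comparison with a tolerance (for example `all.equal`) may be more robust than exact equality.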