Not reproducible results with find.clusters

thibautjombart / adegenet

adegenet: an R package for the multivariate analysis of genetic markers


df is my dataset

foo.BIC <- find.clusters(df, max.n.clust = 20, n.pca = 200, scale = FALSE, stat = "BIC", method = "kmeans")
plot(foo.BIC$Kstat, type = "o", xlab = "number of clusters (K)", ylab = "BIC", col = "green", main = "Detection based on BIC")
points(5, foo.BIC$Kstat[5], pch = "x", cex = 3)
mtext("'X' indicates the actual number of clusters", side = 3)

foo.BIC$size
foo.BIC$grp

Posting my findings here because I was looking for an answer to a similar problem myself. Hopefully this is useful to other users.

I've found this in another thread:

Odd shapes of the decrease of BIC can occur for several reasons. The possible explanations I can think of are: a) there are no clearly identifiable clusters in the data. b) there are clusters to be identified, but not enough information to disentangle different values of k. In your case this seems very likely: there are few SNPs, and if half of them are specific to one individual they are not informative in terms of clusters.

Original reference: https://lists.r-forge.r-project.org/pipermail/adegenet-forum/2011-June/000303.html

Otherwise, it is worth increasing the number of k-means runs (n.start, default 10) and the number of iterations per run (n.iter, default 1e5) to gain some stability. Hopefully that makes your analysis reproducible.

EDIT: just as an example, for my data the analysis stabilised with n.start=1000 and n.iter=1e9.
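For anyone who wants to try this, here is a minimal sketch of raising n.start and n.iter. It makes some assumptions that are not from the thread: the matrix x is simulated stand-in data (not the original df), run_once is a hypothetical helper, the values n.pca = 20 and n.clust = 3 are arbitrary, and fixing the RNG with set.seed() is my own addition. k-means starts from random centres, so a fixed seed gives identical output for identical calls, while a large n.start is what should make the result insensitive to the seed in the first place.

```r
library(adegenet)

# Simulated stand-in for `df` (hypothetical data):
# 60 individuals, 50 variables, three shifted groups of 20.
set.seed(1)
x <- matrix(rnorm(60 * 50), nrow = 60)
x[21:40, ] <- x[21:40, ] + 3
x[41:60, ] <- x[41:60, ] - 3

# Hypothetical helper: one non-interactive find.clusters run.
# n.pca and n.clust are given explicitly so no prompt appears.
run_once <- function(seed) {
  set.seed(seed)  # fix the random starting centres of k-means
  find.clusters(x, n.pca = 20, n.clust = 3, scale = FALSE,
                n.start = 100,  # more random starts than the default 10
                n.iter = 1e6)   # more iterations than the default 1e5
}

a <- run_once(42)
b <- run_once(42)

# Same seed and same arguments: the group assignments should match.
identical(a$grp, b$grp)
```

To check stability rather than just repeatability, compare runs with different seeds, for example table(run_once(1)$grp, run_once(2)$grp). Cluster labels can be permuted between runs, so look at the cross-table rather than testing the labels for equality.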

Not reproducible results with find.clusters #335