thibautjombart / adegenet

adegenet: a R package for the multivariate analysis of genetic markers

scatter error #352

Closed cfz1998 closed 1 year ago

cfz1998 commented 1 year ago

Hi! @zkamvar

I followed the tutorial for Discriminant Analysis of Principal Components (DAPC), but the scatter plot does not look right. image

library(adegenet)
library(vcfR)

x <- read.vcfR("my.vcf", verbose = FALSE)
y <- vcfR2genind(x)

grp <- find.clusters(y, max.n.clust=100, n.clust = 3)
# 40 for PCA
dapc1 <- dapc(y, grp$grp, n.pca = 40, n.da = 4)
# scatter(dapc1)

scatter(dapc1)

Could the large number of pca.cent values be the cause? image

Thank you for your reply!

zkamvar commented 1 year ago
grp <- find.clusters(y, max.n.clust=100, n.clust = 3)
# 40 for PCA
dapc1 <- dapc(y, grp$grp, n.pca = 40, n.da = 4)
# scatter(dapc1)

You are using the same number of principal components to train your model as you used to detect your clusters, so you are overfitting the model. Reduce the number of principal components.
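A minimal sketch of one way to pick a smaller number of principal components, assuming the `nancycats` example data shipped with adegenet stands in for the `y` object above. The values 50 and 40 are arbitrary illustrations, and `optim.a.score()` is only one of the tools adegenet offers for this (cross-validation with `xvalDapc()` is another). Note also that with 3 clusters there are at most 2 discriminant functions, so `n.da = 2` is used here.

```r
library(adegenet)

# Assumption: nancycats (bundled with adegenet) replaces the VCF-derived `y`.
data(nancycats)
y <- nancycats

# Passing both n.pca and n.clust lets find.clusters() run without prompts.
# Cluster detection can keep many PCs; 50 is an arbitrary choice here.
set.seed(1)
grp <- find.clusters(y, n.pca = 50, n.clust = 3)

# Fit a first DAPC with a deliberately generous number of PCs.
dapc_full <- dapc(y, grp$grp, n.pca = 40, n.da = 2)

# optim.a.score() compares observed and randomised reassignment for
# different numbers of retained PCs and suggests a value in $best.
ascore <- optim.a.score(dapc_full, plot = FALSE)
n_best <- ascore$best

# Refit with the suggested number of PCs and plot.
dapc_opt <- dapc(y, grp$grp, n.pca = n_best, n.da = 2)
scatter(dapc_opt)
```

The a-score penalises models that discriminate well only because they retain too many PCs, so the suggested value is usually well below the number used for cluster detection.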

cfz1998 commented 1 year ago

Hi! @zkamvar I got the same result when I reduced the number of principal components used to detect clusters.

image

zkamvar commented 1 year ago

It's not an error. This means that you are comparing groups that are quite disparate.
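To check how far apart the groups actually are, the fitted object can be inspected directly. This is a rough sketch, again assuming the bundled `nancycats` data in place of the original VCF, with arbitrary `n.pca` values:

```r
library(adegenet)

# Assumption: nancycats stands in for the user's genind object.
data(nancycats)
set.seed(1)
grp <- find.clusters(nancycats, n.pca = 50, n.clust = 3)
dapc1 <- dapc(nancycats, grp$grp, n.pca = 10, n.da = 2)

# Distances between cluster centroids on the discriminant axes.
centroid_dist <- dist(dapc1$grp.coord)
print(centroid_dist)

# Proportion of individuals reassigned to their own cluster, per cluster.
# Values close to 1 indicate clearly separated groups.
print(summary(dapc1)$assign.per.pop)

# Density of individuals along the first discriminant function only.
scatter(dapc1, 1, 1)
```

Large centroid distances relative to the within-group spread, together with reassignment proportions near 1, are consistent with groups that are simply very distinct rather than with a plotting problem.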