kopelol closed this issue 2 years ago
Hello @kopelol. I am having the exact same issue you describe here with scatter(dapc). I'm using adegenet 2.1.5. My plot shows 4 groups (as in 4 numbered boxes) but no individual samples. Were you able to identify and fix this issue with your data? Thank you!
Hi @leonvarhan. Thank you for your reply. I hope so. Thanks,
Hi @kopelol,
You are not seeing any individual points because you are using 80 PCs to estimate 5 groups via clustering and then using the same PCs to fit the discriminant analysis to the groups that you just identified.
In short: you are over-fitting the model such that any within-group variance is vastly overshadowed by among-group variance and thus all the points within the groups are tightly packed.
Hi @zkamvar Thank you for your advice. I understand.
Generally, how many PCs should I use?
> Generally, how many PCs should I use?
There is no magic number of PCs to use. For DAPC, you want to avoid overfitting by retaining only as many PCs as are needed to describe the vast majority of the variance (e.g., enough PCs to describe ~80% of the variance). I would suggest reading the DAPC tutorial, especially section 4, which goes into the instability of group memberships after overfitting.
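To make the "~80% of the variance" rule of thumb concrete, here is a minimal sketch of the arithmetic: given the PCA eigenvalues, find the smallest number of leading PCs whose cumulative share of the total reaches a target. It is written in plain Python only to illustrate the calculation; the function name `n_pcs_for_variance` and the example eigenvalues are made up for this sketch and are not part of adegenet. With real data you would pass in the eigenvalues from your own PCA (in adegenet, I believe these are in the `eig` element of the object returned by `glPca`, but check your object).

```python
def n_pcs_for_variance(eigenvalues, target=0.80):
    """Return the smallest number of leading PCs whose cumulative
    share of the total variance reaches `target` (a fraction in (0, 1])."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")

    cumulative = 0.0
    # Sort largest first, since PCs are ordered by decreasing eigenvalue.
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev
        # Small tolerance guards against floating-point rounding.
        if cumulative / total >= target - 1e-12:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues, for illustration only (not from any real dataset).
example_eig = [40.0, 25.0, 15.0, 10.0, 5.0, 3.0, 2.0]
print(n_pcs_for_variance(example_eig))
```

Treat the result as a starting point for `n.pca`, not a definitive answer: the tutorial sections mentioned above are the place to check whether group memberships stay stable at that number.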
Hello everyone, I'm trying to run a DAPC analysis using a core gene alignment FASTA file obtained from 95 bacterial strains, but I can't get a plot that shows individual points.
First, I extracted SNPs from the multiple alignment FASTA file.
 /// GENLIGHT OBJECT /////////

 // 95 genotypes,  198,174 binary SNPs, size: 4.7 Mb
 0 (0 %) missing data

 // Basic content
   @gen: list of 95 SNPbin
   @ploidy: ploidy of each individual  (range: 1-1)

 // Optional content
   @ind.names:  95 individual labels
   @loc.all:  198174 alleles
   @position: integer storing positions of the SNPs
   @other: a list containing: elements without names
Then, I conducted DAPC.
Could you please give some advice? Thanks,