ERROR: number of cluster centres must lie between 1 and nrow(x)

kopelol commented 2 years ago

Hello everyone

I'm trying to run a DAPC analysis on a VCF file containing genome-wide SNP data (197,696 loci) obtained from 28 strains.

First, I converted the VCF file into a genind object:

    library(vcfR)
    x <- read.vcfR("file.vcf", verbose=F)
    y <- vcfR2genind(x)

x

Object of Class vcfR
28 samples
1 CHROMs
197,696 variants
Object size: 66.4 Mb
0 percent missing data

y

/// GENIND OBJECT /////////

// 28 individuals; 197,696 loci; 406,061 alleles; size: 165 Mb

// Basic content
   @tab: 28 x 406061 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 1-4)
   @loc.fac: locus factor for the 406061 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual (range: 2-2)
   @type: codom
   @call: adegenet::df2genind(X = t(x), sep = sep)

// Optional content

   - empty -

Then I ran grp <- find.clusters(y, max.n.clust=40) to identify clusters.

At the following prompt, I entered a suitable number:

Choose the number PCs to retain

I got the following error:

number of cluster centres must lie between 1 and nrow(x)

I ran this analysis successfully on the provided example data, so I suspect my data type is not suitable.

Could you please give me some advice?

Regards,

kopelol commented 2 years ago

It's haploid.

zkamvar commented 2 years ago

number of cluster centres must lie between 1 and nrow(x)

Run the function without the max.n.clust argument.

You have 28 samples in your data set, but you set the maximum number of clusters to 40. find.clusters runs the clustering algorithm once for each candidate number of clusters, so it fails as soon as the number of clusters reaches the number of individuals.
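The failure can be reproduced without adegenet. The sketch below assumes the message originates in base R's kmeans() (default Hartigan-Wong algorithm), which find.clusters appears to rely on internally. The random matrix is only a stand-in for the real 28-sample data set.

```r
# Toy stand-in for the real data: 28 rows (individuals), 5 numeric columns.
# These values are made up purely for illustration.
set.seed(1)
x <- matrix(rnorm(28 * 5), nrow = 28)

# Asking for fewer clusters than rows works as usual.
fit <- kmeans(x, centers = 3)
length(fit$size)

# Asking for as many clusters as rows raises an error; capture its message
# instead of stopping the script.
msg <- tryCatch(
  {
    kmeans(x, centers = nrow(x))
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
msg
```

If the assumption holds, msg should contain the same text as the error reported in this issue.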

The default maximum number of clusters is round(nInd(y)/10).
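For the 28 samples in this issue, the default works out as follows. This is a small sketch in base R; n_ind stands in for nInd(y), and safe_max is a hypothetical helper value (not part of adegenet) showing one way to cap a user-supplied maximum below the number of individuals.

```r
# Number of individuals in the data set from this issue, i.e. nInd(y).
n_ind <- 28

# Default maximum number of clusters, as quoted above.
default_max <- round(n_ind / 10)
default_max

# Hypothetical cap: if you want to explore more clusters than the default,
# keep the maximum strictly below the number of individuals.
requested_max <- 40
safe_max <- min(requested_max, n_ind - 1)
safe_max

# With adegenet loaded and y available, the call would then look like:
# grp <- find.clusters(y, max.n.clust = safe_max)
```

With so few samples, a maximum close to the number of individuals is unlikely to give meaningful groups, so the default is probably the better starting point.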

kopelol commented 2 years ago

Thank you for your reply. It worked, and now I understand the cause.

Thank you!!

thibautjombart / adegenet

ERROR: number of cluster centres must lie between 1 and nrow(x) #312