thibautjombart / adegenet

adegenet: a R package for the multivariate analysis of genetic markers

genet.dist not working for populations with large difference in sample size #351

Closed georgeomics closed 1 year ago

georgeomics commented 1 year ago

I am attempting to calculate Fst using genet.dist for 6 populations with the corresponding sample sizes:

  1   2   3   4   5   6 
 97 133 219  16  16  53 

My code looks like the following:

library(adegenet)   # df2genind
library(hierfstat)  # genet.dist
d1 <- subset(data, population %in% c(1,2))
dg1 <- df2genind(d1, ploidy=2, ncode=1, pop=d1$population)
calcFst <- genet.dist(dg1, method = "WC84")

And works great as long as one of the populations is not 4 or 5. If population 4 or 5 is used, I receive the following error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 113, 57
In addition: Warning message:
In matrix(unlist(e), ncol = x@ploidy[1], byrow = TRUE) :
  data length [113] is not a sub-multiple or multiple of the number of rows [57]

However, the code still works as intended when the populations being compared are 4 AND 5 (i.e., c(4,5)). One obvious thing to me is the difference in sample sizes. What could be the source of the error?

jgx65 commented 1 year ago

Hi,

This looks like an issue with hierfstat::genet.dist rather than with adegenet, so you might consider reposting it there. In any case, without an example data set it is difficult to answer your question. I am also wondering why you are subsetting your data, since hierfstat::genet.dist produces estimates of genetic distances for all pairs of populations in a single call.
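
To illustrate the point about all pairs, here is a minimal sketch on made-up data. The data frame `toy`, its column names, and the helper `geno` are assumptions invented for this example and are not the poster's data; only `df2genind` and `genet.dist` come from the packages discussed in this thread.

```r
library(adegenet)   # df2genind
library(hierfstat)  # genet.dist

set.seed(1)
n_per_pop <- 20
n <- 3 * n_per_pop

# Hypothetical helper: random diploid genotypes coded with one character per allele
geno <- function(n) paste0(sample(c("1", "2"), n, TRUE), sample(c("1", "2"), n, TRUE))

# Hypothetical toy data: three populations, three loci
toy <- data.frame(
  population = rep(1:3, each = n_per_pop),
  loc1 = geno(n), loc2 = geno(n), loc3 = geno(n),
  stringsAsFactors = FALSE
)

# Pass only the locus columns to df2genind; the population goes in via `pop`
g <- df2genind(toy[, c("loc1", "loc2", "loc3")], ploidy = 2, ncode = 1,
               pop = toy$population)

# One call returns the distances for every pair of populations
all_pairs <- genet.dist(g, method = "WC84")
fst_matrix <- as.matrix(all_pairs)
```

Individual pairs can then be read from `fst_matrix` by row and column instead of subsetting the data for each comparison.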

georgeomics commented 1 year ago

We're randomizing population assignments between pairwise regions, hence the subsetting. I have resolved the issue, though: it turned out to be caused by the population column still being present in the data frame passed to df2genind, which only triggered the error in comparisons involving populations with a relatively "small" number of individuals. I updated the code to remove the population column:

d1 <- subset(data, population %in% c(1,2))
# d1[,-1] drops the first column, which holds the population labels
dg1 <- df2genind(d1[,-1], ploidy=2, ncode=1, pop=d1$population)
calcFst <- genet.dist(dg1, method = "WC84")

I'm still not sure why the previous code runs fine for all other populations (and produces similar results) but fails specifically for the populations with 16 individuals. It is now running as intended across all population comparisons.
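
A slightly more defensive variant of the same fix drops the population column by name rather than by position, then checks that the resulting object has the expected number of loci and individuals. This is only a sketch on made-up data: `toy`, `geno`, and the column names are assumptions for illustration, and it does not explain why the original error appeared only for the smaller populations.

```r
library(adegenet)   # df2genind, nLoc, nInd
library(hierfstat)  # genet.dist

set.seed(1)
n <- 40

# Hypothetical helper: random diploid genotypes coded with one character per allele
geno <- function(n) paste0(sample(c("1", "2"), n, TRUE), sample(c("1", "2"), n, TRUE))

# Hypothetical toy data with the population label in the first column
toy <- data.frame(
  population = rep(c(1, 2), each = n / 2),
  loc1 = geno(n), loc2 = geno(n), loc3 = geno(n),
  stringsAsFactors = FALSE
)

d1 <- subset(toy, population %in% c(1, 2))

# Select locus columns by name so the fix does not depend on column order
loci <- setdiff(names(d1), "population")
dg1 <- df2genind(d1[, loci], ploidy = 2, ncode = 1, pop = d1$population)

# Sanity check: the population column must not have been read as a locus
stopifnot(nLoc(dg1) == length(loci), nInd(dg1) == nrow(d1))

calcFst <- genet.dist(dg1, method = "WC84")
```

The `stopifnot` line fails early if a non-genotype column slips into the input, which is easier to diagnose than an error raised inside `genet.dist`.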

jgx65 commented 1 year ago

I repeat:

If you can live with the solution you found, please close this issue. Otherwise, close it here and continue the conversation on the hierfstat issue tracker.