jgx65 / hierfstat

the hierfstat package
24 stars 14 forks source link

NaN as result of basic.stats calculations for small populations #69

Closed JoulesOLVERA closed 7 months ago

JoulesOLVERA commented 8 months ago

Hi Jerome !!

I have computed some genetic estimates among populations but I do not know why I get NaN in many of them (e.g. Fis, Ho, He) and I am wondering if it is a small sample issue (populations numbers range from 1 to 17 !). I paste part of the commands that I have used with the outcomes:

Afirma /// GENIND OBJECT /////////

// 104 individuals; 871 loci; 1,742 alleles; size: 1.2 Mb

// Basic content @tab: 104 x 1742 matrix of allele counts @loc.n.all: number of alleles per locus (range: 2-2) @loc.fac: locus factor for the 1742 columns of @tab @all.names: list of allele names for each locus @ploidy: ploidy of each individual (range: 2-2) @type: codom @call: adegenet::df2genind(X = t(x), sep = sep)

// Optional content @pop: population of each individual (group size range: 1-17)

summary(Afirma@pop) ACO ADA AEC AEX AEZ AGO AHU AIN ALO ASJ ASN ATL ATQ ATZ AUN AZA 4 4 4 3 17 1 1 6 10 5 8 15 13 7 5 1

library(hierfstat) basic_Afirma = hierfstat::basic.stats(Afirma, diploid = TRUE)

Ho_Afirma = apply(basic_Afirma$Ho, MARGIN = 2, FUN = mean) %>% round(digits = 2) Ho_Afirma ACO ADA AEC AEX AEZ AGO AHU AIN ALO ASJ ASN ATL ATQ ATZ AUN AZA NaN NaN NaN NaN 0.23 NaN NaN 0.24 NaN NaN 0.25 0.23 0.26 NaN NaN NaN

Hs_Afirma = apply(basic_Afirma$Hs, MARGIN = 2, FUN = mean) %>% round(digits = 2) Hs_Afirma ACO ADA AEC AEX AEZ AGO AHU AIN ALO ASJ ASN ATL ATQ ATZ AUN AZA NA NA NA NA 0.17 NA NA NaN NA NA 0.17 0.17 0.19 NA NA NA

Fis_Afirma = apply(basic_Afirma$Fis, MARGIN = 2, FUN = mean) %>% round(digits = 2) Fis_Afirma ACO ADA AEC AEX AEZ AGO AHU AIN ALO ASJ ASN ATL ATQ ATZ AUN AZA NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

Do you think there is something wrong with my data ? I do not know why it is not possible to do some calculations even for populations with n > 5 :( Also I wanted to ask if it is possible to estimate Fst values among pairs of populations and not only a mean value for all the data.

Cheers, Joules

jgx65 commented 7 months ago

try adding na.rm=TRUE after FUN=mean. Note there exists hierfstat::Ho and hierfstat::Hs , e.g.:

library(hierfstat)
data(gtrunchier)
basic.stats(gtrunchier[,-2])
Ho(gtrunchier[,-2])
Hs(gtrunchier[,-2])
JoulesOLVERA commented 7 months ago

Hi Jerome !! It worked !! Thanks à lot :D I also tried the functions hierfstat::Ho and hierfstat::Hs and gave me the same results.

Have a nice week continuation, Joules