jgx65 / hierfstat

the hierfstat package
24 stars 14 forks source link

Problem using hierfstat with genind object #23

Closed suzkel closed 6 years ago

suzkel commented 6 years ago

I have a data set of 4451 individuals and 473 SNPs.

I am able to convert the dataset into a genind object, but cannot use any of the hierfstat functions with this object. Alleles are encoded as characters, (A,T,G,C). Each allele has two characters.

This is the error I get with basic.stats(): Error in 1:sum(data[, 1] == i) : NA/NaN argument

And this one with pairwise.fst(): Error in if (any(object@ploidy < 1L)) { : missing value where TRUE/FALSE needed

I am able to run the same functions with the sample data set, "Master_Pinus_data_genotype.txt", so the problem is with my data set.

Huge thanks in advance to any tips on what is causing these errors.

/// GENIND OBJECT /////////

 // 4,386 individuals; 473 loci; 946 alleles; size: 16.3 Mb

 // Basic content
   @tab:  4386 x 946 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 2-2)
   @loc.fac: locus factor for the 946 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual  (range: 2-2)
   @type:  codom
   @call: df2genind(X = loci, sep = "", pop = pop, ploidy = 2)

 // Optional content
   @pop: population of each individual (group size range: 111-1766)
zkamvar commented 6 years ago

One thing to note is that your data may be malformed.

You mention:

I have a data set of 4451 individuals and 473 SNPs.

But your data show:

4,386 individuals; 473 loci; 946 alleles

which indicates that you have lost 65 samples somewhere.

Moreover, the error here indicates that you have missing data in your ploidy definitions:

Error in if (any(object@ploidy < 1L)) { :
missing value where TRUE/FALSE needed

Also, given that you mention that specific data set, I'm assuming you are following http://popgen.nescent.org/StartSNP.html as an example?

The above may better help us figure out what's going on.

suzkel commented 6 years ago

AHA It does work with a subset. I have some individuals who were not assigned to a population, leading to the error.

Thank you for the suggestion!!

zkamvar commented 6 years ago

Excellent! Thank you for posting a description of the solution :)

If it helps, I would also recommend to have sanity checks throughout your script so that if anything changes, you'll know when it happened:

## read in data and format it --------------------
## ...

# Number of individuals expected
stopifnot(nInd(dat) == 4451)

# Number of loci expected
stopifnot(nLoc(dat) == 473)

# Number of populations expected
stopifnot(nPop(dat) == 9) 

## run analyses here -----------------------------
## ...