jgx65 / hierfstat

the hierfstat package
24 stars 14 forks source link

inconsistent results: fstat vs fst.pairwise for two pops #20

Closed mutualism closed 5 years ago

mutualism commented 6 years ago

I may have a fundamental misunderstanding of what's happening but it seems to me that the functions fstat() and pairwise.fst() should give the same value for the case of two populations, which they do not.

A reproducible example: library(hierfstat) library(adegenet) # for data data(nancycats) obj1 <- seppop(nancycats)$P01 obj2 <- seppop(nancycats)$P02 obj3 <- repool(obj1, obj2) fstat(obj3) pairwise.fst(obj3)

And output:

fstat(obj3) pop Ind Total 0.1307741 0.2804306 pop 0.0000000 0.1721722 pairwise.fst(obj3) 1 2 0.080185

Any help would be very much appreciated.

Best, Manuel

jgx65 commented 6 years ago

The pairwise.fst function compute Nei's FST, while the fstat function computes weir and cockerham FST. The major difference is accounting for the number of populations, there is a np/(np-1) bias in the estimator by Nei (np being the number of populations sampled).

Note that the genet.dist function calculate both Weir and Cockerham estimator (method="WC84") and Nei unbiased version (method="Nei"):

`

genet.dist(obj3,method="WC84") P01 P02 0.1307741 genet.dist(obj3,method="Nei") P01 P02 0.1334 `

mutualism commented 6 years ago

Thanks! For a new user this difference is a little unexpected given that the help for both functions calls the same page, and although the difference can perhaps be gleaned from the help, it is not obvious. Also, the Nei option in genet.dist still calculates the Fst value differently than pairwise.fst (see your example above and compare to mine). Finally, it's not clear why two (or three) different formulas would be used - would it be better to have the package always compute one formula by default? It would be trivial to change the f1 function in pairwise.fst to calculate Fst using fstat ...

f1 <- function(pop1, pop2) { temp <- repool(pop1, pop2) return(fstat(temp, fstonly=TRUE)) }

Again, thanks for your response!

jgx65 commented 6 years ago

Why two or three formulae... good question. the pop gen literature is sometimes a bit complicated. Several estimators exist and pop geneticists have been arguing about which one to use when. I give all the relevant references in the hierfstat::genet.dist help page.

I agree that the estimate given by pairwise.fst should be changed.

@thibautjombart , could you have a look at this, or do you mind if I change it?

plantarum commented 5 years ago

I have been struggling with this issue today. To be sure I understand, can you confirm a few things:

Thanks,

Tyler

jgx65 commented 5 years ago

Arghhh. I see this message today. basic.stats computes Nei's (1987 Molecular Evolutionary Genetics. Columbia University Press) Fsts, see p 164-165 of the book. There is no method "Nei" in genet.dist, but a method "Nei87" that estimates pairwise Nei's Fst' (Fstp).

fstat is a wrapper for varcomp.glob if the data set is a genind object (see adegenet) and estimates Weir & Cockerham (1984) FST, and can accommodate for different hierarchical levels (see ?varcomp.glob)

pairwise.fst is another wrapper for genind object that estimates pairwise Nei's Gst ((HT-HS)/HT). I do not recommend using this. Instead use genet.dist(method="Nei87") or equivalently pairwise.neifst

jgx65 commented 5 years ago

'pairwise.fst' has been removed from 'hierfstsat' following the discussion in this thread. @thibautjombart hope it is ok with you, several alternatives exist to calculate pairwise FSTs.