Open thierrygosselin opened 7 years ago
I will check my implementation of Fst. Please let me know the sample size in each population.
In the example above I used these:
POP_ID N
G100 10
G102 10
G103 10
G108 10
G109 10
G111 10
G118 10
G122 8
But do get similar bias with more than > 30 samples/pop Thanks for looking into this Thierry
Any update regarding this issue ?
I have compared my implementation of W&C Fst with the implementations in plink_v1.9 and vcftools.
plink_v1.9 and vcftools return both "Weir and Cockerham mean Fst estimate" and "Weir and Cockerham weighted Fst estimate". SNPRelate returns "Weir and Cockerham weighted Fst estimate", which is the same as plink_v1.9 and vcftools.
It seems that "weighted Fst estimate" is a little higher than "mean Fst estimate".
Thanks Xiuwen, I'll make some tests on my end with different RADseq data and will update results. Thierry
The latest SNPRelate on GitHub also provides MeanFst, and it will be available in the BioC release (in April).
see the help document snpgdsFst()
.
Hi Xiuwen,
I'm really impressed by the speed of
SNPRelate
really nice work !I recently tested
SNPRelate::snpgdsFst
withmethod = "W&C84"
and systematically get upward bias when tested against other software (e.g. GENODIVE, hierfsat).I discarded these factors as potential cause for the bias:
Not tested :
SNPRelate::snpgdsFst
handles uneven sampling size across populationsWould really like to get to the bottom of this, because your implementation is really the fastest in R!
Cheers Thierry