zhengxwen / SNPRelate

R package: parallel computing toolset for relatedness and principal component analysis of SNP data (Development version only)
http://www.bioconductor.org/packages/SNPRelate
98 stars 25 forks source link

systematic upward bias with `SNPRelate::snpgdsFst` #21

Open thierrygosselin opened 7 years ago

thierrygosselin commented 7 years ago

Hi Xiuwen,

I'm really impressed by the speed of SNPRelate really nice work !

I recently tested SNPRelate::snpgdsFst with method = "W&C84" and systematically get upward bias when tested against other software (e.g. GENODIVE, hierfsat).

I discarded these factors as potential cause for the bias:

Not tested :

Would really like to get to the bottom of this, because your implementation is really the fastest in R!

screenshot 2016-11-28 09 16 35

Cheers Thierry

zhengxwen commented 7 years ago

I will check my implementation of Fst. Please let me know the sample size in each population.

thierrygosselin commented 7 years ago

In the example above I used these:

POP_ID   N 
G100    10
G102    10
G103    10
G108    10
G109    10
G111    10
G118    10
G122     8

But do get similar bias with more than > 30 samples/pop Thanks for looking into this Thierry

thierrygosselin commented 7 years ago

Any update regarding this issue ?

zhengxwen commented 7 years ago

I have compared my implementation of W&C Fst with the implementations in plink_v1.9 and vcftools.

plink_v1.9 and vcftools return both "Weir and Cockerham mean Fst estimate" and "Weir and Cockerham weighted Fst estimate". SNPRelate returns "Weir and Cockerham weighted Fst estimate", which is the same as plink_v1.9 and vcftools.

It seems that "weighted Fst estimate" is a little higher than "mean Fst estimate".

thierrygosselin commented 7 years ago

Thanks Xiuwen, I'll make some tests on my end with different RADseq data and will update results. Thierry

zhengxwen commented 7 years ago

The latest SNPRelate on GitHub also provides MeanFst, and it will be available in the BioC release (in April). see the help document snpgdsFst().