systematic upward bias with `SNPRelate::snpgdsFst`

zhengxwen / SNPRelate

R package: parallel computing toolset for relatedness and principal component analysis of SNP data (Development version only)

http://www.bioconductor.org/packages/SNPRelate

98 stars 25 forks source link

systematic upward bias with `SNPRelate::snpgdsFst` #21

Open thierrygosselin opened 7 years ago

thierrygosselin commented 7 years ago

Hi Xiuwen,

I'm really impressed by the speed of SNPRelate really nice work !

I recently tested SNPRelate::snpgdsFst with method = "W&C84" and systematically get upward bias when tested against other software (e.g. GENODIVE, hierfsat).

I discarded these factors as potential cause for the bias:

missing value (I work with RADseq data...) because I tested with and without imputation
common markers between populations
monomorphic markers

Not tested :

the way SNPRelate::snpgdsFst handles uneven sampling size across populations

Would really like to get to the bottom of this, because your implementation is really the fastest in R!

Cheers Thierry

zhengxwen commented 7 years ago

I will check my implementation of Fst. Please let me know the sample size in each population.

thierrygosselin commented 7 years ago

In the example above I used these:

But do get similar bias with more than > 30 samples/pop Thanks for looking into this Thierry

thierrygosselin commented 7 years ago

Any update regarding this issue ?

zhengxwen commented 7 years ago

I have compared my implementation of W&C Fst with the implementations in plink_v1.9 and vcftools.

plink_v1.9 and vcftools return both "Weir and Cockerham mean Fst estimate" and "Weir and Cockerham weighted Fst estimate". SNPRelate returns "Weir and Cockerham weighted Fst estimate", which is the same as plink_v1.9 and vcftools.

It seems that "weighted Fst estimate" is a little higher than "mean Fst estimate".

thierrygosselin commented 7 years ago

Thanks Xiuwen, I'll make some tests on my end with different RADseq data and will update results. Thierry

zhengxwen commented 7 years ago

The latest SNPRelate on GitHub also provides MeanFst, and it will be available in the BioC release (in April). see the help document snpgdsFst().