Closed alexkrohn closed 1 year ago
Hi @alexkrohn, thanks for your question. If the parameter Fst is truly 0, then its estimates might be slightly negative. Your permutation test is sound, I would simply count the number of randomized Fst larger or equal to the observed to obtain the p -value, and/or calculate the 95% empirical CI (quantile(nfst,c(0.025,0.975)
) rather than the parametric one :
#simulate 2 pops with Nm=100, 5 inds in each pop 30k SNPs
dat<-sim.genot(size=5,nbal=2,nbpop=2,nbloc=30000,N=10000,mig=0.01)
#transform data to dosage
dos<-biall2dos(dat[,-1])
#estimate true fst
(tfst<-fst.dosage(dos,rep(1:2,each=5))[3])
#empirical null fst distribution
nfst<-replicate(999,fst.dosage(dos,sample(rep(1:2,each=5)))[3])
#associated p-value
mean(c(nfst,tfst)>=tfst)
#empirical 95% ci of null dist for Fst
quantile(nfst,c(0.025,0.975))
#redo with Nm=1000
dat<-sim.genot(size=5,nbal=2,nbpop=2,nbloc=30000,N=10000,mig=0.1)
dos<-biall2dos(dat[,-1])
#estimate true fst
(tfst<-fst.dosage(dos,rep(1:2,each=5))[3])
#empirical null fst distribution
nfst<-replicate(999,fst.dosage(dos,sample(rep(1:2,each=5)))[3])
#associated p-value
mean(c(nfst,tfst)>=tfst)
#empirical 95% ci of null dist for Fst
quantile(nfst,c(0.025,0.975))
Thanks for your help with this test. I appreciate the understanding that negatives should be treated as 0.
I am attempting to determine whether the pairwise FST observed between two populations is greater than one would expect by chance. One way I thought to do this would be to generate a "null distribution" of FST values, then compare the actual FST value to the null distribution. If the real value is higher than the 95% confidence interval of the null distribution, I could conclude the real value is larger than one would expect by chance. (If there is a simpler way to test this, I'm open to suggestions!)
For example, in
myvcf
I have 10 individuals, 5 from one population and 5 from another, in that order. To generate the null distribution and confidence interval, I do:In my actual dataset, the real FST value is 0.002. However, the 95% confidence interval goes from -0.005 to -0.0007.
Biologically, I know that even in the real populations, there is very very little differentiation. However, I'm still curious how to interpret those negative FST values. Is it just sampling error with small population sizes and many alleles? (The dataset is ~30k SNPs from RADseq.) Should a negative FST value be interpreted as 0, or better as NA?
Thanks for your help!