Closed danielwood1992 closed 2 years ago
Hi There!
Following Weir and Cockerham 1984 (https://www.jstor.org/stable/2408641), when averaging over a window, we use a weighted mean of FST rather than a simple mean of the per-site estimates. Using Weir and Cockerham's "abc" notation, an individual site has an FST of FST = a / (a + b +c) and a window has a weighted mean of FST(mean) = sum(a) / sum(a+b+c), where a b c and are lists/vectors of the individual variance components for each site. If there is variance in the sample size (number of genotypes) over your sites, the simple mean and weighted means will not be the same. We don't currently output the individual variance components for per-site FST estimates.
Hope that helps!
Aha I see, my mistake! Thanks for pointing that out!
Best wishes,
Daniel Wood
My pleasure, not a mistake, this isn't really explained anywhere! I appreciate you sharing your findings and we'll be sure to add a note about this for the future 👍
:)
Hello,
I've been using pixy v1.2.3.beta1 to calculate Fst (and will do dxy and pi as well later) both for individual sites and across 10kb windows. I was thinking it might be easier to calculate average windows from the per-site results, rather than recalculating everything again. However, I can't seem to get the averages from the per-site results to correspond to those in the windows: please see example below. The no_snps column corresponds to the number of snps in the per-site file, but the averages don't seem to correspond (assuming NA values aren't included; but setting these to 0 and taking the average doesn't get to the average in the 10kb window file, or setting any negative values to 0 or anything else I can think of). Is there something in calculating the across window average I'm missing here?
Thanks a lot!
Example: i) per site estimate command: pixy --bypass_invariant_check yes --vcf $vcf --window_size 1 --populations $popfile --n_cores 1 --stats fst --output $outdir --output_prefix pixy.1.$vcf
pop1 pop2 chromosome window_pos_1 window_pos_2 avg_wc_fst no_snps GR SA 36 15970 15970 NA 1 GR SA 36 15974 15974 -0.0056030669419049 1 GR SA 36 15980 15980 NA 1 GR SA 36 16081 16081 0.0061909416748127 1 GR SA 36 16085 16085 0.0061909416748127 1 GR SA 36 17059 17059 8.673617379884031e-18 1 GR SA 36 17079 17079 8.673617379884031e-18 1 GR SA 36 17405 17405 NA 1 GR SA 36 17407 17407 NA 1 GR SA 36 17435 17435 8.673617379884031e-18 1 GR SA 36 17443 17443 NA 1 GR SA 36 45327 45327 0.2492405950619206 1 GR SA 36 46204 46204 0.1249999999999999 1 GR SA 36 46215 46215 NA 1 GR SA 36 46216 46216 NA 1 GR SA 36 46217 46217 0.3713905136202677 1 GR SA 36 46236 46236 0.2486373546511627 1 GR SA 36 46419 46419 0.2861453942334653 1 GR SA 36 46507 46507 0.5208333333333334 1 GR SA 36 46577 46577 0.3095238095238095 1 GR SA 36 48127 48127 NA 1 GR SA 36 48243 48243 NA 1 GR SA 36 48335 48335 -3.903127820947815e-18 1 GR SA 36 48345 48345 NA 1 GR SA 36 48638 48638 0.2913163685606273 1 GR SA 36 48666 48666 NA 1 GR SA 36 48707 48707 0.287586243782425 1 GR SA 36 48713 48713 NA 1 GR SA 36 48765 48765 NA 1 GR SA 36 48798 48798 -0.0056030669419049 1 GR SA 36 48849 48849 0.2397322003347496 1 GR SA 36 48888 48888 0.0148356443323959 1 GR SA 36 48949 48949 8.673617379884031e-18 1 GR SA 36 49019 49019 NA 1 GR SA 36 49044 49044 0.1628959276018099 1 GR SA 36 49047 49047 NA 1 GR SA 36 49148 49148 NA 1 GR SA 36 49415 49415 0.0917837213777178 1 GR SA 36 49427 49427 0.4370689311485866 1 GR SA 36 49443 49443 0.442158328592032 1 GR SA 36 49473 49473 0.442158328592032 1 GR SA 36 49572 49572 0.6096829599756782 1 GR SA 36 49604 49604 0.4509639564124057 1 GR SA 36 49617 49617 0.0148356443323959 1 GR SA 36 49650 49650 0.3750000000000001 1 GR SA 36 49661 49661 0.3725149673232485 1 GR SA 36 49674 49674 NA 1 GR SA 36 49677 49677 0.3750000000000001 1 GR SA 36 49682 49682 0.3750000000000001 1 GR SA 36 49694 49694 NA 1 GR SA 36 49796 49796 0.3412602942537244 1 GR SA 36 49842 49842 NA 1 GR SA 36 49843 49843 NA 1 GR SA 36 49866 49866 0.0061909416748127 1
ii) 10kb window estimate command: pixy --bypass_invariant_check yes --vcf $vcf --window_size 10000 --populations $popfile --n_cores 3 --stats fst --output $outdir --output_prefix pixy.10kb.$vcf
pop1 pop2 chromosome window_pos_1 window_pos_2 avg_wc_fst no_snps GR SA 36 10001 20000 0.0011668872027648 11 GR SA 36 40001 50000 0.362339934891549 43