Open sbreitbart opened 7 months ago
Hi Sophie, could you provide a worked-out example?
Also, if you are using a one-level hierarchy as it seems, I suggest you use the wc
function rather than varcomp.glob
.
Last, we are now advocating using the Weir-Goudet estimator of FST, using the fs.dosage
function, see Weir & Goudet 2017 and Goudet & Weir 2023
Best wishes
Hi there,
Thank you kindly for your prompt response! Here's my sampling scheme: individuals within populations within urban/rural groups within the entire group of all sampled individuals.
I start off creating a genind from my vcf with vcfR2genind
.
my_genind
/// GENIND OBJECT /////////
// 256 individuals; 972 loci; 1,944 alleles; size: 2.5 Mb
// Basic content
@tab: 256 x 1944 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 2-2)
@loc.fac: locus factor for the 1944 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 2-2)
@type: codom
@call: adegenet::df2genind(X = t(x), sep = sep)
// Optional content
@pop: population of each individual (group size range: 1-5)
@strata: a data frame with 3 columns ( sample, population, urban_rural )
The strata (note that some populations contain only one individual)
head(my_genind$strata)
sample population urban_rural
1 1.2_1A10_sorted 67 Urban
2 1.2_1A11_sorted 79 Urban
3 1.2_1A1_sorted 40 Urban
4 1.2_1A2_sorted 47 Rural
5 1.2_1A3_sorted 2 Urban
6 1.2_1A4_sorted 12 Urban
# create hierfstat-friendly genind
formatted_genind <- genind2hierfstat(my_genind)
head(formatted_genind[, 1:3])
pop X102.87.. X107.21..
1.2_1A10_sorted 67 0 0
1.2_1A11_sorted 79 0 0
1.2_1A1_sorted 40 0 0
1.2_1A2_sorted 47 0 0
1.2_1A3_sorted 2 0 0
1.2_1A4_sorted 12 0 0
# estimate var comps and hierarchical fsts
fsts <- varcomp.glob(levels = data.frame(pops$urban_rural,
pops$population),
loci = formatted_genind[,-1],
diploid = TRUE)
The fsts$F
table looks like this when exported:
Level | Urban/Rural Group | Sampling Site | Individual |
---|---|---|---|
Total | 0.0005 | 0.0798 | 0.1637 |
Urban/Rural Group | - | 0.0794 | 0.1633 |
Sampling Site | - | - | 0.0912 |
I then perform this process with vcfs containing individuals from just urban and rural groups. For example, this is for the urban group:
fsts_urb <- varcomp.glob(levels = data.frame(pops_urb$population),
loci = formatted_genind_urb[,-1],
diploid = TRUE)
Then, I compare them to my results from vcftools' --weir-fst-comp
and see they're pretty different, but I don't know why. If anything, I'd think the hierfstat
results make more sense than the vcftools' results.
Urban: F(sampling site/total): vcftools = 0.079; hierfstats = -0.003.
Rural: F(sampling site/total): vcftools = 0.049; hierfstats = 0.005.
Thanks Sophie,
How many rural and urban sites do you have? From your output, F[sampling-site-UR group]
is 0.0794, very close to what vcftools is giving for sampling-site-Total Urban, and not far from sampling site-total Rural, whereas hierfstat
gives very different results.Your syntax looks correct, but I'm not sure what formatted_genind_urb
contains, nor pops_urb$population
. Could the indices have been mixed up?
perhaps you could try out the following?
formatted_genind_urb<-formatted_genind[my_genind$strata$urban_rural=="Urban",]
wc(formatted_genind_urb)
formatted_genind_rur<-formatted_genind[my_genind$strata$urban_rural=="Rural",]
wc(formatted_genind_rur)
Hi Jerome,
Hmm- perhaps I did make an error as you suggested. Regardless, since my last message, my team and I have decided to seek a way to calculate Fst using Bhatia's/Hudson's estimator. Do you have any advice for doing this with hierfstat or another R package? Preferably using a vcf as input? Also, I started troubleshooting fs.dosage()
but was unsure how to convert my vcf into the right input matrix format.
Thanks again for your continued help!
to read a vcf in hierfstat, use the function read.VCF
. The object you obtained is a bed object from package gaston
, and you can use it directly with fs.dosage
, you need to pass the population identifier of the different individuals as a second argument to the function
to read a vcf in hierfstat, use the function read.VCF
. The object you obtained is a bed object from package gaston
, and you can use it directly with fs.dosage
, you need to pass the population identifier of the different individuals as a second argument to the function
Hi Jerome- I was able to use the fs.dosage()
command successfully for my individual sampling sites- thank you so much!
However, to confirm, I cannot calculate genetic differentiation among the two groups at the highest level (urban and rural) while accounting for the substructure, correct? Again, my data contains individuals within sampling sites within urban/rural groups, like this, and I'm searching for a way to calculate hierarchical FST with a method that accounts for differences in sampling size among groups.
> head(pops)
sample sampling_site urban_rural_group
1 1.2_1A10_sorted 67 Urban
2 1.2_1A11_sorted 79 Urban
3 1.2_1A1_sorted 40 Urban
4 1.2_1A2_sorted 47 Rural
5 1.2_1A3_sorted 2 Urban
6 1.2_1A4_sorted 12 Urban
A hierarchical version of fs.dosage
has not been implemented yet, (I am not sure what extra information it would bring, but am willing to reconsider my position). From the analysis with varcomp.glob
you provide above it looks like there is almost no differences between urban and rural sites?
Yes, it does look like there is almost no difference between urban/rural groups with the varcomp.glob
results; however, my concern is that this function still uses Weir-Cockerham estimator, correct? I need to use an estimator that accounts for differences in sampling sizes. Do you know how I can calculate hierarchical f-statistics using such an estimator with hierfstat
?
No other ways than varcomp.glob
to do hierarchical for the time being. But if you are worried about sample size effect (I don't think it will matter with such a low FST), you could try downsampling your data set to equalize sample sizes. You could do this several time to see if it affects the values of the between habitat FST
Hello,
I've computed Fst among groups with both hierfstat's varcomp.glob() and vcftools --weir-fst-pop. Each give different answers even though both use the WC formula. I'm wondering if the reason behind the discrepancy is the way each software treats populations with 1 individual (no within-population variance). Does hierfstat remove them from the dataset?
Thank you, Sophie