Closed alexkrohn closed 2 years ago
Hi @alexkrohn
You can read VCF directly in hierfstat using the read.VCF
function. The bed
object created can be used directly with beta.dosage
to obtain kinship and inbreeding coefficients, or fs.dosage
to obtain FST, pairwise FSTs, population specific FSTs etc...
dat<-read.VCF("my.vcf.file.vcf[.gz]")
kin.dat<-beta.dosage(dat)
pop=vector.of.populations.that.correspond.to.individuals
fs.dat<-fs.dosage(dat,pop=pop)
plot(fs.dat)
Still, biall2dos
should work, try dos<-biall2dos(df[,-1])
(to exclude the first column containing the population identifier in df
)
Importing with read.VCF
worked! Thank you! Something must've gotten messed up in all the conversions.
biall2dos(df[,-1])
still doesn't work, giving the same error about non-numeric values.
You've resolved my issue, so I'll close this comment. Thanks for your help.
....While I have you here, though (poor Github form, I'm sure), I was wondering what you think about the best way to get Ho and Hs for a population. Do you think it's sufficient to take the mean Ho and Hs across all loci at a given population?
E.g.:
stats <- basic.stats(df)
ho <- apply(stats$Ho, 2, function(x){mean(x,na.rm=TRUE)})
hs <- apply(stats$Hs, 2, function(x){mean(x,na.rm=TRUE)})
Or is there a better way? I've found that doing this for Fis doesn't work well.
For example:
fis1 <- apply(stats$Fis, 2, function(x){mean(x,na.rm=TRUE)})
fis2 <- 1 - (ho/hs)
fs.dat <- fs.dosage(dat, pop = pop
fis3 <- fs.dat$Fs[1,]
Yields three slightly different values for Fis.
Any suggestions?
There are functions Hs
and Ho
in hierfstat, they do just that:
library(hierfstat)
data(gtrunchier)
Hs(gtrunchier[,-2])
1 2 3 4 5
0.56571667 0.03776667 0.38731667 0.26976667 0.43050000
6
0.36266667
Ho(gtrunchier[,-2])
1 2 3 4 5 6
0.0859000 0.0000000 0.0243000 0.0463000 0.0669500 0.1958333
This should be identical/ almost identical to your ho
and hs
. As for Fis
, fis1
is an average of ratios, with poor properties. fis2
and fis3
should be very close, fis3
is the average of the individuals inbreeding coefficients in each population.
Hope this helps.
This is perfect. Thanks very much for your advice.
Hi @jgx65,
I've been using
hierfstat
to calculatebasic.stats
and genetic distances, including WC Fst from VCFs for a while now. I'm using RADseq datasets. Given that the datasets often have a lot of missing data, I calculate the statistics at a variety of missing data thresholds to see how missing data affects the statistics.Some of my datasets contain related individuals, so I would like to try the dosage based Fst calculation from your 2017 paper, especially pairwise individual or population-level Fsts. However, I run into problems with loci that have missing data. Is there any way around this?
In general my workflow looks like this:
This works even with up to 70% missing data (although that is obviously not ideal, and the inferences are not very believable).
However, this does not work:
Is there an easy way to convert genind/vcf or hierstat dfs to dos? I'm happy to send over smaller datasets for you to examine more closely.