jaredgk / PPP

Repository for pipeline code

Strange Fis estimates #43

Open siribi opened 2 years ago

siribi commented 2 years ago

Hi,

I used vcf_calc.py to calculate het-fis, but the results look very odd. I'm working with a highly selfing plant, yet nearly all individuals had negative or very low Fis values. The only exception was the population that the reference genome comes from.

Something similar happened when I used fs.dosage() in the hierfstat package. The hierfstat manual says that individual inbreeding coefficients are estimated relative to the population to which the individual belongs. The --het function in VCFtools gives estimates that are more in line with expectations, but I'm struggling to find a method that handles both population information and large VCF files.

My question: I'm very interested in getting one Fis estimate per population. Is there any chance you could include a function in PPP that calculates Fis per population and does not assume that an individual belongs to the population of the reference genome (if that is what is happening)? Or, alternatively, an estimate of expected and observed heterozygosity per population?

If this exists already and I didn't see it, I'm sorry!

Siri

P.S. I tested out a few of the scripts in the PPP and I really liked it. It's been the most user friendly of the pop-gen packages / scripts I have tried so far.

siribi commented 2 years ago

Ok, sorry - I see in the log file that het-fis is calculated with vcftools by keeping one population at a time, but I still find the output surprisingly low.
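
For reference, the per-population quantities asked about above (observed heterozygosity, expected heterozygosity, and Fis = 1 - Ho/He) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not how PPP or VCFtools computes its values, the function name `population_fis` and the genotype layout are hypothetical, sites are assumed to be biallelic and diploid, and genotypes are assumed to have already been extracted from the VCF for a single population.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical layout: one diploid call per individual, e.g. (0, 1); None = missing.
Genotype = Optional[Tuple[int, int]]


def population_fis(sites: Sequence[Sequence[Genotype]]) -> Dict[str, Optional[float]]:
    """Mean Ho, mean He and Fis = 1 - Ho/He for ONE population.

    `sites` holds one entry per biallelic site; each entry lists the
    genotypes of the individuals in this population only.
    """
    ho_sum = 0.0
    he_sum = 0.0
    n_sites = 0
    for genotypes in sites:
        called = [g for g in genotypes if g is not None]
        n = len(called)
        if n < 2:
            continue  # too few called individuals to estimate anything
        # Alternate-allele frequency within this population
        p = sum(a + b for a, b in called) / (2 * n)
        # Observed heterozygosity: fraction of heterozygous individuals
        ho_sum += sum(1 for a, b in called if a != b) / n
        # Expected heterozygosity 2pq with the small-sample correction 2n/(2n-1)
        he_sum += 2 * p * (1 - p) * (2 * n) / (2 * n - 1)
        n_sites += 1
    if n_sites == 0:
        return {"ho": None, "he": None, "fis": None}
    ho = ho_sum / n_sites
    he = he_sum / n_sites
    # Fis is undefined when the population is monomorphic at every site
    fis = None if he == 0 else 1 - ho / he
    return {"ho": ho, "he": he, "fis": fis}
```

With this layout, a population made up entirely of alternative homozygotes at a polymorphic site, e.g. `population_fis([[(0, 0), (1, 1), (0, 0), (1, 1)]])`, gives Fis = 1, while a population of heterozygotes only gives a negative Fis. Because allele frequencies are estimated within the population passed in, the result does not depend on which population the reference genome came from.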