simonhmartin / genomics_general

General tools for genomic analyses.

pi value is large #87

Closed chen1238 closed 2 years ago

chen1238 commented 2 years ago

The pi values calculated by vcftools differ from those calculated by popgenWindows.py, which gives a very large pi. I don't quite understand the reason for this. Is it because of the different calculation methods used?

simonhmartin commented 2 years ago

Note that this script needs to have the invariant sites present in order to accurately compute pi, otherwise the value will be inflated. This is because, unlike vcftools, missing sites are not just assumed to be invariant. There are problems with making this assumption. These are described well in this paper: https://doi.org/10.1111/1755-0998.13326
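To illustrate the inflation, here is a minimal sketch. It is not taken from popgenWindows.py, and the function name `pairwise_pi` and the toy sequences are made up for the example. It assumes pi is the mean proportion of differing sites over all pairs of haplotypes, counting only sites where both haplotypes have data.

```python
from itertools import combinations


def pairwise_pi(haplotypes):
    """Mean proportion of differing sites across all pairs of haplotypes.

    `haplotypes` is a list of equal-length strings; 'N' marks missing data.
    Only sites where both haplotypes in a pair are non-missing are counted.
    """
    pair_values = []
    for a, b in combinations(haplotypes, 2):
        compared = [(x, y) for x, y in zip(a, b) if x != "N" and y != "N"]
        if compared:
            diffs = sum(x != y for x, y in compared)
            pair_values.append(diffs / len(compared))
    return sum(pair_values) / len(pair_values) if pair_values else float("nan")


# Toy window of 10 sites, 2 of which are variable.
all_sites = ["AACAAAAGAA", "AATAAAAGAA", "AACAAAATAA"]
# The same data with only the 2 variable sites kept, as in a SNP-only VCF.
snps_only = ["CG", "TG", "CT"]

pi_all = pairwise_pi(all_sites)   # denominator is all 10 sites
pi_snps = pairwise_pi(snps_only)  # denominator is only the 2 SNPs
print(pi_all, pi_snps)
```

With the invariant sites dropped, the same differences are divided by 2 sites instead of 10, so the estimate is five times larger in this toy case.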

chen1238 commented 2 years ago

Thank you for your reply. I understand what you mean.

simonhmartin commented 2 years ago

Great. You're welcome.

chen1238 commented 2 years ago

Excuse me, dxy also needs invariant sites, right?

simonhmartin commented 2 years ago

Yes, unfortunately. I know this makes the files very large, but it is the only way to get an accurate value with these scripts.

jiazhongguo2019 commented 1 year ago

Could you tell me how you get invariant sites for the missing sites? Thanks a lot.

simonhmartin commented 1 year ago

Sorry I never responded to this. I don't understand the question. Many genotyping tools such as GATK, bcftools and Freebayes can export invariants if requested. Simon

jiazhongguo2019 commented 1 year ago

Hello Simon;

Does "invariant sites" refer to loci at which all sampled individuals have the 1/1 genotype (where the "0" allele denotes the reference allele)? In other words, such a locus is fixed (only one allele is present) in the sampled population, and all samples carry an allele different from the one in the individuals used for the reference genome assembly.

Thanks a lot!


simonhmartin commented 1 year ago

Invariant sites are those where all individuals have the same genotype or ./. in the VCF. So they can all be 0/0 or 1/1, but not both at the same site. Many genotypers will not output the sites where all individuals are 0/0 by default, but usually they can do this if the option is specified.
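As a rough way to check whether a VCF contains invariant sites, the definition above can be sketched as a small classifier. This is an illustration only, not part of genomics_general. The function name `is_invariant` and the example records are hypothetical. It assumes a tab-separated VCF data line with a `GT` entry in the FORMAT column, and it treats a site as invariant when at most one allele is observed among the called genotypes, so a site where every sample is 0/1 counts as variable.

```python
import re


def is_invariant(vcf_line):
    """Return True if at most one allele is observed among called genotypes."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; find where GT sits within each sample field.
    gt_index = fields[8].split(":").index("GT")
    alleles = set()
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        # Genotypes may be unphased (0/1) or phased (0|1).
        for allele in re.split(r"[/|]", gt):
            if allele != ".":  # skip missing calls such as ./.
                alleles.add(allele)
    return len(alleles) <= 1


records = [
    "chr1\t100\t.\tA\t.\t.\t.\t.\tGT\t0/0\t0/0\t./.",  # all reference or missing
    "chr1\t101\t.\tC\tT\t.\t.\t.\tGT\t1/1\t1/1\t1/1",  # all alternate
    "chr1\t102\t.\tG\tA\t.\t.\t.\tGT\t0/0\t1/1\t0/1",  # both alleles present
]
flags = [is_invariant(r) for r in records]
print(flags)
```

If a file run through a check like this contains no invariant records at all, it is probably a SNP-only VCF, and pi or dxy computed from it with these scripts would be inflated.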