Open conniecl opened 7 years ago
I'm having the same issue. After looking at the code, here is what I learned:
unsigned int N_site_mismatches = 0;
for (vector<int>::iterator ac = allele_counts.begin(); ac != allele_counts.end(); ++ac)
{
N_site_mismatches += (*ac * (N_non_missing_chr - *ac));
}
if (N_site_mismatches == 0)
continue; // Site is actually fixed.
If a site has a fixed allele, it is disregarded. A site is defined by allele count (AC) and non missing samples (not ./.). A quick way to check is to see if you have sites where AC = 2 * number of samples. (assuming none of the samples have missing data).
Hi, While using vcftools for window-pi analysis, I found in the output, the number of N_VAEIANTS is different with my input VCF files. And here is my command:
vcftools_0.1.13/bin/vcftools --vcf chr9.hmp3.vcf --window-pi 20000 --window-pi-step 2000 --out ./chr9.hmp3.20k
And part of the output:in BIN 1-20000, the input contain 474 snp actually, instead 446 in the output file. I also check if it just use bi-allele, or filer some snp by missing rate, but can not find a rule about it. How the vcftools calculate the N_VARIANTS? Sincerely hope can get some advise, and thanks in advance.