ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License
115 stars 14 forks source link

[pixy] ERROR: the provided VCF appears to contain no variable sites. #60

Closed StefanoLonardi closed 2 years ago

StefanoLonardi commented 2 years ago

Hi Kieran, this is Stefano, a colleague at UC Riverside. I have used pixy on the 11 chromosomes of cowpea that we studying with Prof Close, but it fails on three chromosomes. It works for the other eight, so I do not understand what is the issue.

Describe the bug

[pixy] pixy 1.2.6.beta1
[pixy] See documentation at https://pixy.readthedocs.io/en/latest/
[pixy] Validating VCF and input parameters...
[pixy] Checking write access...OK
[pixy] Checking CPU configuration...OK
[pixy] Checking for invariant sites...Exception: [pixy] ERROR: the provided VCF appears to contain no variable sites. It may have been filtered incorrectly, or otherwise corrupted.

A reproducible example of the bug

Here is my Script (for chr11, one of the three in which Pixy fails) The script works for eight chromosomes.

=======

export REF=Cowpea_Genome_1.0.fasta
export DIR=0.Ref
export SAMPLE=$DIR\_combined_allsites
export CHR=Vu11
gatk --java-options "-Xmx32g" GenotypeGVCFs \
--allow-old-rms-mapping-quality-annotation-data \
-R panSNPs/$DIR/$REF \
-V panSNPs/$DIR/combined.g.vcf \
-all-sites \
-L $CHR \
-O $SAMPLE$CHR.g.vcf

bgzip $SAMPLE$CHR.g.vcf
tabix $SAMPLE$CHR.g.vcf.gz

pixy --stats pi \
     --vcf $SAMPLE$CHR.g.vcf.gz \
     --populations $DIR.txt \
     --bed_file $DIR.cowpeaA.realigned.yes.bed \
     --n_cores 40 \
     --chromosomes $CHR

============ VCF and BED files are quite large. I stored them in this dropbox folder https://www.dropbox.com/sh/m9h22t31yot2tgk/AADcdKTwQ4THa28-cQUaKUJTa?dl=0

OS information Linux H4 4.15.0-177-generic #186-Ubuntu SMP Thu Apr 14 20:23:07 UTC 2022 x86_64 GNU/Linux

ksamuk commented 2 years ago

Hi Stefano,

Thanks for your reproducible example here! This error results from a check for variable sites, and it can sometimes fail when genetic diversity is very low. For speed, this check looks for variable sites only in the first 10000 sites, and throws an error if there aren't any variants in that set. In your case, it looks like there are variable sites, but just not in the first 10000 sites.

In retrospect, this should probably be a warning instead of an error, and I'll mark that for a change in the next version. For now, you can simply add --bypass_invariant_check 'yes' to your pixy command, and that should allow you to proceed.

Hope that helps and all the best,

Kieran

StefanoLonardi commented 2 years ago

Dear Kieran, using --bypass_invariant_check 'yes' worked. Thank you.