ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

Pixy can't handle simple VCFs (PLINK output) #110

Closed: somnya closed this issue 1 month ago.

somnya commented 3 months ago

Dear Kieran,

Thank you for this wonderful tool! I have been using Pixy with standard, well-annotated VCFs, and everything works correctly. However, when I process my VCFs with PLINK, Pixy does not detect any genotypes:

[pixy] WARNING: pixy failed to find any valid gentoype data to calculate the following summary statistics: fst. No output file will be created for these statistics.

For the other statistics, the output reports that there are no SNPs in the windows. My VCF looks like this (it intentionally contains only variant sites): [screenshot of the VCF]

Is there a way to make Pixy work with simple VCFs like this one? Thank you!

ksamuk commented 3 months ago

Hi there, I think this is related to the annotation in the ID field of your VCF. Can you try removing the IDs and rerunning? You can use bcftools, something like: bcftools annotate -x ID -Ov your_vcf.vcf.gz > your_vcf_no_ids.vcf
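If bcftools is not at hand, the ID-stripping step can be approximated in plain Python. This is a minimal sketch, not part of pixy or bcftools: strip_ids is a hypothetical helper that assumes tab-delimited VCF text lines and simply replaces the third (ID) column of every data line with ".", the VCF placeholder for a missing value.

```python
def strip_ids(vcf_lines):
    """Yield VCF lines with the ID column (third field) replaced by '.'.

    Hypothetical helper approximating `bcftools annotate -x ID` for
    plain-text VCF lines; it is not part of pixy or bcftools.
    """
    for line in vcf_lines:
        if not line.strip():
            continue  # skip stray blank lines
        if line.startswith("#"):
            # Meta-information (##) and column header (#CHROM) lines pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        fields[2] = "."  # '.' is the VCF placeholder for a missing ID
        yield "\t".join(fields) + "\n"


# Example usage (file names are assumptions):
# import gzip
# with gzip.open("your_vcf.vcf.gz", "rt") as src, open("your_vcf_no_ids.vcf", "w") as dst:
#     dst.writelines(strip_ids(src))
```

Working line by line keeps memory use flat even for large VCFs; the output is uncompressed, like the -Ov output of the bcftools command.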

somnya commented 2 months ago

Dear Kieran,

Thank you so much for your suggestion! I noticed that Pixy only processes sites with depth > 10, so I added a fake depth value of 10 to all my samples and sites, and Pixy then ran without issues. I think it would be helpful to state in the documentation that depth information is required in all input VCFs (maybe I missed it). Thank you a lot for creating this software!
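The workaround of adding a constant depth to every genotype can be sketched in Python as follows. This is an illustration under stated assumptions, not a pixy feature: add_constant_dp is a hypothetical helper, the default of 10 mirrors the fake depth mentioned above, and the input is assumed to be VCF text whose FORMAT column lacks DP (PLINK exports typically carry only GT). Because the depth is a placeholder, any depth-based filtering no longer reflects real coverage at these sites.

```python
DP_HEADER = (
    '##FORMAT=<ID=DP,Number=1,Type=Integer,'
    'Description="Placeholder read depth (constant, not real coverage)">\n'
)


def add_constant_dp(vcf_lines, depth=10):
    """Yield VCF lines with a constant FORMAT/DP value added to every sample.

    Hypothetical helper for illustration; not part of pixy. Sites whose
    FORMAT column already contains DP are left untouched.
    """
    seen_dp_header = False
    for line in vcf_lines:
        if not line.strip():
            continue  # skip stray blank lines
        if line.startswith("##"):
            if line.startswith("##FORMAT=<ID=DP,"):
                seen_dp_header = True
            yield line
            continue
        if line.startswith("#CHROM"):
            # Declare DP just before the column header if it is not declared yet.
            if not seen_dp_header:
                yield DP_HEADER
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        fmt_keys = fields[8].split(":")
        if "DP" not in fmt_keys:
            fields[8] += ":DP"
            for i in range(9, len(fields)):
                values = fields[i].split(":")
                # VCF allows trailing subfields to be dropped; pad them with '.'
                # so the new DP value lines up with the DP key.
                values += ["."] * (len(fmt_keys) - len(values))
                values.append(str(depth))
                fields[i] = ":".join(values)
        yield "\t".join(fields) + "\n"
```

For a GT-only site, a genotype such as 0/1 becomes 0/1:10 and the FORMAT column becomes GT:DP.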

Eddy.