ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

error says my vcf has no invariant #44

Closed. amsamani closed this issue 3 years ago.

amsamani commented 3 years ago

Hi,

I got an error multiple times when I tried to run dxy. The error says that my VCF does not contain any invariant sites.

Command:

pixy --stats dxy --vcf birds.vcf.gz --populations test.txt --window_size 5000 --n_cores 24 --output_prefix pixy_dxy_birds

I got this error multiple times when I ran the above command.

Error:

[pixy] Checking for invariant sites...
Exception: [pixy] ERROR: the provided VCF appears to contain no invariant sites (ALT = "."). This check can be bypassed via --bypass_invariant_check 'yes'.

I am sure that I have invariant sites in my VCF file, but I followed the suggestion and added the bypass option, so the command now looks like this:

pixy --stats dxy --vcf birds.vcf.gz --populations test.txt --window_size 5000 --n_cores 24 --output_prefix pixy_dxy_birds --bypass_invariant_check 'yes'

Now everything works, but I was wondering whether the analysis would be correct, because I got this warning from pixy:

[pixy] EXTREME WARNING: --bypass_invariant_check is set to 'yes'. Note that a lack of invariant sites will result in incorrect estimates.

Thanks.

ksamuk commented 3 years ago

Hi there, can you post a link to a file (or email to ksamuk@gmail.com) with the first few thousand lines of your VCF? e.g.

zcat birds.vcf.gz | head -n 3000 | gzip -c > birds_subset.vcf.gz

Morriyaty commented 3 years ago

I got the same error. :(

ksamuk commented 3 years ago

@wyj-lzu Thanks for sending me your data -- it looks like the VCF you sent me indeed doesn't contain invariant sites, so your issue might be unrelated to this one? Let me know if I missed something. Our guide for generating invariant sites VCFs is here: https://pixy.readthedocs.io/en/latest/generating_invar/generating_invar.html

amsamani commented 3 years ago

Thank you very much for your help. I emailed you the VCF file.

ksamuk commented 3 years ago

To keep a record here: this issue was resolved over email. The VCF indeed did not contain any invariant sites, so no bug fixes were needed.

Weihankk commented 2 years ago

Hi, I ran into the same error, but after careful inspection I confirmed that my VCF is fine and that all sites contain REF and ALT alleles (only A/T/C/G, no other letters). I think this is indeed a bug that needs to be fixed.

ksamuk commented 2 years ago

Hi there,

In the other two cases, this error was correct in identifying VCFs that lacked invariant sites. Invariant sites have an ALT allele of "." (not A/T/C/G). Can you confirm that your VCF has invariant sites where the ALT field is "."? If they are indeed there, go ahead and send me a subset of your VCF and I can try to diagnose the issue further.
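One way to check this from the command line is sketched below. It assumes a gzip- or bgzip-compressed VCF with the standard tab-separated columns, where ALT is the fifth field. The function name count_invariant_sites is a hypothetical helper for illustration, not part of pixy.

```shell
# Hypothetical helper (not part of pixy): count the data records whose
# ALT column (fifth tab-separated field) is exactly ".".
count_invariant_sites() {
    # gzip -dc writes the decompressed VCF to stdout; bgzipped files are
    # gzip-compatible. Header lines (starting with "#") are skipped.
    gzip -dc "$1" | awk -F'\t' '!/^#/ && $5 == "." { n++ } END { print n + 0 }'
}

# Usage: count_invariant_sites birds.vcf.gz
```

A result of 0 would mean the file has no records with ALT = ".", which matches the condition described in the error message.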

Weihankk commented 2 years ago

Hi Samuk, I have sent all my files (including the VCF, the tbi index, and the population information) to ksamuk@gmail.com. The VCF file contains all my sites. I ran the pixy command again and got the same error: "ERROR: the provided VCF appears to contain no invariant sites (ALT = ".")". When I set --bypass_invariant_check 'yes', pixy runs normally.

My command is:

pixy --stats pi --vcf merge.snps.vcf.gz --populations pixy.population.txt --window_size 5000 --n_cores 10 --output_folder ./TEST

Weihan

ksamuk commented 2 years ago

Hi Weihan,

I just checked, and your VCF files indeed don't contain any invariant sites. Invariant sites have an ALT allele field (the fifth column) with the value "." (a period/dot). The FILTER field is also sometimes ".", so that might be one cause of confusion. Have a look at the guide at https://pixy.readthedocs.io/en/latest/generating_invar/generating_invar.html for tips on creating an invariant sites VCF. You'll also need to make sure your invariant sites are retained if you performed any filtering, since filtering can remove all the invariant sites.
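To tell the two columns apart, a rough sketch like the following may help. It assumes the standard tab-separated VCF layout (ALT in column 5, FILTER in column 7) and a gzip- or bgzip-compressed file. The function name dot_columns_summary is a hypothetical helper, not a pixy command.

```shell
# Hypothetical helper (not part of pixy): report how many data records
# have "." in ALT (column 5) versus "." in FILTER (column 7).
dot_columns_summary() {
    gzip -dc "$1" | awk -F'\t' '
        /^#/      { next }       # skip header lines
        $5 == "." { alt++ }      # invariant site: ALT is "."
        $7 == "." { filter++ }   # FILTER not set: unrelated to invariance
        END { printf "ALT_dot=%d FILTER_dot=%d\n", alt, filter }'
}

# Usage: dot_columns_summary merge.snps.vcf.gz
```

A variant-only VCF with no FILTER values set would report ALT_dot=0 alongside a nonzero FILTER_dot. Running the same check before and after a filtering step is one way to see whether the invariant sites were retained.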

Cheers,

Kieran