Closed djeffares closed 3 years ago
Hi there! It is not immediately clear what caused that error. Would it be possible to send me a subset (~10000 lines) of your VCF, and I'll see if I can diagnose the issue on my end?
Hi Kieran
Thanks so much for getting back to me. I tried with some vcfs made using bcftools this time as described in the docs:
bcftools mpileup -f hg38_pvivP01.fa -b bamlist --regions PvP01_01_v1:10000-20000 | \ bcftools call -m -Oz -f GQ -o PvP01_01_v1.10000-20000.bcftools.vcf.gz --output-type z &
And another with ploidy 1 (as my species is haploid). vcfs attached.
Here are the files. This time I get a more informative error:
[pixy] pixy 1.0.4.beta1 [pixy] See documentation at https://pixy.readthedocs.io/en/latest/ /Users/ucbtdje/opt/anaconda3/lib/python3.8/site-packages/allel/io/vcf_read.py:1732: UserWarning: invalid INFO header: '##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">\n' warnings.warn('invalid INFO header: %r' % header)
[pixy] Validating VCF and input parameters... [pixy] Checking write access...OK [pixy] Checking CPU configuration...OK [pixy] Checking for invariant sites...OK [pixy] Checking chromosome data...OK [pixy] Checking intervals/sites...OK [pixy] Checking sample data...OK [pixy] All initial checks past!
[pixy] Preparing for calculation of summary statistics: pi, fst, dxy [pixy] Data set contains 3 population(s), 1 chromosome(s), and 90 sample(s) [pixy] Window size: 1000 bp
[pixy] Started calculations at 11:44:51 on 2021-05-08 [pixy] Using 1 out of 8 available CPU cores
[pixy] Processing chromosome/contig PvP01_01_v1... [pixy] Calculating statistics for region PvP01_01_v1:1-20000... /Users/ucbtdje/opt/anaconda3/lib/python3.8/site-packages/allel/io/vcf_read.py:1732: UserWarning: invalid INFO header: '##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">\n' warnings.warn('invalid INFO header: %r' % header) KeyError: nan
best wishes /Dan
On 7 May 2021, at 23:51, Kieran Samuk @.***> wrote:
Hi there! It is not immediately clear what caused that error. Would it be possible to send me a subset (~10000 lines) of your VCF, and I'll see if I can diagnose the issue on my end?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ksamuk/pixy/issues/31#issuecomment-834836447, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD2HHSM7ZJ3O2HVA5HJMDRLTMRVFXANCNFSM44KUDMQA.
Hi Dan, I am not seeing your attachments, can you provide a dropbox/drive link or email them to me at ksamuk@gmail.com? I am guessing the problem has something to do with how haploid genotypes are represented.
Thanks - I shared the vcfs and indexes.
best wishes /Dan
On 8 May 2021, at 15:39, Kieran Samuk @.***> wrote:
Hi Dan, I am not seeing your attachments, can you provide a dropbox/drive link or email them to me at @. @.>? I am guessing the problem has something to do with how haploid genotypes are represented.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ksamuk/pixy/issues/31#issuecomment-835387560, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD2HHSPTO5FNRSF5CQTB3CDTMVEJ5ANCNFSM44KUDMQA.
Hi Dan,
Just ran the sample data you sent me, and I haven't been able to reproduce the error. Can you confirm that you get the error with the sample data? If not, can you send me some data that does produce the error?
Edit: Might need the populations file also, thanks!
Hi Kieran
I just ran pixy with these two files (one haploid encoded, one diploid, same data). These are the ones I shared (links and commands below). Still the same error. Could it be my installation, or my mac? If so, I’m happy to try on my universities linux server (once they do the install).
What do you think? I don’t want to waste your time unnecessarily.
Files: PvP01_01_v1.10000-20000.bcftools.haploid.vcf.gz PvP01_01_v1.10000-20000.bcftools.vcf.gz
Links: https://drive.google.com/file/d/1Jc4u4OZIqvP7N1gfZVSMq8GVYRWl8GP3/view?usp=sharing https://drive.google.com/file/d/1LsyWlPo9lwfMuvzPRRQvUWbAuZo0S5Wf/view?usp=sharing https://drive.google.com/file/d/1ZTGWgSf0VQmmdy8qGCfom8OjQnLv8g5s/view?usp=sharing https://drive.google.com/file/d/1sjWMLpTwjJcT0OD3bao3IMhbkmBBXdKe/view?usp=sharing
pixy --stats pi fst dxy \ --vcf PvP01_01_v1.10000-20000.bcftools.vcf.gz \ --populations popfile2.txt \ --window_size 1000 \ --output_prefix Pv01_01_v1.10000-20000
pixy --stats pi fst dxy \ --vcf PvP01_01_v1.10000-20000.bcftools.haploid.vcf.gz \ --populations popfile2.txt \ --window_size 1000 \ --output_prefix Pv01_01_v1.10000-20000.haploid
best wishes /Dan
On 9 May 2021, at 17:45, Kieran Samuk @.***> wrote:
Hi Dan,
Just ran the sample data you sent me, and I haven't been able to reproduce the error. Can you confirm that you get the error with the sample data? If not, can you send me some data that does produce the error?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
Hi Dan,
Very mysterious! I will see if I can get the error on my Mac today. Can you possibly share your populations file?
Hi Kieran
Here is my populations file. Our IT support is about to install pixy on a server, it may be that it works there.
best wishes /Dan
On 10 May 2021, at 15:17, Kieran Samuk @.***> wrote:
Hi Dan,
Very mysterious! I will see if I can get the error on my Mac today. Can you possibly share your populations file?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ksamuk/pixy/issues/31#issuecomment-836756168, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD2HHSMUSOK2A47CDPZR2P3TM7THTANCNFSM44KUDMQA.
M_17_5636 1 M_17_5637 1 M_17_5638 1 M_17_5639 1 M_18_0041 1 M_18_0042 1 M_18_0043 1 M_18_0044 1 M_18_0045 1 M_18_0046 1 M_18_0047 1 M_18_0048 1 M_18_0049 1 M_18_0050 1 M_18_0051 1 M_18_0052 1 M_18_0053 1 M_18_0054 1 M_18_0055 1 M_18_0056 1 M_18_0057 1 SRR1562517 2 SRR1562518 2 SRR1562520 2 SRR1562526 2 SRR1562567 2 SRR1562605 2 SRR1562613 2 SRR1562614 2 SRR1562624 2 SRR1562844 2 SRR1562851 2 SRR1562939 2 SRR1562958 2 SRR1562973 2 SRR1562974 2 SRR1564529 2 SRR1564547 2 SRR1564577 2 SRR1564665 2 SRR1567977 2 SRR1568103 2 SRR1568105 2 SRR1568106 2 SRR1568114 2 SRR1568115 2 SRR1568126 2 SRR1568128 2 SRR1568149 2 SRR1568150 2 SRR1568162 2 SRR1568169 2 SRR1568173 2 SRR1568177 2 SRR1568178 2 SRR1568180 2 SRR1568192 2 SRR1568193 2 SRR1568201 2 SRR1568204 2 SRR1568216 2 SRR1568218 2 SRR1738015 2 SRR1738039 2 SRR1738040 2 SRR1740119 2 SRR1740379 2 SRR1740392 2 SRR1740488 2 SRR1759047 2 SRR1759307 2 SRR1759523 2 SRR1759592 2 SRR332408 2 SRR332409 2 SRR332410 2 SRR332559 2 SRR332560 2 SRR332561 2 SRR332563 2 SRR332570 2 SRR340133 2 SRR340134 2 SRR570031 2 SRR572650 2 SRR572651 2 SRR575087 2 SRR575089 2 SRR828416 2 Transfer.PvP01_00_99.final
Hi Dan, is the "Transfer.PvP01_00_99.final" line in your populations file? If so, it doesn't have a population field. Can you try running without that line (or assign it to a population)?
Hi Kieran
That was it! Works now with both haploid and diploid vcfs, made with either bcftools or freebayes.
btw: the data is from this paper & reviews wanted dxy (rather than Fst) and were concerned about coverage, so pixy is perfect. thanks!
https://www.biorxiv.org/content/10.1101/841171v3 https://www.biorxiv.org/content/10.1101/841171v3
best wishes /Dan
On 10 May 2021, at 15:37, Kieran Samuk @.***> wrote:
Hi Dan, is the "Transfer.PvP01_00_99.final" line in your populations file? If so, it doesn't have a population field. Can you try running without that line (or assign it to a population)?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ksamuk/pixy/issues/31#issuecomment-836778950, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD2HHSJU5F762SLZUBR4LCTTM7VR7ANCNFSM44KUDMQA.
Awesome! I'll work on adding a more informative error message to catch that in the future. Thanks for your help tracking this down.
Dear Kieran Samuk
Thanks for making pixy, it is a very valuable tool.
I am having an error with running pixy that I can't diagnose. I would very much appreciate some guidance. At the moment pixy starts to calculate stats, then fails with: KeyError: nan. The only outputs file (pixy_tmpfile_8ba3a11fae0041409dabd3d62ea4f743.tmp) is blank. All the details are below.
best wishes, Daniel Jeffares
I am using an all sites bgzipped, tabix-indexed vcf created with freebayes: freebayes -f hg38_pvivP01.fa bams/*bam \ --min-mapping-quality 30 \ --min-coverage 10 \ --strict-vcf \ --gvcf \ --report-monomorphic \ --vcf PvP01_01_v1.vcf &
I am running pixy on a mac. My command is: pixy --stats pi fst dxy \ --vcf PvP01_01_v1.vcf.gz \ --populations popfile2.txt \ --window_size 10000 \ --output_prefix pixytest
The error code I get is: [pixy] pixy 1.0.4.beta1 [pixy] See documentation at https://pixy.readthedocs.io/en/latest/
[pixy] Validating VCF and input parameters... [pixy] Checking write access...OK [pixy] Checking CPU configuration...OK [pixy] Checking for invariant sites...OK [pixy] Checking chromosome data...OK [pixy] Checking intervals/sites...OK [pixy] Checking sample data...OK [pixy] All initial checks past!
[pixy] Preparing for calculation of summary statistics: pi, fst, dxy [pixy] Data set contains 3 population(s), 1 chromosome(s), and 90 sample(s) [pixy] Window size: 10000 bp
[pixy] Started calculations at 18:10:55 on 2021-05-07 [pixy] Using 1 out of 8 available CPU cores
[pixy] Processing chromosome/contig PvP01_01_v1... [pixy] Calculating statistics for region PvP01_01_v1:1-156579... KeyError: nan