JesseGarcia562 closed this issue 2 years ago
Hi Jesse,
The logical flow of all those checks is confusing but I think this is working as intended: without a bed file (whether or not you have a sites file), you'll need to specify a window size. Otherwise, there will be no way for pixy to know the intervals over which to calculate your summary stats. So, if you want a window size of 1, you should specify --window_size 1 (as you did!).
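To make that flow concrete, here is a rough sketch of the check as I described it. This is not the actual code in pixy/core.py: the function names are made up for illustration, and only the three flags mentioned in this thread are modelled.

```python
import argparse


def build_parser():
    # Hypothetical parser with only the flags named in this thread;
    # the real pixy command line has more options.
    parser = argparse.ArgumentParser(prog="pixy-sketch")
    parser.add_argument("--bed_file", default=None)
    parser.add_argument("--sites_file", default=None)
    parser.add_argument("--window_size", type=int, default=None)
    return parser


def check_interval_args(args):
    """Describe where the intervals come from, or raise if there is no source."""
    if args.bed_file is not None:
        return f"intervals from {args.bed_file}"
    # As explained above, a sites file does not supply intervals, so its
    # presence or absence makes no difference to this check.
    if args.window_size is None:
        raise ValueError(
            "In the absence of a BED file, a --window_size must be specified."
        )
    return f"windows of size {args.window_size}"
```

With this sketch, passing `--sites_file chr4_gene_locations.txt` alone raises the error, while adding `--window_size 1` passes the check.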
So then on to the next problem, your pandas error. I can't reproduce that on my end. Can you post your chr4_gene_locations.txt? That error might also come from your populations.txt file. Have a look at those two files and make sure they are valid tab-separated files (or post them here). You could also try rerunning with the --debug flag to get a traceback of the pandas error.
Let me know how it goes!
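If it helps, here is a minimal sketch for checking those two files yourself, using only the Python standard library. The helper name `check_tsv` is made up, and the expected column count is an assumption you should set to match the pixy documentation (I'd expect two columns for both files). An empty file is one common way to get "EmptyDataError: No columns to parse from file" out of pandas, so the sketch flags that case first.

```python
import csv


def check_tsv(path, expected_columns):
    """Return a list of problems found in a tab-separated file (empty if none)."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    if not rows:
        # Nothing to parse at all: pandas would raise EmptyDataError here.
        return [f"{path}: file is empty"]
    problems = []
    for number, row in enumerate(rows, start=1):
        # Blank lines and space-delimited lines both show up as a wrong count.
        if len(row) != expected_columns:
            problems.append(
                f"{path}: line {number} has {len(row)} field(s), "
                f"expected {expected_columns}"
            )
    return problems
```

For example, `check_tsv("populations.txt", 2)` and `check_tsv("chr4_gene_locations.txt", 2)` should both return an empty list if the files are well formed under that two-column assumption.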
Hi,
I checked the populations file and sites_file and verified that they were tab delimited. I'm including them here at this link: https://drive.google.com/drive/folders/1ex8zMsylNyIfuIHuX3Uuq0ORUnxI8Oj7?usp=sharing . Thanks for your help!
Hi Jesse,
Thanks for sending me your data! Interestingly, I wasn't able to reproduce your error. The calculations were slow (single-site mode is still very slow), but they did complete (let me know if you'd like the output file). While I was at it, I added some new optimizations that will speed this type of analysis up in the future.
Re: your error, a few questions:
Just following up here, once your input file issue is resolved, it would probably be worth updating to the new version 1.2.5.beta1 on conda. The single sites + sites file combination you are doing is much faster in the new version.
Updating my pixy to the latest on conda seemed to fix everything! I can now use the sites_file argument.
Describe the bug
I'm trying to use this program with the "--sites_file" argument, but I keep getting the error: "Exception: [pixy] ERROR: In the absence of a BED file, a --window_size must be specified." I think this has to do with the error checking in lines 659-662 of pixy/core.py. I have no --bed_file in my command (as the tutorial for sites_file suggests), and when I try setting window_size 1 it gives me "pandas.errors.EmptyDataError: No columns to parse from file". Without a bed_file, I think the code goes straight into checking "if args.window_size is None:", when I think it needs to allow for a sites_file argument.
A reproducible example of the bug
Please include the following so we can debug the issue:
(1) The full command you used to run pixy, including all arguments
pixy --stats pi \
--vcf 2018wgs3.ef.rmIndelRepeatsStar.chr4.vcf.gz \
--populations populations.txt \
--sites_file chr4_gene_locations.txt
I can email you a Google Drive link with my vcf/populations/sites file if needed.
OS information
I'm using Mac OS X