ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

Pandas error: too many columns specified #70

Closed by sabrinamostoufi 1 year ago

sabrinamostoufi commented 1 year ago

I've been trying to use pixy to estimate pi and dxy for two D. melanogaster samples, but I keep running into the same error:

"pandas.errors.ParserError: Too many columns specified: expected 2 and found 1"

I've double and triple-checked the sample names between my VCF and populations.txt files, and checked that there are no extra characters in my populations.txt file. I'm stumped!

The full pixy command I used:

pixy --stats pi dxy --vcf Parents_AllSites.vcf.gz --populations populations.txt --window_size 10000

A subset of my VCF, created using GATK:

##fileformat=VCFv4.2
##ALT=
##FILTER=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
[...]
##reference=file:///gpfs/projects/singhlab/smostouf/smostouf_WolRecomb/ParentSeqs/dmel-all-chromosome-r6.41.fasta
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT RAL321_SRR8177612 RAL790_SRR8177521

2L 5904 . C A 58.17 . AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.08;SOR=2.303 GT:AD:DP:GQ:PL 1/1:0,2:2:6:84,6,0 ./.:0,0:0:.:0,0,0
2L 5974 . C T 23.19 . AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=11.60;SOR=0.693 GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 1/1:0,2:2:6:49,6,0

My populations file:

RAL321_SRR8177612 ABC
RAL790_SRR8177521 DEF

OS information: MacOS Big Sur v11.7.2

ksamuk commented 1 year ago

Hi there, the first thing to check is whether your populations file is tab-separated.
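For anyone hitting the same error, one way to run that check without trusting what a text editor displays is a few lines of Python. This is only a rough sketch, not part of pixy: the helper name `find_untabbed_lines` is made up for illustration, and it assumes the populations file should have exactly two tab-separated fields per line (sample ID, then population), as in the example from this thread.

```python
def find_untabbed_lines(text):
    """Return (line_number, line) pairs that do not split into
    exactly two non-empty tab-separated fields."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2 or not all(f.strip() for f in fields):
            problems.append((number, line))
    return problems


# Sample and population names taken from the populations file in this thread.
space_separated = "RAL321_SRR8177612 ABC\nRAL790_SRR8177521 DEF\n"
tab_separated = "RAL321_SRR8177612\tABC\nRAL790_SRR8177521\tDEF\n"

print(find_untabbed_lines(space_separated))  # both lines are flagged
print(find_untabbed_lines(tab_separated))    # nothing is flagged
```

To check a real file, read it first (for example `open("populations.txt").read()`) and pass the contents to the helper; any line it returns is one where the separator is not a single tab.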

sabrinamostoufi commented 1 year ago

Thank you, I have it running now! The text editor I was using was inserting spaces when I pressed the Tab key, so that was causing the error.
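If the editor cannot be configured to insert real tabs, the file can also be repaired programmatically. The sketch below is an illustration rather than anything pixy provides: `to_tab_separated` is a hypothetical helper, and it assumes that neither sample IDs nor population names contain whitespace, so every non-blank line splits into exactly two fields.

```python
def to_tab_separated(text):
    """Rewrite whitespace-separated 'sample population' lines so the
    two fields are joined by a single tab. Blank lines are dropped."""
    out = []
    for line in text.splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"expected 2 fields, got {len(fields)}: {line!r}")
        out.append("\t".join(fields))
    return "".join(row + "\n" for row in out)


# Populations file from this thread, as the editor saved it (spaces, not tabs).
broken = "RAL321_SRR8177612    ABC\nRAL790_SRR8177521 DEF\n"
fixed = to_tab_separated(broken)
print(repr(fixed))
```

Writing `fixed` back out to a new file (rather than overwriting the original) keeps the untouched copy around in case a name did contain a space and the helper raised an error.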