ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

UnicodeDecodeError when running pixy for the first time #62

Closed olgakozhar closed 2 years ago

olgakozhar commented 2 years ago

Describe the bug

I just installed pixy via conda on macOS Catalina, prepared the VCF and popmap files according to the guide, and tried to run the program, but I am getting the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

A reproducible example of the bug

1) The code used:

pixy --stats pi --vcf CA_CO_all_sites_filtered.vcf.recode.vcf.gz.tbi --populations popmap.txt --window_size 10000 --n_cores 2

[pixy] pixy 1.2.7.beta1
[pixy] See documentation at https://pixy.readthedocs.io/en/latest/
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

2) A subset of the VCF file is attached: subset.vcf.zip

3) The popmap file is attached: popmap.txt

OS information: macOS

Any advice on how to resolve this issue is appreciated. Thanks! Olga

ksamuk commented 2 years ago

Hi Olga,

I think that error from pandas is associated with gzipped data. Can you confirm that your popmap.txt (the one you are actually pointing pixy toward, not the one you sent) is not gzipped? It might be named .txt but could still be gzipped data. Try renaming and unzipping it, then rerunning that command.
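For anyone hitting the same message: every gzip stream (bgzip output included) starts with the two bytes 0x1f 0x8b. That is why a UTF-8 decoder fails on byte 0x8b at position 1 when it is handed compressed data. Below is a minimal Python sketch for checking a file regardless of its extension. The function names and the sample popmap contents are hypothetical and are not part of pixy.

```python
import gzip
import tempfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip (and bgzip) stream


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip signature."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def utf8_failure_offset(data):
    """Return the offset of the first byte that fails UTF-8 decoding.

    Returns None if the bytes decode cleanly.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def make_demo_files(directory):
    """Write a plain-text and a gzipped two-column file (made-up sample names)."""
    text = "sample1\tpopA\nsample2\tpopB\n"
    plain = Path(directory) / "popmap.txt"
    zipped = Path(directory) / "popmap_gz.txt"  # .txt name, gzipped content
    plain.write_text(text)
    with gzip.open(zipped, "wt") as handle:
        handle.write(text)
    return plain, zipped
```

Calling looks_gzipped() on each file passed to pixy shows which one is compressed, whatever it is named.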

Thanks,

Kieran

olgakozhar commented 2 years ago

Hi Kieran,

The popmap file is not gzipped. It is the same file I attached to the message. I just created it in a plain text editor, and double checked - it is not compressed.

Olga

ksamuk commented 2 years ago

Hi Olga,

I missed this earlier, but it looks like your command is pointing to the index file (.tbi) and not the gzipped VCF (.vcf.gz). Try switching that.
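A small pre-flight check can catch this kind of mix-up before the run starts. The sketch below is only an illustration: check_vcf_argument is a hypothetical helper, not something pixy provides, and treating .tbi and .csi as the index suffixes to reject is an assumption.

```python
from pathlib import Path

# Assumed index-file suffixes; adjust if your workflow uses others.
INDEX_SUFFIXES = (".tbi", ".csi")


def check_vcf_argument(vcf_arg):
    """Return the path unchanged, or raise if it names an index file."""
    path = Path(vcf_arg)
    if path.suffix in INDEX_SUFFIXES:
        # Dropping the final suffix gives the data file the index belongs to.
        data_file = path.with_suffix("")
        raise ValueError(
            f"{path.name} is an index file; pass {data_file.name} instead"
        )
    return path
```

With the file name from the command above, the helper raises an error that suggests CA_CO_all_sites_filtered.vcf.recode.vcf.gz as the value for --vcf.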

olgakozhar commented 2 years ago

Hi Kieran,

I sorted it out. It was an issue with the tabixed VCF file. Everything is working now.

Thanks! Olga