superbobry / snpy

A wrapper-library for reading openSNP data
Do What The F*ck You Want To Public License

VCF Files #2

Open Tuisto59 opened 5 years ago

Tuisto59 commented 5 years ago

Hi,

I downloaded the latest version of the openSNP bulk data dump. It currently contains 4700 raw data files from various companies. I tested your library with Python 3.7 on all of the openSNP files, and some of them fail to parse, for example:

user6020_file4548_yearofbirth_unknown_sex_unknown.ancestry.txt

I also installed the PyVCF parser.

Here is the traceback Python prints for one user's file:


user6020_file4548_yearofbirth_unknown_sex_unknown.ancestry.txt

Traceback (most recent call last):
  File "/home/yoan/Bureau/ADN/openSNP_rawdata/opensnp_datadump.current/parser.py", line 13, in <module>
    for i in snps:
  File "/usr/local/lib/python3.7/dist-packages/sn.py", line 78, in _23andme_ancestry
    for row in handle:
  File "/usr/lib/python3.7/csv.py", line 112, in __next__
    row = next(self.reader)
  File "/usr/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 12: invalid start byte

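The code:

import os

import sn

# Try to parse every file in the current directory and print its first 10 SNPs.
for filename in os.listdir(os.getcwd()):
    print(filename)
    if os.path.isfile(filename):
        snps = sn.parse(filename)
        # Use distinct loop variables so the file name is not overwritten.
        for count, snp in enumerate(snps, start=1):
            print(snp)
            if count == 10:
                break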

Thanks in advance for your help!

Jasper51297 commented 4 years ago

I just ran into the same problem. The issue is that some of the files are compressed. You can use zcat to decompress them.
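
If you would rather stay in Python than shell out to zcat, something like the sketch below might work. It assumes the compressed files are gzip streams or ZIP archives holding a single raw data file (I have not checked every file in the dump), and `read_decompressed` is a hypothetical helper name, not part of snpy. It looks at the file content rather than the extension, since the affected files still end in .txt.

```python
import gzip
import zipfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_decompressed(path):
    """Return the bytes of path, decompressing gzip or ZIP input if detected."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == GZIP_MAGIC:
        with gzip.open(path, "rb") as handle:
            return handle.read()
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            # Assumption: the archive holds a single raw data file.
            return archive.read(archive.namelist()[0])
    # Not compressed (as far as we can tell): return the content unchanged.
    with open(path, "rb") as handle:
        return handle.read()
```

Since the code in the issue passes a file name to sn.parse, one option is to write the returned bytes to a new file and parse that file instead of the original.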