jcrobak / parquet-python

python implementation of the parquet columnar file format.
Apache License 2.0

'snappy' is not defined #80

Closed Prussian1870 closed 3 years ago

Prussian1870 commented 3 years ago

After installing parquet-1.3.1 on Windows 10, I ran the following code:

import parquet

with open("myparquetfile.parquet", "rb") as fo:
    for row in parquet.reader(fo, columns=['tconst', 'nconst']):
        print(",".join([str(r) for r in row]))

This raised an error; the stack trace is below:

Traceback (most recent call last):
  File "c:\working\test.py", line 25, in <module>
    for row in parquet.reader(fo, columns=['tconst', 'nconst']):
  File "C:\Python\Python38\lib\site-packages\parquet\__init__.py", line 464, in reader
    values = read_data_page(file_obj, schema_helper, page_header, cmd,
  File "C:\Python\Python38\lib\site-packages\parquet\__init__.py", line 283, in read_data_page
    raw_bytes = _read_page(file_obj, page_header, column_metadata)
  File "C:\Python\Python38\lib\site-packages\parquet\__init__.py", line 229, in _read_page
    raw_bytes = snappy.decompress(bytes_from_file)
NameError: name 'snappy' is not defined

Thx

jcrobak commented 3 years ago

Hi, snappy is listed as an optional dependency, but if your file is compressed with snappy then you must install parquet-python[snappy]. See https://github.com/jcrobak/parquet-python#requirements
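The traceback is consistent with an optional import that is skipped when the module is missing, so the failure only surfaces once a Snappy-compressed page is read. A minimal sketch of checking for the module up front, assuming the `snappy` module is the one supplied by the `python-snappy` package; `require_module` and `snappy_available` are hypothetical helpers for illustration, not part of parquet-python:

```python
import importlib
import importlib.util


def require_module(name, hint):
    """Import and return the top-level module `name`.

    Raises ImportError with an install hint if it is not installed.
    Hypothetical helper, not part of parquet-python.
    """
    if importlib.util.find_spec(name) is None:
        raise ImportError(f"module {name!r} is not installed; {hint}")
    return importlib.import_module(name)


def snappy_available():
    """Return True if a top-level module named `snappy` can be found."""
    return importlib.util.find_spec("snappy") is not None
```

Calling `require_module("snappy", "try: pip install python-snappy")` before opening the file should turn the late `NameError` into an early `ImportError` that names the missing dependency.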