jinlianch closed this issue 7 years ago
Hi @jinlianch, this should work and is a bug. Are you able to share a file with which I can reproduce the issue?
Thanks!
part-00000-aac1e753-02f7-447e-bbda-d80626611b39.snappy.parquet.zip
Test code:
import json
import parquet

with open('part-00000-aac1e753-02f7-447e-bbda-d80626611b39.snappy.parquet', 'rb') as fo:
    for r in parquet.DictReader(fo):
        print(json.dumps(r))
I tested another file and it works. I don't know whether something is wrong with this file, but it reads fine in Spark.
I'm not able to reproduce this with the latest release. Please let me know if you're still able to reproduce.
When I try to read data from a Parquet file that contains null values for some keys, I get the error below.
Traceback (most recent call last):
  File "tt.py", line 12, in <module>
    for r in parquet.DictReader(fo):
  File "/usr/local/lib/python2.7/site-packages/parquet/__init__.py", line 420, in DictReader
    for row in reader(fo, columns):
  File "/usr/local/lib/python2.7/site-packages/parquet/__init__.py", line 467, in reader
    dict_items)
  File "/usr/local/lib/python2.7/site-packages/parquet/__init__.py", line 380, in read_data_page
    dict_values_io_obj, bit_width, len(dict_values_bytes))
  File "/usr/local/lib/python2.7/site-packages/parquet/encoding.py", line 227, in read_rle_bit_packed_hybrid
    res += read_bitpacked(io_obj, header, width, debug_logging)
  File "/usr/local/lib/python2.7/site-packages/parquet/encoding.py", line 146, in read_bitpacked
    b = raw_bytes[current_byte]
IndexError: list index out of range
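For anyone debugging a similar failure: the traceback ends with read_bitpacked indexing past the end of raw_bytes, so the decoder expected more bytes than the page supplied. In Parquet's RLE/bit-packed hybrid encoding, a bit-packed run header stores the number of 8-value groups in `header >> 1`, and each group occupies `width` bytes. The sketch below is not the library's code. `read_bitpacked_checked` is a hypothetical helper, written for Python 3, that decodes one bit-packed run and reports a short buffer explicitly instead of raising a bare IndexError.

```python
def read_bitpacked_checked(raw_bytes, header, width):
    """Decode one bit-packed run (hypothetical helper, not parquet-python's API).

    raw_bytes: bytes (or a list of ints) holding the packed values
    header:    run header; the number of 8-value groups is header >> 1
    width:     bit width of each value
    """
    num_groups = header >> 1
    needed = num_groups * width  # 8 values * width bits = width bytes per group
    if len(raw_bytes) < needed:
        raise ValueError(
            "bit-packed run needs %d bytes but only %d are available"
            % (needed, len(raw_bytes))
        )
    # Values are packed starting from the least significant bit of each byte,
    # so the run can be read as one little-endian integer.
    packed = int.from_bytes(bytes(bytearray(raw_bytes[:needed])), "little")
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(num_groups * 8)]


# The values 0..7 at bit width 3 pack into the bytes 0x88 0xC6 0xFA.
values = read_bitpacked_checked(bytes([0x88, 0xC6, 0xFA]), header=3, width=3)
```

Running a check like this on the failing page would show whether the byte count or the computed run length is off, which narrows down where the null handling goes wrong.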