jcrobak / parquet-python

python implementation of the parquet columnar file format.
Apache License 2.0

ValueError ordinal must be >= 1 #79

Open VolodyaCO opened 3 years ago

VolodyaCO commented 3 years ago

I'm trying to use parquet.reader(file_obj), but when I run it on my Parquet file I get this error:

    for row in parquet.reader(fo):
  File "/home/vladimir/.local/share/virtualenvs/ComisionDeLaVerdad-FivqEOe7/lib/python3.7/site-packages/parquet/__init__.py", line 472, in reader
    dict_items = _read_dictionary_page(file_obj, schema_helper, page_header, cmd)
  File "/home/vladimir/.local/share/virtualenvs/ComisionDeLaVerdad-FivqEOe7/lib/python3.7/site-packages/parquet/__init__.py", line 395, in _read_dictionary_page
    return convert_column(values, schema_element) if schema_element.converted_type is not None else values
  File "/home/vladimir/.local/share/virtualenvs/ComisionDeLaVerdad-FivqEOe7/lib/python3.7/site-packages/parquet/converted_types.py", line 68, in convert_column
    return [datetime.date.fromordinal(d) for d in data]
  File "/home/vladimir/.local/share/virtualenvs/ComisionDeLaVerdad-FivqEOe7/lib/python3.7/site-packages/parquet/converted_types.py", line 68, in <listcomp>
    return [datetime.date.fromordinal(d) for d in data]

What can I do?

jcrobak commented 3 years ago

Hi, did you open the file in binary mode? We recently updated the example in the README: https://github.com/jcrobak/parquet-python#example

VolodyaCO commented 3 years ago

The error remains:

>>> import parquet
>>> with open("victimas_union_recat.parquet", "rb") as fo:
...   for row in parquet.reader(fo):
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/vladimir/.local/share/virtualenvs/ComisionDeLaVerdad-FivqEOe7/lib/python3.7/site-packages/parquet/__init__.py", line 472, in reader
    dict_items = _read_dictionary_page(file_obj, schema_helper, page_header, cmd)
  File "/home/vladimir/.local/share/virtualenvs/ComisionDeLaVerdad-FivqEOe7/lib/python3.7/site-packages/parquet/__init__.py", line 395, in _read_dictionary_page
    return convert_column(values, schema_element) if schema_element.converted_type is not None else values
  File "/home/vladimir/.local/share/virtualenvs/ComisionDeLaVerdad-FivqEOe7/lib/python3.7/site-packages/parquet/converted_types.py", line 68, in convert_column
    return [datetime.date.fromordinal(d) for d in data]
  File "/home/vladimir/.local/share/virtualenvs/ComisionDeLaVerdad-FivqEOe7/lib/python3.7/site-packages/parquet/converted_types.py", line 68, in <listcomp>
    return [datetime.date.fromordinal(d) for d in data]
ValueError: ordinal must be >= 1
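
For anyone hitting the same traceback: the failing call is datetime.date.fromordinal(d), which raises ValueError for any ordinal below 1. A plausible cause (an assumption, not confirmed anywhere in this thread) is that the column holds Parquet DATE values, which count days since the Unix epoch (1970-01-01), so any date on or before the epoch is stored as 0 or a negative number. The sketch below reproduces the error with the standard library alone and shows an epoch-based conversion. The names try_fromordinal and days_since_epoch_to_date are hypothetical helpers for illustration and are not part of parquet-python.

```python
import datetime

# Assumption: the stored integers are days since the Unix epoch.
EPOCH = datetime.date(1970, 1, 1)


def try_fromordinal(value):
    """Call date.fromordinal and report whether it raised ValueError."""
    try:
        return datetime.date.fromordinal(value)
    except ValueError:
        # fromordinal only accepts ordinals >= 1 (ordinal 1 is 0001-01-01).
        return None


def days_since_epoch_to_date(days):
    """Convert a day offset from 1970-01-01 into a date (hypothetical helper)."""
    return EPOCH + datetime.timedelta(days=days)


if __name__ == "__main__":
    for days in (0, -1, 365):
        print(days, try_fromordinal(days), days_since_epoch_to_date(days))
```

With this reading, a stored value of 0 or -1 cannot go through fromordinal at all, while adding a timedelta to the epoch handles it. Positive values do not fail in fromordinal, but they would map to dates near year 1 rather than near 1970.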

In the end I used pyarrow instead (as recommended by the pandas.read_parquet method).
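
For reference, a minimal sketch of what reading the file row by row with pyarrow might look like. It assumes pyarrow is installed and recent enough to provide Table.to_pylist, and read_rows is a hypothetical helper name, not something from this thread or from either library.

```python
def read_rows(path):
    """Read a Parquet file with pyarrow and return its rows as a list of dicts."""
    # Imported here so the dependency is only required when the function is called.
    import pyarrow.parquet as pq

    table = pq.read_table(path)
    # to_pylist() yields one dict per row, keyed by column name.
    return table.to_pylist()
```

Calling read_rows("victimas_union_recat.parquet") would then stand in for the parquet.reader loop above. If pandas is already in use, pandas.read_parquet(path, engine="pyarrow") returns a DataFrame instead.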