hgrecco / pint-pandas

Pandas support for pint

Parquet Support #230

Closed geoffviola closed 1 month ago

geoffviola commented 1 month ago

Is parquet serialization supported? I see some mention of pyarrow in the code and CSV in the docs, but I don't see parquet mentioned in the docs.

I tried writing the tutorial df to a parquet file.

import pandas as pd
import pint_pandas
import pyarrow
df = pd.DataFrame({
    "torque": pd.Series([1, 2, 2, 3], dtype="pint[lbf ft]"),
    "angular_velocity": pd.Series([1, 2, 2, 3], dtype="pint[rpm]"),
})
df.to_parquet("/tmp/deleteme.parquet")

But I see this error.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/.local/lib/python3.10/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
  File "~/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 3113, in to_parquet
    return to_parquet(
  File "~/.local/lib/python3.10/site-packages/pandas/io/parquet.py", line 480, in to_parquet
    impl.write(
  File "~/.local/lib/python3.10/site-packages/pandas/io/parquet.py", line 190, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 4525, in pyarrow.lib.Table.from_pandas
  File "~/.local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, in dataframe_to_arrays
    arrays = [convert_column(c, f)
  File "~/.local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 611, in <listcomp>
    arrays = [convert_column(c, f)
  File "~/.local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 598, in convert_column
    raise e
  File "~/.local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 592, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 345, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 81, in pyarrow.lib._ndarray_to_array
  File "pyarrow/array.pxi", line 69, in pyarrow.lib._ndarray_to_type
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: ('Did not pass numpy.dtype object', 'Conversion failed for column torque with type pint[foot * force_pound]')

I used these versions:

Package      Version
pyarrow      16.1.0
numpy        1.26.4
pandas       2.2.2
pint-pandas  0.5
pint         0.23

andrewgsavage commented 1 month ago

Use df.pint.dequantify() before writing and df.pint.quantify() after reading as a workaround.

geoffviola commented 1 month ago

That works for me. Thanks!