hgrecco / pint-pandas

Pandas support for pint

cannot display pint-pandas DataFrame in Streamlit #181

Open mkaut opened 1 year ago

mkaut commented 1 year ago

I tried using pint-pandas-enhanced DataFrames in a Streamlit app, but displaying one leads to an error from pyarrow.

My code:

import streamlit as st
import pandas as pd
import pint
import pint_pandas
df = pd.DataFrame({
    "torque": pd.Series([1., 2., 2., 3.], dtype="pint[lbf ft]"),
    "angular_velocity": pd.Series([1., 2., 2., 3.], dtype="pint[rpm]"),
})
df.dtypes
df

The penultimate line displays the dtypes and confirms that the DataFrame is created correctly, but the last line fails with the following error message in the app:

ArrowTypeError: ('Did not pass numpy.dtype object', 'Conversion failed for column torque with type pint[foot * force_pound]')

and in the terminal:

2023-05-23 10:57:26.467 Serialization of dataframe to Arrow table was unsuccessful due to: ('Could not convert pint[foot * force_pound] with type PintType: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column 0 with type object'). Applying automatic fixes for column types to make the dataframe Arrow-compatible.
2023-05-23 10:57:27.067 Serialization of dataframe to Arrow table was unsuccessful due to: ('Did not pass numpy.dtype object', 'Conversion failed for column torque with type pint[foot * force_pound]'). Applying automatic fixes for column types to make the dataframe Arrow-compatible.
2023-05-23 10:57:27.067 Uncaught app exception
Traceback (most recent call last):
  File "[...]\.py311\Lib\site-packages\streamlit\type_util.py", line 757, in data_frame_to_bytes
    table = pa.Table.from_pandas(df)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow\table.pxi", line 3681, in pyarrow.lib.Table.from_pandas
  File "[...]\.py311\Lib\site-packages\pyarrow\pandas_compat.py", line 611, in dataframe_to_arrays
    arrays = [convert_column(c, f)
             ^^^^^^^^^^^^^^^^^^^^^
  File "[...]\.py311\Lib\site-packages\pyarrow\pandas_compat.py", line 611, in <listcomp>
    arrays = [convert_column(c, f)
              ^^^^^^^^^^^^^^^^^^^^
  File "[...]\.py311\Lib\site-packages\pyarrow\pandas_compat.py", line 598, in convert_column
    raise e
  File "[...]\.py311\Lib\site-packages\pyarrow\pandas_compat.py", line 592, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow\array.pxi", line 323, in pyarrow.lib.array
  File "pyarrow\array.pxi", line 79, in pyarrow.lib._ndarray_to_array
  File "pyarrow\array.pxi", line 67, in pyarrow.lib._ndarray_to_type
  File "pyarrow\error.pxi", line 123, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: ('Did not pass numpy.dtype object', 'Conversion failed for column torque with type pint[foot * force_pound]')

To me, it looks like Streamlit is using Arrow to process the dataframe for display, and Arrow does not recognize/understand the pint types.

Is there some way to fix it, or get around it? And should it be reported to Streamlit developers, or is it pint-pandas' responsibility?

andrewgsavage commented 1 year ago

> Is there some way to fix it, or get around it?

Not that I know of

> And should it be reported to Streamlit developers, or is it pint-pandas' responsibility?

You can try asking Streamlit or pyarrow and see if anyone is interested. It isn't anyone's responsibility.

scanzy commented 1 year ago

Hello @mkaut, I have the same problem :(

As a temporary workaround, I am converting everything to string. It's not ideal, but at least it shows something, with the units!

df = pd.DataFrame({
    "torque": pd.Series([1., 2., 2., 3.], dtype="pint[lbf ft]"),
    "angular_velocity": pd.Series([1., 2., 2., 3.], dtype="pint[rpm]"),
}, dtype = str) # conversion to string here

If you want to make the dataframe editable, maybe it's possible to keep the data as separate pd.Series and update them using st.session_state.my_data_editor.edited_rows.

import pandas as pd
import streamlit as st
import pint
import pint_pandas  # registers the "pint[...]" dtypes with pandas

# sets compact unit formatting
u = pint.UnitRegistry()
u.default_format = '~P'
pint.set_application_registry(u)

# initial data
columns = {
    "torque": pd.Series([1., 2., 2., 3.], dtype="pint[lbf ft]"),
    "angular_velocity": pd.Series([1., 2., 2., 3.], dtype="pint[rpm]"),
}

# shows data editor widget, with string data type
st.data_editor(pd.DataFrame(columns, dtype = str), key = "my_data_editor")

# gets edited rows
edited_rows = st.session_state.get("my_data_editor", {}).get("edited_rows", {})
st.write(edited_rows)

# converts edited rows to pint quantities
for rowIndex, editData in edited_rows.items():
    for colIndex, newValue in editData.items():
        columns[colIndex][rowIndex] = u.Quantity(newValue)

# shows edited data
for colName, colSeries in columns.items():
    st.write(colName, colSeries.tolist())

This solution converts values correctly even when they are entered in different units. E.g. entering a value in rad/s (instead of rpm) in angular_velocity still keeps all values in rpm.

P.S. If you opened an issue about this on Streamlit and/or Arrow, please post the link here.