hgrecco / pint-pandas

Pandas support for pint

"ValueError: fill_value must be a scalar" when concatenating columns requiring fill #165

Closed burnpanck closed 1 year ago

burnpanck commented 1 year ago

The following MWE shows the issue:

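import pandas as pd
import pint_pandas

# Unit registry that pint-pandas itself uses
u = pint_pandas.PintType.ureg

a = pd.Series(index=[1, 2], data=pint_pandas.PintArray([0.0, 1.0], u.mm))
b = pd.Series(index=[2, 3], data=pint_pandas.PintArray([2.0, 3.0], u.s))

# The indexes only partly overlap, so pandas has to fill the missing entries
pd.concat([a, b], axis="columns")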

With "normal" pint columns, the code works as expected. I partially traced the error down to https://github.com/hgrecco/pint-pandas/blob/c58a7fcf9123eb65f5e78845077b205e20279b9d/pint_pandas/pint_array.py#L475 which seems to create a 0-d numpy array instead of a numpy scalar.
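To illustrate the difference between a 0-d numpy array and a numpy scalar, here is a minimal sketch that uses only numpy and pandas. It assumes that pandas rejects the fill value through a scalar check along the lines of `pd.api.types.is_scalar`; I have not confirmed that this is the exact check behind the error. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

scalar_nan = np.float64("nan")   # a true numpy scalar
zero_d_nan = np.asarray(np.nan)  # a 0-d ndarray holding the same value

# pandas treats the numpy scalar as a scalar, but not the 0-d array
print(pd.api.types.is_scalar(scalar_nan))  # True
print(pd.api.types.is_scalar(zero_d_nan))  # False

# A 0-d array has no dimensions and an empty shape
print(zero_d_nan.ndim, zero_d_nan.shape)   # 0 ()

# .item() unwraps the 0-d array into a plain Python float
unwrapped = zero_d_nan.item()
print(pd.api.types.is_scalar(unwrapped))   # True
```

If the fill value is built as a 0-d array, a check like this would fail even though the value itself is just NaN.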

This may be a consequence of a recent change in either pint_pandas or numpy; I had already noticed that my int(round(some_quantity/other_quantity_of_same_dim)) calls started to fail and now require an int(round(float(...))) construct.

The relevant versions in my environment are:

numpy: 1.23.5
pandas: 1.5.1
pint: 0.20.1
pint_pandas: 0.3

burnpanck commented 1 year ago

Actually, it may depend on the registry... It seems that both problems are ultimately an effect of ureg.Quantity(float(...)) always returning arrays. Could there be a pint registry option that enforces this? I recently started using pint_xarray, and at some point some library (not really sure if it was pint_xarray) complained about such a setting, but that was in an interactive session with an atypical import order, and after a reload the issue disappeared.

burnpanck commented 1 year ago

a.reindex(b.index) also fails with the same error, while a.reindex(b.index, fill_value=np.nan) works.

burnpanck commented 1 year ago

Indeed, it is pint-xarray which forces pint.application_registry.force_ndarray_like = True, thus triggering this bug here.