rwijtvliet opened this issue 2 years ago (status: open).
Had a look at the latest versions: it looks like it's only an issue when appending to a Series, not when overwriting existing values. Might be worth raising an issue in pandas-dev?
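Since overwriting an existing label keeps the dtype while appending a new one does not, one possible workaround is to create the new label first and then overwrite it. Below is a minimal sketch of that idea. append_then_set is a hypothetical helper, and it is demonstrated with pandas' nullable Int64 extension dtype rather than pint[W], so whether reindex also preserves a pint dtype is an assumption that is not verified here. It also assumes the label is not already in the index.

```python
import pandas as pd


def append_then_set(s: pd.Series, label, value) -> pd.Series:
    """Hypothetical helper: enlarge the index first, then overwrite.

    Reindexing fills the new label with the dtype's NA value, so the
    subsequent .loc assignment is an overwrite rather than an append.
    Assumes `label` is not already present in `s.index`.
    """
    out = s.reindex(s.index.append(pd.Index([label])))
    out.loc[label] = value
    return out


# Demonstrated with pandas' nullable Int64 extension dtype (not pint).
s = pd.Series([70, 60, 50], dtype="Int64")
s2 = append_then_set(s, 8, 1)
print(s2.dtype)  # Int64
```

For the pint case this would be called as append_then_set(s, 8, Q_(1, "W")), under the assumption stated above.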
import pandas as pd
import pint_pandas
import pint
import numpy as np
ureg = pint.get_application_registry()
Q_ = ureg.Quantity
pint_pandas.show_versions()
{'numpy': '1.23.3',
'pandas': '1.5.2',
'pint': '0.20.2.dev16+g01411c7',
'pint_pandas': '0.4.dev40+g2f39497.d20221212'}
# Appending a single new label to the Series mostly fails or loses the pint dtype
s = pd.Series([70, 60, 50], dtype="pint[W]")
# uncomment each line in turn
# s.loc[8] = None # ValueError: fill_value must be a scalar
# s.loc[8] = np.nan # ValueError: fill_value must be a scalar
# s.loc[8] = Q_(np.nan,"W") # converts series to object dtype
# s.loc[8] = 1 # works, maintains pint[watt] dtype
# s.loc[8] = Q_(1, "W") # converts series to object dtype
s
# Appending a list of new labels raises KeyError:
# s.loc[[8,9]] = Q_(1, "W") # KeyError: "None of [Int64Index([8, 9], dtype='int64')] are in the [index]"
# Setting a list of values works:
s = pd.Series([70, 60, 50], dtype="pint[W]")
# uncomment each line in turn
# s.loc[[0,1]] = None # Works, maintains pint[watt] dtype
# s.loc[[0,1]] = np.nan # Works, maintains pint[watt] dtype
# s.loc[[0,1]] = Q_(np.nan,"W") # Works, maintains pint[watt] dtype
# s.loc[[0,1]] = 1 # works, maintains pint[watt] dtype
s.loc[[0,1]] = Q_(1, "W") # works, maintains pint[watt] dtype
s
# DataFrame still works
df = pd.DataFrame(
    {
        "a": pd.Series([70, 60, 50], dtype="pint[W]"),
        "b": pd.Series([0.5, 0.4, 0.2], dtype="pint[s]"),
    }
)
# df.loc[8,:] = None # Works, maintains dtypes
# df.loc[8,:] = np.nan # Works, maintains dtypes
# df.loc[8,:] = Q_(np.nan,"W") # Behaves as expected: DimensionalityError (column "b" is pint[s])
# df.loc[8,:] = 1 # works, maintains pint[watt] dtype
# df.loc[8,:] = Q_(1, "W") # Behaves as expected: DimensionalityError (column "b" is pint[s])
print(df.dtypes)
df
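For several new labels at once, where s.loc[[8, 9]] = ... raises KeyError, one alternative is to build a second Series with the same dtype and concatenate. append_values is a hypothetical helper; it is shown with the Int64 extension dtype, and it assumes pd.concat keeps a pint dtype when both inputs share it, which is not verified here.

```python
import pandas as pd


def append_values(s: pd.Series, labels, values) -> pd.Series:
    """Hypothetical helper: append several new labels at once.

    Builds a Series with the same (extension) dtype as `s` and concatenates,
    instead of using s.loc[[...]] = ..., which raises KeyError for labels
    that are not yet in the index.
    """
    extra = pd.Series(values, index=labels, dtype=s.dtype)
    return pd.concat([s, extra])


s = pd.Series([70, 60, 50], dtype="Int64")
s3 = append_values(s, [8, 9], [1, 1])
print(s3.dtype)  # Int64
```

With a pint[W] Series the values would be given as magnitudes in the Series' unit, mirroring how the Series above is constructed from plain integers.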
This pandas issue seems related: https://github.com/pandas-dev/pandas/issues/24246
The _maybe_promote logic currently tripping things up looks like this:
def _maybe_promote(dtype: np.dtype, fill_value=np.nan):
    # The actual implementation of the function, use `maybe_promote` above for
    # a cached version.
    if not is_scalar(fill_value):
        # with object dtype there is nothing to promote, and the user can
        # pass pretty much any weird fill_value they like
        if not is_object_dtype(dtype):
            # with object dtype there is nothing to promote, and the user can
            # pass pretty much any weird fill_value they like
            raise ValueError("fill_value must be a scalar")
        dtype = _dtype_obj
        return dtype, fill_value
What's missing is an is_extension_array_dtype clause between the two checks that can do something sane when an NA value needs to be promoted to a Quantity.
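To make the suggestion concrete, here is a hypothetical sketch of the non-scalar branch with an extension-dtype clause added between the two existing checks. It is not pandas' code, maybe_promote_sketch is an invented name, and passing fill_value through unchanged (leaving validation to the extension array) is one assumed design choice among several. Scalar promotion is not reproduced.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_extension_array_dtype, is_object_dtype, is_scalar


def maybe_promote_sketch(dtype, fill_value=np.nan):
    """Hypothetical sketch, NOT pandas' implementation.

    Illustrates where an is_extension_array_dtype check could sit
    relative to the existing is_scalar / is_object_dtype checks.
    """
    if not is_scalar(fill_value):
        if is_extension_array_dtype(dtype):
            # Assumed behaviour: keep the extension dtype (e.g. pint[W])
            # and let the extension array validate the fill value itself.
            return dtype, fill_value
        if not is_object_dtype(dtype):
            raise ValueError("fill_value must be a scalar")
        return np.dtype(object), fill_value
    # Scalar fill values would continue into pandas' regular promotion
    # rules, which this sketch does not reproduce.
    return dtype, fill_value


print(maybe_promote_sketch(pd.Int64Dtype(), [1]))
```

Returning the dtype unchanged is what would keep a pint dtype from being promoted to object; whether the array then accepts the fill value is left to its own validation.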
Using .loc to add values to a Series does not retain its pint dtype. For a DataFrame, the dtypes are retained. Here is a minimal working example:
Here are my versions:
{'numpy': '1.22.3', 'pandas': '1.4.1', 'pint': '0.18', 'pint_pandas': '0.2'}