hgrecco / pint-pandas

Pandas support for pint

Using `.loc` to add values does not retain pint dtype when working with `Series`. #126

Open rwijtvliet opened 2 years ago

rwijtvliet commented 2 years ago

Using .loc to add values to a Series does not retain its pint dtype. For DataFrame, the dtypes are retained.

Here is a minimal working example:

import pandas as pd
import pint_pandas

# Create unit-aware Series and DataFrame
s = pd.Series([70, 60, 50], dtype="pint[W]")
df = pd.DataFrame(
    {
        "a": pd.Series([70, 60, 50], dtype="pint[W]"),
        "b": pd.Series([0.5, 0.4, 0.2], dtype="pint[s]"),
    }
)

# Append None
s.loc[8] = None
df.loc[8, :] = None

# Issue: s gets the object dtype and the units are included in the individual values
s
# 0    70.0 W
# 1    60.0 W
# 2    50.0 W
# 8      None
# dtype: object

# But df is good, with the columns retaining their pint dtype
df
#       a    b
# 0  70.0  0.5
# 1  60.0  0.4
# 2  50.0  0.2
# 8   nan  nan
df.pint.dequantify()
#          a    b
# unit     W    s
# 0     70.0  0.5
# 1     60.0  0.4
# 2     50.0  0.2
# 8      NaN  NaN

Here are my versions: {'numpy': '1.22.3', 'pandas': '1.4.1', 'pint': '0.18', 'pint_pandas': '0.2'}

andrewgsavage commented 1 year ago

I had a look with the latest versions. It looks like this is only an issue when appending to a Series, not when overwriting existing values. Might be worth raising an issue in pandas-dev?

import pandas as pd
import pint_pandas
import pint
import numpy as np

ureg = pint.get_application_registry()
Q_ = ureg.Quantity
pint_pandas.show_versions()

{'numpy': '1.23.3',
 'pandas': '1.5.2',
 'pint': '0.20.2.dev16+g01411c7',
 'pint_pandas': '0.4.dev40+g2f39497.d20221212'}

# Appending a single value to the series fails

s = pd.Series([70, 60, 50], dtype="pint[W]")
# uncomment each line in turn
# s.loc[8] = None # ValueError: fill_value must be a scalar
# s.loc[8] = np.nan # ValueError: fill_value must be a scalar
# s.loc[8] = Q_(np.nan,"W") # converts series to object dtype
# s.loc[8] = 1 # works, maintains pint[watt] dtype
# s.loc[8] = Q_(1, "W") # converts series to object dtype
s

# Appending a list of values raises a KeyError:
# s.loc[[8,9]] = Q_(1, "W") # KeyError: "None of [Int64Index([8, 9], dtype='int64')] are in the [index]"

# Setting a list of values works:
s = pd.Series([70, 60, 50], dtype="pint[W]")
# uncomment each line in turn
# s.loc[[0,1]] = None # Works, maintains pint[watt] dtype
# s.loc[[0,1]] = np.nan # Works, maintains pint[watt] dtype
# s.loc[[0,1]] = Q_(np.nan,"W") # Works, maintains pint[watt] dtype
# s.loc[[0,1]] = 1 # works, maintains pint[watt] dtype
s.loc[[0,1]] = Q_(1, "W") # works, maintains pint[watt] dtype
s

# DataFrame still works
df = pd.DataFrame(
    {
        "a": pd.Series([70, 60, 50], dtype="pint[W]"),
        "b": pd.Series([0.5, 0.4, 0.2], dtype="pint[s]"),
    }
)
# df.loc[8,:] = None # Works, maintains dtypes
# df.loc[8,:] = np.nan # Works, maintains dtypes
# df.loc[8,:] = Q_(np.nan,"W") # Works, dimensionality error
# df.loc[8,:] = 1 # works, maintains pint[watt] dtype
# df.loc[8,:] = Q_(1, "W") # Works, dimensionality error
print(df.dtypes)
df

MichaelTiemannOSC commented 1 year ago

This pandas issue seems related: https://github.com/pandas-dev/pandas/issues/24246

The _maybe_promote logic currently tripping things up looks like this:

def _maybe_promote(dtype: np.dtype, fill_value=np.nan):
    # The actual implementation of the function, use `maybe_promote` above for
    # a cached version.
    if not is_scalar(fill_value):
        # with object dtype there is nothing to promote, and the user can
        #  pass pretty much any weird fill_value they like
        if not is_object_dtype(dtype):
            # with object dtype there is nothing to promote, and the user can
            #  pass pretty much any weird fill_value they like
            raise ValueError("fill_value must be a scalar")
        dtype = _dtype_obj
        return dtype, fill_value

What's missing is an is_extension_array_dtype clause between the two that can do something sane when we need to promote an NA value to a Quantity.
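
To make the idea of such a clause concrete, here is a toy, pure-Python model of the promotion decision. It does not import pandas or pint: ToyDtype, ToyQuantity, is_na and toy_maybe_promote are hypothetical stand-ins, not pandas or pint-pandas APIs. Treating "something sane" as "wrap the NA marker in a NaN quantity with the dtype's units" is an assumption, not something pandas does today.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyQuantity:
    """Hypothetical stand-in for a pint Quantity: a magnitude plus a unit string."""
    magnitude: float
    units: str


@dataclass(frozen=True)
class ToyDtype:
    """Hypothetical stand-in for a dtype; `units` is None for plain dtypes."""
    name: str
    units: Optional[str] = None

    @property
    def is_extension(self) -> bool:
        # In this toy model, "extension dtype" simply means "carries units".
        return self.units is not None


OBJECT = ToyDtype("object")


def is_na(value) -> bool:
    """True for None and float NaN, the two NA markers used in this thread."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def toy_maybe_promote(dtype: ToyDtype, fill_value=math.nan):
    """Return the (dtype, fill_value) pair to use when enlarging with fill_value."""
    if dtype.is_extension:
        # Assumed extra clause: keep the extension dtype where that is possible.
        if is_na(fill_value):
            # Promote the bare NA marker to a unit-aware NaN.
            return dtype, ToyQuantity(math.nan, dtype.units)
        if isinstance(fill_value, ToyQuantity) and fill_value.units == dtype.units:
            return dtype, fill_value
    if isinstance(fill_value, ToyQuantity):
        # Mirrors the quoted branch: a non-scalar fill value is only
        # accepted when the dtype is already object.
        if dtype != OBJECT:
            raise ValueError("fill_value must be a scalar")
        return OBJECT, fill_value
    return dtype, fill_value


watts = ToyDtype("pint[W]", units="W")
promoted_dtype, promoted_fill = toy_maybe_promote(watts, None)
print(promoted_dtype.name, promoted_fill.units)
```

In this sketch, None, NaN and a quantity with matching units all leave the unit-aware dtype in place, while a quantity with different units still falls through to the existing error. A real fix would have to decide how to handle convertible units and would live in pandas or be exposed through the extension array interface.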