hgrecco / pint-pandas

Pandas support for pint

fillna does not work for dataframes filled with heterogeneous Quantity elements #129

Closed: MichaelTiemannOSC closed this issue 1 year ago.

MichaelTiemannOSC commented 2 years ago

When a Series of floats contains a NaN, fillna behaves as follows:

import pandas as pd
s = pd.Series([1, None, 3], dtype=float)
print(s)
# 0    1.0
# 1    nan
# 2    3.0
# dtype: float64

print(s.fillna(-9.0))
# 0     1.0
# 1    -9.0
# 2     3.0
# dtype: float64

When s is a PintArray, fillna works the same way: NaN values are replaced with the fill value:

import pint_pandas  # registers the "pint[...]" dtypes with pandas
PA_ = pint_pandas.PintArray

s = pd.Series(PA_([1., None, 3.], dtype='pint[m]'), dtype='pint[m]')
print(s)
# 0    1.0
# 1    nan
# 2    3.0
# dtype: pint[meter]

print(s.fillna(-9.0))
# 0     1.0
# 1    -9.0
# 2     3.0
# dtype: pint[meter]

However, if we create a heterogeneous DataFrame (one Quantity per column, each with different units), fillna does not work as expected:

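import numpy as np
import pint_pandas
ureg = pint_pandas.PintType.ureg  # the unit registry pint-pandas uses
Q_ = ureg.Quantity

df = pd.DataFrame([[ureg('m'), Q_(np.nan, 'l'), ureg('s')]])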
print(df)
#          0          1         2
# 0  1 meter  nan liter  1 second

print(df.fillna(-9.0))
#          0          1         2
# 0  1 meter  nan liter  1 second

It would be nice if we could all agree that NaN is NaN as far as fillna is concerned. I'd be happy to see an error message saying that a nan liter cannot be filled with -9.0 meters, only with -9.0 liters, but it looks like fillna won't accept a Quantity as a value at all:

print(df.fillna(Q_(-9.0, 'l')))
*** ValueError: invalid fill value with a <class 'pint.quantity.build_quantity_class.<locals>.Quantity'>

What I really want is to be able to use the idiom df1.fillna(df2) with my Quantity-filled DataFrames. In my case, df2 always has the correct units to replace a NaN in df1.

andrewgsavage commented 1 year ago

Your third example uses object dtype, not PintType. If you first convert the columns to pint dtypes with astype (for example astype('pint[m]') for the meter column), it'll behave as you want.