[BUG] After replace [-np.inf, np.inf] with np.nan, group forward fill not working.

rapidsai / cudf

cuDF - GPU DataFrame Library

Apache License 2.0


import cudf
import numpy as np

data = {
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value': [1, -np.inf, 3, np.inf, 5, np.inf]
}
df = cudf.DataFrame(data)
print("Original DataFrame:")
print(df)

df['value'] = df['value'].replace([-np.inf, np.inf], np.nan)
df['value'] = df.groupby('group')['value'].ffill()
print("\nDataFrame after forward fill:")
print(df)
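For reference, the result the reporter presumably expects follows pandas semantics, where NaN counts as a missing value. The sketch below reproduces that expectation in plain Python so it runs without a GPU; `group_ffill` is a hypothetical helper written for illustration, not a cudf or pandas API.

```python
import math

def group_ffill(groups, values):
    """Hypothetical helper: per-group forward fill treating None and NaN as missing (pandas-like)."""
    last_seen = {}  # most recent non-missing value per group
    out = []
    for g, v in zip(groups, values):
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        if missing:
            # No earlier value in this group: keep the gap as-is
            out.append(last_seen.get(g, v))
        else:
            last_seen[g] = v
            out.append(v)
    return out

groups = ['A', 'A', 'A', 'B', 'B', 'B']
values = [1.0, -math.inf, 3.0, math.inf, 5.0, math.inf]
# Same step as the issue: replace -inf and inf with NaN
values = [math.nan if math.isinf(v) else v for v in values]

result = group_ffill(groups, values)
print(result)  # [1.0, 1.0, 3.0, nan, 5.0, 5.0]
```

Row 3 stays NaN because group B has no earlier value to carry forward.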

The problem already appears after the replace call:

import cudf
import numpy as np

s = cudf.Series([1, -np.inf, np.inf])

print(s.replace([-np.inf, np.inf], np.nan))

print(s.replace(-np.inf, np.nan).replace(np.inf, np.nan))

The former produces:

0    1.0
1    NaN
2    NaN
dtype: float64

The latter:

0     1.0
1    <NA>
2    <NA>
dtype: float64

groupby.ffill handles the latter case but not the former, at least not in the way you might expect from pandas (where NaN is considered a missing value).
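A plain-Python model of that difference, assuming a "strict" forward fill that treats only NA (modelled here as `None`) as missing and leaves NaN untouched, as this thread describes for cudf. `ffill` below is a hypothetical helper for illustration only; the `nan_is_missing` flag switches to the pandas-like behavior.

```python
import math

def ffill(values, nan_is_missing):
    """Hypothetical helper: None is always missing; NaN only if nan_is_missing."""
    out, last = [], None
    for v in values:
        is_nan = isinstance(v, float) and math.isnan(v)
        if v is None or (nan_is_missing and is_nan):
            out.append(last if last is not None else v)
        else:
            # Under strict rules a NaN is an ordinary value and is carried forward
            last = v
            out.append(v)
    return out

with_nan = [1.0, math.nan, math.nan]  # like s.replace([-np.inf, np.inf], np.nan)
with_na = [1.0, None, None]           # like the chained scalar replace calls (<NA>)

strict_nan = ffill(with_nan, nan_is_missing=False)  # NaN kept: [1.0, nan, nan]
strict_na = ffill(with_na, nan_is_missing=False)    # filled: [1.0, 1.0, 1.0]
lenient_nan = ffill(with_nan, nan_is_missing=True)  # pandas-like: [1.0, 1.0, 1.0]
```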

I agree that replace should produce the same output for the two examples in this comment (I think the latter is "more correct").

To work around this, use None instead of np.nan as the replacement value in your replace call (for example, df['value'].replace([-np.inf, np.inf], None)); the grouped forward fill then works as anticipated.
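Applied to the original grouped example, the effect of the workaround can be modelled in plain Python. This sketch assumes a strict fill that treats only NA (modelled as `None`) as missing and keeps NaN as a real value; `strict_group_ffill` is a hypothetical helper, not part of cudf, and the two lists stand in for the outputs of `replace(..., np.nan)` and `replace(..., None)`.

```python
import math

def strict_group_ffill(groups, values):
    """Hypothetical helper: per-group forward fill treating only None (NA) as missing."""
    last_seen = {}
    out = []
    for g, v in zip(groups, values):
        if v is None:
            out.append(last_seen.get(g))  # stays None if the group has no earlier value
        else:
            last_seen[g] = v  # NaN counts as a value and is carried forward
            out.append(v)
    return out

groups = ['A', 'A', 'A', 'B', 'B', 'B']
raw = [1.0, -math.inf, 3.0, math.inf, 5.0, math.inf]

as_nan = [math.nan if math.isinf(v) else v for v in raw]  # models replace(..., np.nan)
as_na = [None if math.isinf(v) else v for v in raw]       # models replace(..., None)

filled_nan = strict_group_ffill(groups, as_nan)
filled_na = strict_group_ffill(groups, as_na)
print(filled_nan)  # [1.0, nan, 3.0, nan, 5.0, nan]
print(filled_na)   # [1.0, 1.0, 3.0, None, 5.0, 5.0]
```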

Note that this is a consequence of cudf being slightly stricter than pandas in a number of places about the difference between NaN and NA: NA indicates an actually missing value, whereas NaN (in cudf) does not.
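One way to picture the distinction: cudf, like Apache Arrow, tracks nulls in a separate validity mask, so NA means "no value in this slot" while NaN is an ordinary floating-point value stored in the data. The sketch below models such a column in plain Python; `MaskedColumn` is a hypothetical class for illustration, not cudf's internal column type.

```python
import math
from dataclasses import dataclass

@dataclass
class MaskedColumn:
    """Hypothetical model of a nullable float column (data + validity mask)."""
    data: list   # stored float values; NaN is a legitimate float here
    valid: list  # validity mask: False marks a null (NA) slot

    def null_count(self):
        return sum(1 for ok in self.valid if not ok)

    def nan_count(self):
        return sum(1 for v, ok in zip(self.data, self.valid) if ok and math.isnan(v))

    def to_list(self):
        return [v if ok else None for v, ok in zip(self.data, self.valid)]

# Like s.replace([-np.inf, np.inf], np.nan): NaN values, every slot valid
nan_col = MaskedColumn(data=[1.0, math.nan, math.nan], valid=[True, True, True])
# Like the chained replace calls: null slots (placeholder data under a null is ignored)
na_col = MaskedColumn(data=[1.0, 0.0, 0.0], valid=[True, False, False])

print(nan_col.null_count(), nan_col.nan_count())  # 0 2
print(na_col.null_count(), na_col.nan_count())    # 2 0
```

Under this model, a fill that only looks at the validity mask sees nothing to fill in `nan_col`.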


[BUG] After replace [-np.inf, np.inf] with np.nan, group forward fill not working. #16136