Open glaucouri opened 4 days ago
I believe the code you are complaining about is in these lines (but it's worth double-checking): https://github.com/pandas-dev/pandas/blob/bef88efe999809b775ed88a02f0fc2fd6d2d08a2/pandas/core/_numba/kernels/min_max_.py#L106 https://github.com/pandas-dev/pandas/blob/bef88efe999809b775ed88a02f0fc2fd6d2d08a2/pandas/core/_numba/kernels/min_max_.py#L49-L52
Let me explain how I think your issue report could be modified and why.
I initially misunderstood your suggestion: you are in fact asking for np.nan
to be treated consistently as an invalid value in aggregation functions. I personally care more about consistency, so here is another example of the suspected bug:
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 0, 1], dtype="Float64")
(s / s).max() # <NA>
(s / s).groupby([9, 9, 9]).max().iat[0] # 1.0
The last two lines were expected to give the same result (whatever it should be).
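The comparison above can be wrapped in a small check that only reports whether the two code paths agree, without assuming which answer is correct. This is a rough sketch: `compare_max` and `same_result` are hypothetical helper names, and treating `<NA>` and nan as the same "missing" result is an assumption made here for the comparison only.

```python
import numpy as np
import pandas as pd


def compare_max(values):
    """Return (direct max, grouped max) for s / s on a nullable Float64 series."""
    s = pd.Series(values, dtype="Float64")
    ratio = s / s
    direct = ratio.max()
    # Put every row into a single group so both paths see the same data.
    grouped = ratio.groupby([9] * len(ratio)).max().iat[0]
    return direct, grouped


def same_result(a, b):
    """Compare two scalars, treating <NA> and nan alike (an assumption)."""
    if pd.isna(a) and pd.isna(b):
        return True
    if pd.isna(a) or pd.isna(b):
        return False
    return bool(a == b)


direct, grouped = compare_max([np.nan, 0, 1])
print("direct:", direct, "grouped:", grouped)
print("consistent:", same_result(direct, grouped))
```

The printed values depend on the installed pandas version, so no expected output is given here.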
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
DataFrameGroupBy.agg handles nan poorly.
Unfortunately, it sometimes happens that nullable fields contain nan; cf. https://github.com/pandas-dev/pandas/issues/32265
This case leads to unexpected behavior when combined with groupby.
In a nutshell:
Having a nan in a Float field makes the groupby()[min/max] computation wrong.
Expected Behavior
From my perspective, a nan must propagate: any aggregation that includes a nan must itself produce nan.
Semantically: "An invalid value cannot be computed, so a transformation of it should again result in an invalid value."
An aggregation (via groupby) of nan should therefore result in nan.
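To make the requested semantics concrete, here is a small sketch on plain float64 data (not the nullable Float64 path this report is about). It contrasts the default groupby max, which skips nan, with a propagating reduction built from np.max. The sample values and group keys are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: group "a" contains a nan, group "b" does not.
values = pd.Series([np.nan, 1.0, 2.0, 3.0], dtype="float64")
keys = ["a", "a", "b", "b"]

# Default behavior on float64: nan is skipped within each group.
skipping = values.groupby(keys).max()

# Propagating variant: np.max returns nan as soon as a group contains one.
propagating = values.groupby(keys).agg(lambda g: np.max(g.to_numpy()))

print(skipping)
print(propagating)
```

With the propagating variant, group "a" yields nan while group "b" yields 3.0, which is the "invalid in, invalid out" behavior described above.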
Installed Versions