rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] Rolling count aggregations produce different results than Pandas 1.0+ #5580

Open brandon-b-miller opened 4 years ago

brandon-b-miller commented 4 years ago

Describe the bug

Rolling count aggregations in cuDF do not match pandas behavior when using pandas 1.0+. The difference comes from changes in pandas to the handling of NaNs and to the conditions under which a window is allowed to produce a non-NaN value. More discussion in https://github.com/rapidsai/cudf/pull/4546

Steps/Code to reproduce bug

>>> import cudf
>>> import pandas as pd
>>> cudf.Series.from_pandas(pd.Series([1,1,1,None])).rolling(2, min_periods=2, center=True).count()
0    null
1       2
2       2
3    null
dtype: int32

>>> pd.__version__
'1.0.3'
>>> pd.Series([1,1,1,None]).rolling(2, min_periods=2, center=True).count()
0    NaN
1    2.0
2    2.0
3    1.0
dtype: float64

Expected behavior

Either matching behavior or an understanding of why we differ.

Environment overview (please complete the following information)

Environment details

cuDF 0.15

Additional context

https://github.com/pandas-dev/pandas/pull/30923
https://github.com/pandas-dev/pandas/issues/34466

jrhemstad commented 4 years ago

So the issue is that cuDF produces null for the last value whereas Pandas 1.0 produces 1?

brandon-b-miller commented 4 years ago

Yes, that's more or less the problem.
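
One reading that is consistent with both outputs above: the two results differ only in what gets compared against min_periods. The sketch below is plain Python, not the pandas or cuDF implementation. The helper name `rolling_count` and its `nulls_count_toward_min_periods` flag are hypothetical, and it assumes that a window of 2 with center=True covers rows i-1 and i, which is an assumption chosen because it reproduces the reported numbers.

```python
from typing import List, Optional


def rolling_count(
    values: List[Optional[int]],
    window: int,
    min_periods: int,
    nulls_count_toward_min_periods: bool,
) -> List[Optional[int]]:
    """Hypothetical model of a rolling count over a trailing window.

    Assumption: the window for row i covers rows i-window+1 .. i,
    truncated at the start of the series.
    """
    out: List[Optional[int]] = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        rows = values[start : i + 1]
        # Number of non-null entries in the window; this is the count itself.
        valid = sum(v is not None for v in rows)
        # What is checked against min_periods: every row present in the
        # window, or only the non-null ones.
        observed = len(rows) if nulls_count_toward_min_periods else valid
        out.append(valid if observed >= min_periods else None)
    return out


data = [1, 1, 1, None]

# min_periods checked against all rows in the window (nulls included).
pandas_like = rolling_count(data, 2, 2, nulls_count_toward_min_periods=True)

# min_periods checked against non-null rows only.
cudf_like = rolling_count(data, 2, 2, nulls_count_toward_min_periods=False)

print(pandas_like)
print(cudf_like)
```

Under these assumptions the first variant yields a count of 1 for the last row, as in the pandas 1.0.3 output, and the second yields null there, as in the cuDF output. Whether this is what either library actually does internally is not confirmed here.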
