pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: rolling window with `center=True, min_periods=1` is not symmetric at edges #59252

Open jack-walp opened 3 months ago

jack-walp commented 3 months ago

Reproducible Example

import pandas as pd
import numpy as np
pd.Series(np.arange(100)).rolling(21, center=True, min_periods=1).mean().plot()

Issue Description

np.arange produces a simple linear trend, which a centred rolling mean should leave unchanged. At the edges, however, the filter pulls values towards the centre more than expected, causing kinks in the curve. It looks as though, near the edges of the data, missing (NaN) values enter the window and are ignored by the mean. Because they enter on only one side of the window, the effective centre is offset from the point being evaluated.
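The kink can also be checked numerically rather than by eye. The snippet below is a minimal sketch of that check; the variable names `smoothed` and `steps` are only illustrative. It looks at the step between consecutive outputs, which should be 1 everywhere if the linear trend were preserved.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100))
smoothed = s.rolling(21, center=True, min_periods=1).mean()

# Difference between consecutive smoothed values; the input has slope 1.
steps = smoothed.diff()

print(smoothed.iloc[:3].tolist())   # [5.0, 5.5, 6.0]
print(steps.iloc[1:4].tolist())     # [0.5, 0.5, 0.5]  (edge: slope halved)
print(steps.iloc[11:14].tolist())   # [1.0, 1.0, 1.0]  (interior: slope kept)
```

In the first and last 10 points the window is truncated on one side only, so the output advances by 0.5 per step instead of 1.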

Expected Behavior

I would expect both sides of the window to be shrunk so that the point under examination is the centre of the live data.

Installed Versions

INSTALLED VERSIONS

commit : bdc79c146c2e32f2cab629be240f01658cfb6cc2
python : 3.12.2.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1160.36.2.el7.x86_64
Version : #1 SMP Wed Jul 21 11:57:15 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.1
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.8.2
setuptools : 69.1.1
pip : 24.0
Cython : 3.0.10
pytest : None
hypothesis : None
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.22.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.3
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : 2.4.1
pyqt5 : None

jack-walp commented 3 months ago

[image]

sgysherry commented 3 months ago

take

sgysherry commented 3 months ago

Hi, could you try pd.Series(np.arange(100)).rolling(21, center=True).mean().plot()? The kink appears because you set min_periods=1, which makes the first value of rolling().mean() equal to (0+...+10)/11=5.
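For reference, the two calls can be compared side by side. This is a small sketch using only the calls already quoted in this thread; the names `default` and `shrunk` are illustrative. With `min_periods` left at its default, every window must be full, so the first and last 10 outputs are NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100))

# Default min_periods: the full 21 observations are required.
default = s.rolling(21, center=True).mean()
# min_periods=1: truncated edge windows are averaged as-is.
shrunk = s.rolling(21, center=True, min_periods=1).mean()

print(int(default.isna().sum()))  # 20
print(default.iloc[10])           # 10.0
print(shrunk.iloc[0])             # 5.0, i.e. (0 + ... + 10) / 11
```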

[image: issue_59252]

If this is not the case you are talking about, could you please give me some more explanation on what is the expected output?

jack-walp commented 2 months ago

If you don't set min_periods=1, you lose the data at the edges. I want a window that shrinks at the edges but remains symmetric. The kinks appear because the window is no longer centred on the live data; they are an artefact of the window becoming asymmetric.
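A possible workaround with current pandas, assuming the behaviour described above is what is wanted, is a custom indexer built on `pandas.api.indexers.BaseIndexer`. The sketch below is not part of pandas: the class name `SymmetricShrinkingIndexer` is hypothetical, it assumes an odd `window_size`, and it ignores the `step` argument.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class SymmetricShrinkingIndexer(BaseIndexer):
    """Hypothetical indexer: centred window that shrinks on both sides at the edges."""

    def get_window_bounds(
        self, num_values=0, min_periods=None, center=None, closed=None, step=None
    ):
        half = self.window_size // 2
        i = np.arange(num_values, dtype=np.int64)
        # Half-width is capped by the distance to the nearest end of the data,
        # so the window always extends equally far to the left and right.
        k = np.minimum(half, np.minimum(i, num_values - 1 - i))
        start = i - k       # inclusive start of each window
        end = i + k + 1     # exclusive end of each window
        return start, end


s = pd.Series(np.arange(100))
result = s.rolling(SymmetricShrinkingIndexer(window_size=21), min_periods=1).mean()

# For a linear trend, a symmetric window returns the input unchanged.
print(np.allclose(result.to_numpy(), s.to_numpy()))
```

`min_periods=1` is passed explicitly so that the one-element windows at the very ends still produce a value.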

rhshadrach commented 1 day ago

@jack-walp - for the first data point, do you want the window size to be 1, for the second 3, and so on until it reaches 21, after which it stays at 21?
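The window sizes being asked about can be written out explicitly. This is only an illustration of that reading of the request; the helper `symmetric_window_sizes` is a hypothetical name, and the mirrored shrinkage at the right-hand edge is an assumption.

```python
def symmetric_window_sizes(n, window):
    """Window size per position when both sides shrink equally near the edges."""
    half = window // 2
    # Each side may extend at most to the nearest end of the data.
    return [2 * min(half, i, n - 1 - i) + 1 for i in range(n)]


sizes = symmetric_window_sizes(100, 21)
print(sizes[:12])  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 21]
print(sizes[-3:])  # [5, 3, 1]
```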