Open jack-walp opened 3 months ago
take
Hi, could you try with `pd.Series(np.arange(100)).rolling(21, center=True).mean().plot()`?
The reason it creates a kink is that you set `min_periods=1`, which makes the first value of `rolling().mean()` equal to (0+...+10)/11 = 5.
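To make the arithmetic concrete, here is a minimal sketch of the edge behaviour. It assumes the original call was along the lines of `rolling(21, center=True, min_periods=1)`; the reproducible example itself is not shown in this thread, so that call is a guess.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100))

# Assumed reproduction: centred window of 21, partial windows allowed at the edges.
smoothed = s.rolling(21, center=True, min_periods=1).mean()

# At index 0 only the right half of the window has data: values 0..10.
first = smoothed.iloc[0]
# At index 1 the window covers values 0..11, so the mean moves by 0.5, not 1.
second = smoothed.iloc[1]
# From index 10 onwards the full window 0..20 is available and the trend is preserved.
first_full = smoothed.iloc[10]

print(first, second, first_full)

# Without min_periods=1 the partial windows yield NaN instead of a value.
strict = s.rolling(21, center=True).mean()
print(strict.isna().sum())
```

Within the first and last 10 points the smoothed series rises by 0.5 per step instead of 1, which is the kink seen in the plot.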
If this is not the case you are talking about, could you please explain in more detail what the expected output is?
If you don't have `min_periods=1`, you lose the data at the edges. I want a window that shrinks at the edges but remains symmetric. The kinks at the edges appear because the window is no longer centred on the live data; they are an artefact of the window becoming asymmetric.
@jack-walp - for the first data point, do you want the window size to be 1, for the second 3, and so on until 21 is reached, after which the window size stays at 21?
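The window sizes described here (1, 3, 5, ... up to 21) can be sketched with a plain loop. `symmetric_rolling_mean` is a hypothetical helper written for illustration, not an existing pandas API, and it assumes an odd window size.

```python
import numpy as np
import pandas as pd


def symmetric_rolling_mean(s: pd.Series, window: int) -> pd.Series:
    """Hypothetical helper: centred mean whose window shrinks equally on both sides at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    values = s.to_numpy(dtype=float)
    n = len(values)
    out = np.empty(n, dtype=float)
    for i in range(n):
        # Largest half-width that fits on both sides of position i.
        k = min(half, i, n - 1 - i)
        # Unlike rolling().mean(), NaNs are not skipped in this sketch.
        out[i] = values[i - k : i + k + 1].mean()
    return pd.Series(out, index=s.index)


s = pd.Series(np.arange(100))
result = symmetric_rolling_mean(s, 21)

# A linear trend passes through unchanged, including at the edges.
print(np.allclose(result.to_numpy(), s.to_numpy()))
```

The loop is O(n * window) and only meant to pin down the intended semantics; a vectorised version would be needed for large series.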
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
`np.arange` gives a simple linear trend that should not be affected by the rolling mean filter. However, at the edges the mean filter pulls values more towards the centre than expected, causing kinks in the curve. It looks like null values creep into the window at the edges of the data and are ignored by the mean filter. Because null values only creep into one side of the window, the effective centre value gets offset.
Expected Behavior
I would expect both sides of the window to be shrunk so that the point under examination is the centre of the live data.
Installed Versions
INSTALLED VERSIONS
commit : bdc79c146c2e32f2cab629be240f01658cfb6cc2
python : 3.12.2.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1160.36.2.el7.x86_64
Version : #1 SMP Wed Jul 21 11:57:15 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.1
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.8.2
setuptools : 69.1.1
pip : 24.0
Cython : 3.0.10
pytest : None
hypothesis : None
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.22.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.3
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : 2.4.1
pyqt5 : None