jasonzhang2s opened this issue 3 years ago
@DiegoAlbertoTorres, do you happen to know offhand how the `ewm` formula with `times` changes with `adjust=False`? (Formula found at https://pandas.pydata.org/docs/user_guide/window.html#exponentially-weighted-window)
I am not sure. The implementation of `adjust=False` rests on the identity that the denominator of the EWMA when `adjust=True`, $1 + (1-\alpha) + (1-\alpha)^2 + \dots$, is a geometric series that sums to $1/\alpha$, so normalizing by it is equivalent to simply multiplying by $\alpha$ (see image below). This is expressed in the docs here:
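To make the identity concrete, here is a minimal pure-Python sketch (no pandas required) of both variants for equally spaced data. It assumes the textbook definitions: `adjust=True` as a weighted average with weights $(1-\alpha)^i$, and `adjust=False` as the recursion $y_t = (1-\alpha) y_{t-1} + \alpha x_t$ with $y_0 = x_0$. The helper names `ewma_adjust_true` and `ewma_adjust_false` are hypothetical, not pandas API.

```python
# Sketch assuming the textbook EWMA definitions; helper names are hypothetical, not pandas API.


def ewma_adjust_true(xs, alpha):
    """Weighted average with weights (1 - alpha)**i, normalised by the finite sum of weights."""
    out = []
    for t in range(len(xs)):
        weights = [(1 - alpha) ** i for i in range(t + 1)]  # weight of xs[t - i]
        num = sum(w * xs[t - i] for i, w in enumerate(weights))
        out.append(num / sum(weights))
    return out


def ewma_adjust_false(xs, alpha):
    """Recursion y_t = (1 - alpha) * y_{t-1} + alpha * x_t, started at y_0 = x_0."""
    out = [xs[0]]
    for x in xs[1:]:
        out.append((1 - alpha) * out[-1] + alpha * x)
    return out


alpha = 0.3
# The adjust=True denominator is a geometric series with ratio (1 - alpha): it sums to 1 / alpha.
partial_sum = sum((1 - alpha) ** i for i in range(200))

xs = [float((7 * k) % 11) for k in range(200)]  # arbitrary deterministic test data
adj_true = ewma_adjust_true(xs, alpha)
adj_false = ewma_adjust_false(xs, alpha)

print(partial_sum, 1 / alpha)             # nearly identical
print(abs(adj_true[1] - adj_false[1]))    # large gap early in the series
print(abs(adj_true[-1] - adj_false[-1]))  # gap has decayed to (numerically) zero
```

The gap between the two variants shrinks roughly like $(1-\alpha)^t$, which is why `adjust=False` behaves like an infinite-history `adjust=True` once enough data has accumulated.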
However, when time is provided, the weight looks as below:
This is not a geometric series, so you cannot assume that normalizing by the sum of weights is equivalent to simply multiplying by alpha. This can easily be shown with a time vector that repeats the same timestamp to infinity: none of the observations decays relative to the others, which yields an infinite total weight. I am not sure what we should do here.
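A minimal sketch of the counterexample, assuming halflife-based weights of the form $w_i = 0.5^{(t - t_i)/\lambda}$ with halflife $\lambda$ (the helper `time_weights` is hypothetical). With evenly spaced timestamps the weights form a geometric series whose sum converges to $1/\alpha$ with $\alpha = 1 - 0.5^{1/\lambda}$; with a single repeated timestamp every weight is 1, so the sum grows linearly with the number of observations.

```python
# Sketch assuming time-based weights w_i = 0.5 ** ((t_now - t_i) / halflife);
# `time_weights` is a hypothetical helper, not pandas API.


def time_weights(times, halflife):
    """Weight of each observation as seen from the last timestamp."""
    t_now = times[-1]
    return [0.5 ** ((t_now - t_i) / halflife) for t_i in times]


halflife = 2.0
alpha = 1 - 0.5 ** (1 / halflife)  # per-unit-time decay expressed as an alpha

# Evenly spaced times: weights form a geometric series, so the sum approaches 1 / alpha.
even_total = sum(time_weights([float(k) for k in range(100)], halflife))
print(even_total, 1 / alpha)

# Counterexample: the same timestamp repeated n times. Nothing decays, every weight is 1,
# and the sum equals n, so it diverges instead of converging to 1 / alpha.
repeated_totals = {n: sum(time_weights([0.0] * n, halflife)) for n in (10, 100, 1000)}
print(repeated_totals)
```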
I initially suspected that the adjustment we make to the iteration (for the `adjust` parameter) should be the same whether `times` is set or not. But the fact that the proof breaks down with my counterexample, plus Jason's discovery, suggests this might not hold at all. I have not run Jason's example; how big is the difference? I think if we double-check the code and construct large enough counterexamples, we should be able to show empirically that the math behind `adjust=False` does not hold when the weights do not follow a geometric series.
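One way to build such a counterexample, sketched in pure Python under stated assumptions: compare the `adjust=True` weighted average with time-based weights against a hypothetical recursive update in the spirit of `adjust=False`, where the previous value is decayed by $d_k = 0.5^{(t_k - t_{k-1})/\lambda}$ and the new observation gets weight $1 - d_k$. This recursion is an illustration, not necessarily what pandas implements, and the helper names are hypothetical.

```python
# Sketch: adjust=True with time-based weights vs. a *hypothetical* adjust=False-style recursion.
# Weights are assumed to be 0.5 ** (elapsed / halflife); helper names are not pandas API.


def ewm_times_adjust_true(xs, times, halflife):
    """Normalised weighted average over all observations up to each point."""
    out = []
    for t in range(len(xs)):
        w = [0.5 ** ((times[t] - times[i]) / halflife) for i in range(t + 1)]
        out.append(sum(wi * xi for wi, xi in zip(w, xs)) / sum(w))
    return out


def ewm_times_recursive(xs, times, halflife):
    """Decay the previous value by the elapsed-time factor, then blend in the new point."""
    out = [xs[0]]
    for k in range(1, len(xs)):
        decay = 0.5 ** ((times[k] - times[k - 1]) / halflife)
        out.append(decay * out[-1] + (1 - decay) * xs[k])
    return out


halflife = 2.0

# Evenly spaced timestamps: the recursion is ordinary adjust=False, so the gap vanishes.
xs = [float((7 * k) % 11) for k in range(200)]
even_times = [float(k) for k in range(200)]
even_gap = abs(ewm_times_adjust_true(xs, even_times, halflife)[-1]
               - ewm_times_recursive(xs, even_times, halflife)[-1])

# Repeated timestamp: decay == 1, so the recursion never moves off ys[0],
# while the weighted average becomes the plain mean of all observations.
ys = [float(k) for k in range(50)]
same_times = [0.0] * 50
same_true = ewm_times_adjust_true(ys, same_times, halflife)[-1]
same_rec = ewm_times_recursive(ys, same_times, halflife)[-1]

print(even_gap)             # numerically zero
print(same_true, same_rec)  # 24.5 0.0
```

Unrolling the recursion gives each later observation $x_i$ a weight of $(1 - d_i)\,0.5^{(t - t_i)/\lambda}$, so it only matches the normalized `adjust=True` weights when every $d_i$ is the same, i.e. when the timestamps are evenly spaced.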
Here's a plot of the difference on Jason's data (`adjust=True` - `adjust=False`) for reference:
I think the safest thing to do would be to raise a `NotImplementedError` for `times` with `adjust=False` for now.
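A minimal sketch of that guard, assuming the check happens when the arguments are validated; `validate_ewm_args` is a hypothetical helper, not the actual pandas code path.

```python
# Hypothetical argument check for the proposal above: reject `times` combined with adjust=False.


def validate_ewm_args(times=None, adjust=True):
    """Raise instead of silently returning results whose math does not hold."""
    if times is not None and not adjust:
        raise NotImplementedError("times is not supported with adjust=False.")


validate_ewm_args(times=[0.0, 1.0, 1.0], adjust=True)  # accepted
validate_ewm_args(times=None, adjust=False)             # accepted: plain adjust=False
try:
    validate_ewm_args(times=[0.0, 1.0, 1.0], adjust=False)
except NotImplementedError as err:
    print(f"rejected: {err}")
```

Raising is the conservative choice: it turns a silently wrong result into an explicit error until the correct recursion for irregular times is settled.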
Relevant issue: https://github.com/pandas-dev/pandas/issues/54328
I'm no expert here, but I think the solution might be to do the opposite of https://github.com/pandas-dev/pandas/pull/40314, i.e. bring back `adjust=False` and raise on `adjust=True`?
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
Expected Output
Output of `pd.show_versions()`