pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

adjust=True is not applied for ewm, e.g., some_series.ewm(span=nn, adjust=True) #18863

Closed TwoToFourYears closed 5 years ago

TwoToFourYears commented 6 years ago

Code Sample, a copy-pastable example if possible

import pandas as pd

xx = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
yy = xx.ewm(span=2, adjust=False).mean()
zz = xx.ewm(span=2, adjust=True).mean()

Problem description

The ewm returns the same values for adjust=True as for adjust=False. For span=2, alpha=2/3, and the weights should be [1/3, 1], or, equivalently [1, 3].

Expected Output

The result should be [ nan, 3, 7, 11, 15, 19, 23, 27, 31, 35 ] / 4

Output of 0.20.3 and 0.21.1 is

[ 0, 0.75, 1.615385, 2.550000, 3.520661, 4.508242, 5.503202, 6.501220, 7.500457, 8.500169 ] which is the same as adjust=False

max-sixty commented 6 years ago

Are you sure?

In [1]: import pandas as pd
   ...:
   ...: xx = pd.Series([0,1,2,3,4,5,6,7,8,9])
   ...: yy = xx.ewm(span=2, adjust=False).mean ()
   ...: zz = xx.ewm(span=2, adjust=True).mean ()
   ...:

In [2]: yy
Out[2]:
0    0.000000
1    0.666667
2    1.555556
3    2.518519
4    3.506173
5    4.502058
6    5.500686
7    6.500229
8    7.500076
9    8.500025
dtype: float64

In [3]: zz
Out[3]:
0    0.000000
1    0.750000
2    1.615385
3    2.550000
4    3.520661
5    4.508242
6    5.503202
7    6.501220
8    7.500457
9    8.500169
dtype: float64

TwoToFourYears commented 6 years ago

You are correct that they give different values. Sorry about not being as careful as I should have.

However, the values for adjust=True are still incorrect. The returned values should be a weighted average, as specified in the doc page, with the weights:

(1-alpha)**(n-1), (1-alpha)**(n-2), ..., 1-alpha, 1.

So for span=2, alpha=2/3, n=2, and the weights are 1/3, 1. The returned values are given by:

zz[i] = (xx[i-1] / 3 + xx[i]) / (4/3) or zz[i] = (xx[i-1] + 3*xx[i]) / 4

Or for this example:

[ nan, 3, 7, 11, 15, 19, 23, 27, 31, 35 ] / 4

Or is the intent different than this?

max-sixty commented 6 years ago

Can you work through the n=2 example again?

The numerator should be 1*2 + 1/3*1 + 1/9*0 = 7/3. The denominator should be 1 + 1/3 + 1/9 = 13/9, and their ratio is 21/13 = 1.615...
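The arithmetic above can be checked with exact fractions. This is only a sketch: the variable names are illustrative, and it assumes span=2 maps to alpha = 2 / (span + 1) = 2/3, as used elsewhere in this thread.

```python
from fractions import Fraction

alpha = Fraction(2, 3)   # span=2 -> alpha = 2 / (span + 1)
decay = 1 - alpha        # 1/3
x = [0, 1, 2]            # first three inputs from the example

# Averaging up to index 2: the weight on x[i] is decay**(2 - i),
# so the first point x[0] still carries a weight of 1/9.
weights = [decay ** (len(x) - 1 - i) for i in range(len(x))]  # [1/9, 1/3, 1]

numerator = sum(w * v for w, v in zip(weights, x))  # 7/3
denominator = sum(weights)                          # 13/9
value = numerator / denominator                     # 21/13

print(numerator, denominator, value, float(value))
```

The 1/9 term contributes nothing to the numerator only because x[0] happens to be 0; it still counts in the denominator.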

TwoToFourYears commented 6 years ago

Excuse me, but shouldn't the denominator be the sum of the weights used in the numerator: 1 + 1/3 = 4/3 and not 1 + 1/3 + 1/9?

More generally, do I have the right idea about the "adjust" parameter? Using Wikipedia as a common reference, we have two possibilities: an expanding window versus a fixed width window. Using xx for the input series, and zz for the output:

Isn't adjust=False the expanding window: zz[i] = alpha * xx[i] + (1 - alpha) * zz[i-1]

Or a window to the beginning of the series: zz[i] = (xx[i] + (1-alpha) * xx[i-1] + ... + (1-alpha)**i * xx[0]) / (sum of weights)

While adjust=True is a fixed width window with n terms: zz[i] = (xx[i] + (1-alpha) * xx[i-1] + ... + (1-alpha)**(n-1) * xx[i - (n-1)]) / (sum of weights)

Final note: I left out that my original hand calculation effectively used min_periods=n.
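For comparison, here is a plain-Python sketch of the two formulas as the pandas documentation states them. The function names are hypothetical and this is not the library's implementation; it only reproduces the numbers posted earlier in the thread. Under this reading, both variants use every point back to the start of the series, and neither is a fixed-width window.

```python
def ewm_adjust_false(x, alpha):
    """Recursive form: y[0] = x[0]; y[i] = (1 - alpha) * y[i-1] + alpha * x[i]."""
    out = [x[0]]
    for v in x[1:]:
        out.append((1 - alpha) * out[-1] + alpha * v)
    return out


def ewm_adjust_true(x, alpha):
    """Weighted average over ALL points so far, with weights (1 - alpha)**k."""
    out = []
    for t in range(len(x)):
        # Oldest point first: x[0] gets (1 - alpha)**t, x[t] gets 1.
        weights = [(1 - alpha) ** (t - i) for i in range(t + 1)]
        out.append(sum(w * v for w, v in zip(weights, x)) / sum(weights))
    return out


xx = [float(v) for v in range(10)]
alpha = 2 / 3  # span=2 -> alpha = 2 / (span + 1)

yy = ewm_adjust_false(xx, alpha)
zz = ewm_adjust_true(xx, alpha)
print([round(v, 6) for v in yy])
print([round(v, 6) for v in zz])
```

The difference between the two is the normalization during the early part of the series; as more points accumulate, the outputs converge.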

max-sixty commented 6 years ago

Excuse me, but shouldn't the denominator be the sum of the weights used in the numerator: 1 + 1/3 = 4/3 and not 1 + 1/3 + 1/9?

The numerator has a 1/9 weight on 0, the first point

max-sixty commented 6 years ago

Could you rephrase the rest of the question - what's the result you're expecting vs the result you're seeing?

TwoToFourYears commented 6 years ago

The doc page says:

When adjust is True (default), weighted averages are calculated using weights (1-alpha)**(n-1), (1-alpha)**(n-2), ..., 1-alpha, 1.

When adjust is False, weighted averages are calculated recursively as: weighted_average[0] = arg[0]; weighted_average[i] = (1-alpha)*weighted_average[i-1] + alpha*arg[i]

So does this state that adjust=True uses a fixed width window of span=2 (for the example) and not an expanding window that extends to the beginning?

Or if you go here and scroll down a bit you get:

One must specify precisely one of span, center of mass, half-life and alpha to the EW functions:

  • Span corresponds to what is commonly called an “N-day EW moving average”.
  • Center of mass has a more physical interpretation and can be thought of in terms of span: c = (s - 1) / 2.
  • Half-life is the period of time for the exponential weight to reduce to one half.
  • Alpha specifies the smoothing factor directly.

This again states, or at least suggests, that using span as an argument (at least with adjust=True) returns an exponentially weighted average over a fixed width window.
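One way to test that reading is to check what span actually controls. The sketch below assumes the conversions given in the pandas documentation (alpha = 2 / (span + 1), alpha = 1 / (1 + com), alpha = 1 - exp(-ln 2 / halflife)) and shows that the four parameters are interchangeable ways of setting the decay rate alpha rather than a window length.

```python
import math

import pandas as pd

xx = pd.Series(range(10), dtype="float64")

span = 2
alpha = 2 / (span + 1)                             # 2/3
com = (span - 1) / 2                               # 0.5
halflife = -math.log(2) / math.log(1 - alpha)      # ln 2 / ln 3

by_span = xx.ewm(span=span).mean()
by_com = xx.ewm(com=com).mean()
by_alpha = xx.ewm(alpha=alpha).mean()
by_halflife = xx.ewm(halflife=halflife).mean()

# All four parameterisations should agree up to floating-point error.
print((by_span - by_com).abs().max())
print((by_span - by_alpha).abs().max())
print((by_span - by_halflife).abs().max())
```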

max-sixty commented 6 years ago

Can you make an example with the result you're expecting vs the result you're seeing?

TwoToFourYears commented 6 years ago

Sorry for the delay.

Let's back up. The question is what the method is supposed to do. Reading the documentation, the method, at least in part, implements the recursive exponentially weighted average:

y[t] = (1 - alpha) * y[t-1] + alpha * x[t]

x[t] is the input, y[t] the output. This method, with some variations, results in an expanding window.

An alternative is to use a rolling (or moving) fixed length window. For a window width 2, we get alpha=2/3, and weights of 1/3, and 1. The rolling window would give:

y[0] = (1 * x[0]) / 1
y[1] = (1 * x[1] + 1/3 * x[0]) / (1 + 1/3)
y[2] = (1 * x[2] + 1/3 * x[1]) / (1 + 1/3)
y[3] = (1 * x[3] + 1/3 * x[2]) / (1 + 1/3)

Here the documentation says:

When adjust is False, weighted averages are calculated recursively as: weighted_average[0] = arg[0]; weighted_average[i] = (1-alpha)*weighted_average[i-1] + alpha*arg[i].

which is the recursive/expanding window method

However right above this quote, the same doc says:

When adjust is True (default), weighted averages are calculated using weights (1-alpha)**(n-1), (1-alpha)**(n-2), ..., 1-alpha, 1.

Further, going here and paging down a couple of times and you get the following explanation of span, half-life, and other terms:

Span corresponds to what is commonly called an “N-day EW moving average”.

This at least suggests that calling the function as aSeries.ewm(span=2, adjust=True).mean() will result in a rolling window as above. So for x = [ 0, 1, 2, 3, ... ] we get:

y[0] =  1 * 0 / 1
y[1] = (1 * 1 + 1/3 * 0) / (1 + 1/3)
y[2] = (1 * 2 + 1/3 * 1) / (1 + 1/3)
y[3] = (1 * 3 + 1/3 * 2) / (1 + 1/3)

which is not what is produced

So either the method is not making a rolling (or moving) window calculation, or the documentation is problematic. Rereading the document several times, I tend to think the rolling window calculation is not intended, but rather the documentation is overly fuzzy.
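The two interpretations can be put side by side. This is a sketch, not something pandas provides directly: the fixed-width version is built by hand with rolling(...).apply and a hypothetical helper, using the weights 1/3 and 1 from the hand calculation above.

```python
import numpy as np
import pandas as pd

xx = pd.Series(range(10), dtype="float64")
weights = np.array([1 / 3, 1.0])  # oldest point first, newest last


def fixed_window_average(window):
    # The first window holds a single value, so trim the weights to match.
    w = weights[-len(window):]
    return np.dot(window, w) / w.sum()


# Fixed two-point window: the reading described in the comment above.
fixed = xx.rolling(window=2, min_periods=1).apply(fixed_window_average, raw=True)

# What ewm computes: weights reach back to the start of the series.
expanding = xx.ewm(span=2, adjust=True).mean()

print(pd.DataFrame({"fixed_window": fixed, "ewm_adjust_true": expanding}))
```

The two columns agree at indices 0 and 1, where both windows cover the same points, and diverge from index 2 onward (7/4 versus 21/13).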

max-sixty commented 6 years ago

In the interests of expediency, I'm going to jump in at the first point I have a question.

The rolling window would give:

y[0] = (1 * x[0]) / 1
y[1] = (1 * x[1] + 1/3 * x[0]) / (1 + 1/3)

When you say rolling, do you mean ewm? If so, should that be 1 * y[0] rather than 1 * x[1]? If not, how do you derive the formula? There's no alpha in a rolling calc, only a window?

TwoToFourYears commented 6 years ago

rolling is the method rolling or the moving window of mwa. There are various documents that describe the calculations.

mroeschke commented 5 years ago

Closing as this appears to be mostly a usage question. If there are specifics in the documentation that could be improved, a new issue can be opened to address that.