adjust=True is not applied for ewm, e.g., some_series.ewm(span=nn, adjust=True)

TwoToFourYears commented 6 years ago

Code Sample, a copy-pastable example if possible

import pandas as pd

xx = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
yy = xx.ewm(span=2, adjust=False).mean()
zz = xx.ewm(span=2, adjust=True).mean()

Problem description

The ewm returns the same values for adjust=True as for adjust=False. For span=2, alpha=2/3, and the weights should be [1/3, 1], or, equivalently [1, 3].

Expected Output

The result should be [ nan, 3, 7, 11, 15, 19, 23, 27, 31, 35 ] / 4

Output of 0.20.3 and 0.21.1 is

[ 0, 0.75, 1.615385, 2.550000, 3.520661, 4.508242, 5.503202, 6.501220, 7.500457, 8.500169 ] which is the same as adjust=False

max-sixty commented 6 years ago

Are you sure?

In [1]: import pandas as pd
   ...:
   ...: xx = pd.Series([0,1,2,3,4,5,6,7,8,9])
   ...: yy = xx.ewm(span=2, adjust=False).mean ()
   ...: zz = xx.ewm(span=2, adjust=True).mean ()
   ...:

In [2]: yy
Out[2]:
0    0.000000
1    0.666667
2    1.555556
3    2.518519
4    3.506173
5    4.502058
6    5.500686
7    6.500229
8    7.500076
9    8.500025
dtype: float64

In [3]: zz
Out[3]:
0    0.000000
1    0.750000
2    1.615385
3    2.550000
4    3.520661
5    4.508242
6    5.503202
7    6.501220
8    7.500457
9    8.500169
dtype: float64
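
The yy values above can be reproduced by hand from the recursion that the docs (quoted later in this thread) give for adjust=False. This is a minimal sketch in plain Python, not the pandas implementation; the helper name ewm_recursive is made up for illustration, and alpha = 2 / (span + 1) is the usual span-to-alpha conversion.

```python
def ewm_recursive(values, alpha):
    """adjust=False style recursion: y[0] = x[0]; y[i] = (1-alpha)*y[i-1] + alpha*x[i]."""
    out = [float(values[0])]
    for x in values[1:]:
        out.append((1 - alpha) * out[-1] + alpha * x)
    return out

span = 2
alpha = 2 / (span + 1)  # span=2 gives alpha=2/3
yy = ewm_recursive(list(range(10)), alpha)

# The first four values agree with Out[2] above when rounded to 6 places.
print([round(v, 6) for v in yy[:4]])
```

Each output depends on the previous output, so every earlier input still contributes with a geometrically shrinking weight.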

TwoToFourYears commented 6 years ago

You are correct that they give different values. Sorry for not being as careful as I should have been.

However, the values for adjust=True are still incorrect. The returned values should be a weighted average using the weights specified on the doc page:

(1-alpha)**(n-1), (1-alpha)**(n-2), ..., 1-alpha, 1.

So for span=2, alpha=2/3, n=2, and the weights are 1/3, 1. The returned values are given by:

zz[i] = (xx[i-1] / 3 + xx[i]) / (4/3) or zz[i] = (xx[i-1] + 3*xx[i]) / 4

Or for this example:

[ nan, 3, 7, 11, 15, 19, 23, 27, 31, 35 ] / 4

Or is the intent different than this?

max-sixty commented 6 years ago

Can you work through the n=2 example again?

The numerator should be 1*2 + 1/3*1 + 1/9*0 = 7/3. The denominator should be 1 + 1/3 + 1/9 = 13/9, and the ratio is (7/3) / (13/9) = 21/13 = 1.615...
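
The arithmetic above can be checked with a short sketch that weights the whole history back to the first point, which is how the adjust=True numbers in this thread come out. It is plain Python with exact fractions, not the pandas implementation; the helper name ewm_adjusted is hypothetical.

```python
from fractions import Fraction

def ewm_adjusted(values, alpha):
    """Weighted mean over all points up to t, with weight (1-alpha)**k on the point k steps back."""
    out = []
    for t in range(len(values)):
        weights = [(1 - alpha) ** (t - j) for j in range(t + 1)]
        numerator = sum(w * x for w, x in zip(weights, values))
        out.append(numerator / sum(weights))
    return out

alpha = Fraction(2, 3)  # span=2 gives alpha = 2 / (span + 1) = 2/3
zz = ewm_adjusted(list(range(10)), alpha)

print(zz[2])                               # exact ratio at index 2
print([round(float(v), 6) for v in zz[:4]])  # compare with Out[3] above
```

At index 2 the weights are 1/9, 1/3 and 1, so the denominator includes the 1/9 even though it multiplies a zero in the numerator.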

TwoToFourYears commented 6 years ago

Excuse me, but shouldn't the denominator be the sum of the weights used in the numerator: 1 + 1/3 = 4/3 and not 1 + 1/3 + 1/9?

More generally, do I have the right idea about the "adjust" parameter? Using Wikipedia as a common reference, we have two possibilities: an expanding window versus a fixed width window. Using xx for the input series, and zz for the output:

Isn't adjust=False the expanding window: zz[i] = alpha * xx[i] + (1 - alpha) * zz[i-1]

Or a window to the beginning of the series: zz[i] = (xx[i] + (1-alpha) * xx[i-1] + ... + (1-alpha)**i * xx[0]) / (sum of weights)

While adjust=True is a fixed-width window with n terms: zz[i] = (xx[i] + (1-alpha) * xx[i-1] + ... + (1-alpha)**(n-1) * xx[i - (n-1)]) / (sum of weights)

Final note: I left out that my original hand calculation effectively used min_periods=n.

max-sixty commented 6 years ago

Excuse me, but shouldn't the denominator be the sum of the weights used in the numerator: 1 + 1/3 = 4/3 and not 1 + 1/3 + 1/9?

The numerator has a 1/9 weight on 0, the first point

max-sixty commented 6 years ago

Could you rephrase the rest of the question - what's the result you're expecting vs the result you're seeing?

TwoToFourYears commented 6 years ago

The doc page says:

When adjust is True (default), weighted averages are calculated using weights (1-alpha)**(n-1), (1-alpha)**(n-2), ..., 1-alpha, 1.

When adjust is False, weighted averages are calculated recursively as: weighted_average[0] = arg[0]; weighted_average[i] = (1-alpha)*weighted_average[i-1] + alpha*arg[i]

So does this state that adjust=True uses a fixed width window of span=2 (for the example) and not an expanding window that extends to the beginning?

Or if you go here and scroll down a bit you get:

One must specify precisely one of span, center of mass, half-life and alpha to the EW functions:

Span corresponds to what is commonly called an “N-day EW moving average”.

Center of mass has a more physical interpretation and can be thought of in terms of span: c = (s - 1) / 2.

Half-life is the period of time for the exponential weight to reduce to one half.

Alpha specifies the smoothing factor directly.

This again states, or at least suggests, that using span as an argument (at least with adjust=True) returns an exponentially weighted average over a fixed-width window.

max-sixty commented 6 years ago

Can you make an example with the result you're expecting vs the result you're seeing?

TwoToFourYears commented 6 years ago

Sorry for the delay.

Let's back up. The question is what the method is supposed to do. Reading the documentation, the method, at least in part, implements the recursive exponentially weighted average:

y[t] = (1 - alpha) * y[t-1] + alpha * x[t]

x[t] is the input, y[t] the output. This method, with some variations, results in an expanding window.

An alternative is to use a rolling (or moving) fixed length window. For a window width 2, we get alpha=2/3, and weights of 1/3, and 1. The rolling window would give:

y[0] = (1 * x[0]) / 1
y[1] = (1 * x[1] + 1/3 * x[0]) / (1 + 1/3)
y[2] = (1 * x[2] + 1/3 * x[1]) / (1 + 1/3)
y[3] = (1 * x[3] + 1/3 * x[2]) / (1 + 1/3)

Here the documentation says:

When adjust is False, weighted averages are calculated recursively as: weighted_average[0] = arg[0]; weighted_average[i] = (1-alpha)*weighted_average[i-1] + alpha*arg[i].

which is the recursive/expanding window method

However right above this quote, the same doc says:

When adjust is True (default), weighted averages are calculated using weights (1-alpha)**(n-1), (1-alpha)**(n-2), ..., 1-alpha, 1.

Further, going here and paging down a couple of times and you get the following explanation of span, half-life, and other terms:

Span corresponds to what is commonly called an “N-day EW moving average”.

This at least suggests that calling the function as: aSeries.ewm (span=2, adjust=True).mean () will result in a rolling window as above. So for x = [ 0, 1, 2, 3, ... ] we get:

y[0] =  1 * 0 / 1
y[1] = (1 * 1 + 1/3 * 0) / (1 + 1/3)
y[2] = (1 * 2 + 1/3 * 1) / (1 + 1/3)
y[3] = (1 * 3 + 1/3 * 2) / (1 + 1/3)

which is not what is produced

So either the method is not making a rolling (or moving) window calculation, or the documentation is problematic. Rereading the document several times, I tend to think the rolling window calculation is not intended, but rather the documentation is overly fuzzy.
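
For comparison, the fixed-width calculation described in this comment can be written out as a short sketch. This is only the rolling-window interpretation proposed here, not what ewm computes; the helper name fixed_window_ew and its width parameter are hypothetical.

```python
from fractions import Fraction

def fixed_window_ew(values, alpha, width):
    """Exponentially weighted mean over only the last `width` points (shorter at the start)."""
    out = []
    for t in range(len(values)):
        window = values[max(0, t - width + 1):t + 1]
        # Newest point gets weight 1, the one before it (1-alpha), and so on.
        weights = [(1 - alpha) ** (len(window) - 1 - k) for k in range(len(window))]
        out.append(sum(w * x for w, x in zip(weights, window)) / sum(weights))
    return out

y = fixed_window_ew(list(range(10)), Fraction(2, 3), width=2)
print([str(v) for v in y[:4]])  # ['0', '3/4', '7/4', '11/4']
```

These values match the hand calculation y[1] = 3/4, y[2] = 7/4, y[3] = 11/4, and they differ from the adjust=True output shown earlier in the thread (0.75, 1.615385, 2.55) from index 2 onward, because that output keeps a weight on every earlier point.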

max-sixty commented 6 years ago

In the interests of expediency, I'm going to jump in at the first point I have a question.

The rolling window would give:
y[0] = (1 * x[0]) / 1
y[1] = (1 * x[1] + 1/3 * x[0]) / (1 + 1/3)

When you say rolling, do you mean ewm? If so, should that be 1 * y[0] rather than 1 * x[1]? If not, how do you derive the formula? There's no alpha in a rolling calc, only a window?

TwoToFourYears commented 6 years ago

By "rolling" I mean the rolling method, or the moving window of mwa. Various documents describe the calculations.

mroeschke commented 5 years ago

Closing as this appears to be mostly a usage question. If there are specifics in the documentation that could be improved, a new issue can be opened to address that.

pandas-dev / pandas