BUG: Roll() function is incorrect

rsheftel commented 6 years ago

It appears that the roll() function in the utils package is incorrect. There are two problems:

For the examples below we will assume the pd.series has 10 data points and the window argument is 5.

The function does not calculate for the last period. The source of this problem is the definition of the range in the for loop. As it is now:

for I in range(window, len(args[0])):

The loop will run for i values of 5 to 9 because the len of the series is 10, but the range() function is non-inclusive of the last number. So the next line that extracts the subset of the pd.series or np.array to run the function over:

numpy array: rets = [s[i - window:i] for s in args] pandas series: rets = [s.iloc[i - window:i] for s in args]

on the last pass of the for loop i=9 and thus the last data point in the series is never included in the calculation.

The next issue is that for pandas series the above causes the results to be mapped to the wrong index. When the i in the for loop is 9, the maximum number of the loop, and thus the function is applied to elements [4, 9] the result is assigned the datetime index of 9, which is incorrect because that is the last value in the index and does not properly align to the data used in the function which is .iloc[4, 9]

How to see all of this:

For a given raw series below and using the roll() function of np.nansum() is a column of what we would expect, and what is actually returned.

              Input Series  Expected                   Actual
1/1/01                     1        
1/2/01                     2        
1/3/01                     3        
1/4/01                     4        
1/5/01                     5                15  
1/6/01                     6                20              15 
1/7/01                     7                25              20 
1/8/01                     8                30              25 
1/9/01                     9                35              30 
1/10/01                 10              40              35

What is the fix? It is simple in the _roll_ndarray() and _roll_pandas() functions make the range end at len() + 1:

for i in range(window, len(args[0]) + 1):

And for the _roll_pandas() then for the datetime index i - 1

data[args[0]. index[i - 1]] = func(*rets, **kwargs)

Why didn't any test catch this?

It appears that the test for the roll() function, test_pandas_roll() was incorrect. It expects the length of the result series to be the length of the input series minus the window. That is incorrect, the expected number of return elements is that PLUS one.

What is the solution?

Simple to fix the two offending functions in utils.py:

def _roll_ndarray(func, window, *args, **kwargs):
    data = []
    for i in range(window, len(args[0]) + 1):
        rets = [s[i-window:i] for s in args]
        data.append(func(*rets, **kwargs))
    return np.array(data)

def _roll_pandas(func, window, *args, **kwargs):
    data = {}
    for i in range(window, len(args[0]) + 1):
        rets = [s.iloc[i-window:i] for s in args]
        data[args[0].index[i - 1]] = func(*rets, **kwargs)
    return pd.Series(data)

Why not do this?

This change will cause 52 tests that were incorrectly passing to now break. I am happy to submit the PR, but I do not have the time now to correct all the broken tests.

If the package maintainers would like I can submit this PR and then over time if people want to help we can fix the tests. I don't have time now to fix them all myself.

ssanderson commented 6 years ago

@rsheftel thanks for the detailed report!

It looks like roll and its associated functions were added in https://github.com/quantopian/empyrical/pull/47#issue-229985834. Assuming this isn't by design (cc @cgdeboer @twiecki can you confirm that the behavior here is indeed a bug?) I think the best way to tackle fixing the tests would be to create a PR with the fix and create a checklist of the failing tests, and then have interested parties submit patches to fix individual tests as PRs against the main PR.

cgdeboer commented 6 years ago

@rsheftel @ssanderson thanks for bringing this to attention.

I’ll take a look over the next few days. If it is indeed a bug (which seems like it very well could be), I’m happy to remedy the tests and the two roll functions. Back to you in a day or two.

twiecki commented 6 years ago

Thanks for the detailed bug report! Definitely not the intended behavior.

ssanderson commented 6 years ago

This is released in v0.3.4.

quantopian / empyrical

BUG: Roll() function is incorrect #79