Closed rsheftel closed 6 years ago
@rsheftel thanks for the detailed report!
It looks like roll and its associated functions were added in https://github.com/quantopian/empyrical/pull/47#issue-229985834. Assuming this isn't by design (cc @cgdeboer @twiecki, can you confirm that the behavior here is indeed a bug?), I think the best way to tackle fixing the tests would be to create a PR with the fix and a checklist of the failing tests, and then have interested parties submit patches for individual tests as PRs against the main PR.
@rsheftel @ssanderson thanks for bringing this to attention.
I’ll take a look over the next few days. If it is indeed a bug (which seems like it very well could be), I’m happy to remedy the tests and the two roll functions. Back to you in a day or two.
Thanks for the detailed bug report! Definitely not the intended behavior.
This is released in v0.3.4.
It appears that the roll() function in the utils package is incorrect. There are two problems:
For the examples below, assume the pd.Series has 10 data points and the window argument is 5.
for i in range(window, len(args[0])):
The loop runs for i values of 5 to 9, because the length of the series is 10 and range() excludes its end value. The next line then extracts the subset of the pd.Series or np.array to run the function over:
numpy array:
rets = [s[i - window:i] for s in args]
pandas series:
rets = [s.iloc[i - window:i] for s in args]
On the last pass of the for loop i=9, so the slice covers positions 4 through 8 (the slice end is exclusive as well), and thus the last data point in the series is never included in the calculation.
How to see all of this:
For a given raw series, and using roll() with np.nansum(), the comparison is a column of what we would expect against a column of what is actually returned.
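The comparison can be reproduced with a minimal sketch. The helper names roll_buggy and roll_fixed are hypothetical stand-ins that only mirror the loop bounds discussed here, not the actual empyrical code, and the raw series 1..10 is an assumed example:

```python
import numpy as np

def roll_buggy(arr, window, func):
    # Mirrors the loop quoted above: i stops at len(arr) - 1.
    return [func(arr[i - window:i]) for i in range(window, len(arr))]

def roll_fixed(arr, window, func):
    # Same loop, but i is allowed to reach len(arr).
    return [func(arr[i - window:i]) for i in range(window, len(arr) + 1)]

raw = np.arange(1.0, 11.0)  # assumed raw series: 1, 2, ..., 10
window = 5

actual = roll_buggy(raw, window, np.nansum)    # 5 values, last point unused
expected = roll_fixed(raw, window, np.nansum)  # 6 values, ends with 6+7+8+9+10

print("actual:  ", [float(v) for v in actual])
print("expected:", [float(v) for v in expected])
```

With these inputs the buggy loop returns five sums and the fixed loop returns six. The extra value is the window that ends on the final data point.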
What is the fix? It is simple: in the _roll_ndarray() and _roll_pandas() functions, make the range end at len() + 1:
for i in range(window, len(args[0]) + 1):
And in _roll_pandas(), use i - 1 for the datetime index:
data[args[0].index[i - 1]] = func(*rets, **kwargs)
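Both changes can be checked against pandas' built-in rolling window. This is a rough sketch: roll_pandas_sketch is a hypothetical single-series stand-in for _roll_pandas(), and the dates and values are assumed example data:

```python
import numpy as np
import pandas as pd

def roll_pandas_sketch(func, window, series):
    # Hypothetical stand-in for _roll_pandas() with both changes applied.
    data = {}
    for i in range(window, len(series) + 1):   # range now ends at len + 1
        chunk = series.iloc[i - window:i]      # positions i - window .. i - 1
        data[series.index[i - 1]] = func(chunk)  # label with the window's last date
    return pd.Series(data)

idx = pd.date_range("2020-01-01", periods=10, freq="D")  # assumed dates
s = pd.Series(np.arange(1.0, 11.0), index=idx)

result = roll_pandas_sketch(np.nansum, 5, s)
reference = s.rolling(5).sum().dropna()  # pandas' own rolling sum

print(result)
```

With the i - 1 label, each value is stamped with the last date inside its window, which matches the convention pandas uses for rolling().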
Why didn't any test catch this?
It appears that the test for the roll() function, test_pandas_roll(), was incorrect. It expects the length of the result series to be the length of the input series minus the window. That is incorrect: the expected number of returned elements is that PLUS one.
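The "plus one" can be confirmed by counting windows directly. In this small sketch, count_windows is a hypothetical helper and not part of the empyrical test suite:

```python
def count_windows(n, window):
    """Count the full windows of size `window` that fit in `n` points."""
    # The exclusive slice end i runs from window up to and including n.
    return len(range(window, n + 1))

n, window = 10, 5
fixed_length = count_windows(n, window)  # n - window + 1 = 6
old_expectation = n - window             # what the test assumed: 5
edge_case = count_windows(5, 5)          # window equal to series length: 1

print(fixed_length, old_expectation, edge_case)
```

The edge case shows why the old expectation cannot be right: a series exactly as long as the window still contains one full window, while len minus window would predict zero results.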
What is the solution?
It is simple to fix the two offending functions in utils.py.
Why not do this?
This change will cause 52 tests that were incorrectly passing to now break. I am happy to submit the PR, but I do not have the time now to correct all the broken tests.
If the package maintainers would like, I can submit this PR, and then over time, if people want to help, we can fix the tests.