Closed MichaelJMath closed 5 years ago
There seem to be some failing unittests: https://travis-ci.org/quantopian/alphalens/jobs/412480773
It looks like one of the issues is that in older versions of pandas, you can't groupby an index level by specifying the name of the level. I'll look into it a bit later.
The remaining failed unittests don't seem to come from any of the code I edited. They seem to result from a problem in commit e444e8da74a524ce42b172f9931c62597508a749. My logic is as follows:
Below is one of the failed test cases:
ERROR: test_average_cumulative_return_by_quantile_0 (alphalens.tests.test_performance.PerformanceTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/parameterized/parameterized.py", line 392, in standalone_func
return func(*(a + p.args), **p.kwargs)
File "/home/travis/build/quantopian/alphalens/alphalens/tests/test_performance.py", line 920, in test_average_cumulative_return_by_quantile
0, after + 1), filter_zscore=False)
File "/home/travis/build/quantopian/alphalens/alphalens/utils.py", line 754, in get_clean_factor_and_forward_returns
filter_zscore)
File "/home/travis/build/quantopian/alphalens/alphalens/utils.py", line 304, in compute_forward_returns
period_len = diff_custom_calendar_timedeltas(start, end, freq)
File "/home/travis/build/quantopian/alphalens/alphalens/utils.py", line 912, in diff_custom_calendar_timedeltas
actual_days = np.busday_count(start, end, weekmask, holidays)
TypeError: Iterator operand 0 dtype could not be cast from dtype('<M8[us]') to dtype('<M8[D]') according to the rule 'safe'
It seems there is an issue in the way pandas stores dates. (https://stackoverflow.com/questions/31917964/python-numpy-cannot-convert-datetime64ns-to-datetime64d-to-use-with-numba)
I tested this using the following virtual environment configs (matching what was in the unittest):
conda create -q -n testenv --yes python=2.7 ipython flake8 numpy scipy nose matplotlib pandas=0.20.3 statsmodels seaborn
I tried checking out prior commits and got the error when testing on every commit going back to the e444e8da74a524ce42b172f9931c62597508a749 commit. The commit before that did not generate the error.
This might need to be listed as a separate issue, but I'm not sure why it would just be showing up now on the unittests, and not on prior commits.
Please let me know if I seem to be off track here.
EDIT: Some additional info: When I change line 912 in utils to:
actual_days = np.busday_count(np.array(start).astype('datetime64[D]'), np.array(end).astype('datetime64[D]'), weekmask, holidays)
The code ran without error.
Thanks for tracking this done @MichaelJMath, I bet you are right. Want to do another PR with your fix?
@MichaelJMath I believe the issue has arisen because we don't enforce a particular version of each module Alphalens uses. Currently we only force few pandas and python versions but the other modules might change with time, when new module versions are released. This issue might be due to a new numpy version then.
And the previous comment makes me think we might want to test few specific numpy versions too, as we do we pandas. So that we are sure Alphalens works fine with a specific subset of numpy/pandas/python versions, currently we are only sure about pandas/python versions.
@luca-s Sounds like a good idea, as long as we also test with the most recent version to detect regressions like these.
@twiecki I checked both pyfolio and zipline and they do not force a particular numpy version, so let's keep it this way. As you said, in this way we are able to detect regressions in more recent version of numpy.
@twiecki, I'll work on submitting another pull request this weekend.
@MichaelJMath Appreciate it!
@MichaelJMath Could you please rebase on master? We should see the tests not failing anymore.
@MichaelJMath beautiful, thank you! Is there any chance you could create a test case to avoid future regressions? I have just noticed the test cases don't cover perf.mean_return_by_quantile
and this is probably why we hit the bug in the first place.
I'll give it a shot. I probably won't get to it this weekend, but I'll put it on my to-do list for next weekend.
@MichaelJMath thank you!
@twiecki we can still merge this PR, can't we?
Thanks guys!
@MichaelJMath This weekend I have some time to work on the missing test, so don't worry, I will do it.
Compute mean return by date. If by_date flag is false, then compute and return the average of the daily mean returns (and also the standard error of these daily mean returns).
Resolves: Issue #309