quantopian / alphalens

Performance analysis of predictive (alpha) stock factors
http://quantopian.github.io/alphalens
Apache License 2.0
3.2k stars 1.12k forks source link

BUG: Calculate mean returns by date before mean return by quantile #313

Closed MichaelJMath closed 5 years ago

MichaelJMath commented 5 years ago

Compute mean return by date. If by_date flag is false, then compute and return the average of the daily mean returns (and also the standard error of these daily mean returns).

Resolves: Issue #309

twiecki commented 5 years ago

There seem to be some failing unittests: https://travis-ci.org/quantopian/alphalens/jobs/412480773

MichaelJMath commented 5 years ago

It looks like one of the issues is that in older versions of pandas, you can't groupby an index level by specifying the name of the level. I'll look into it a bit later.

MichaelJMath commented 5 years ago

The remaining failed unittests don't seem to come from any of the code I edited. They seem to result from a problem in commit e444e8da74a524ce42b172f9931c62597508a749. My logic is as follows:

Below is one of the failed test cases:

ERROR: test_average_cumulative_return_by_quantile_0 (alphalens.tests.test_performance.PerformanceTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/parameterized/parameterized.py", line 392, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/home/travis/build/quantopian/alphalens/alphalens/tests/test_performance.py", line 920, in test_average_cumulative_return_by_quantile
    0, after + 1), filter_zscore=False)
  File "/home/travis/build/quantopian/alphalens/alphalens/utils.py", line 754, in get_clean_factor_and_forward_returns
    filter_zscore)
  File "/home/travis/build/quantopian/alphalens/alphalens/utils.py", line 304, in compute_forward_returns
    period_len = diff_custom_calendar_timedeltas(start, end, freq)
  File "/home/travis/build/quantopian/alphalens/alphalens/utils.py", line 912, in diff_custom_calendar_timedeltas
    actual_days = np.busday_count(start, end, weekmask, holidays)
TypeError: Iterator operand 0 dtype could not be cast from dtype('<M8[us]') to dtype('<M8[D]') according to the rule 'safe'

It seems there is an issue in the way pandas stores dates. (https://stackoverflow.com/questions/31917964/python-numpy-cannot-convert-datetime64ns-to-datetime64d-to-use-with-numba)

I tested this using the following virtual environment configs (matching what was in the unittest):

conda create -q -n testenv --yes python=2.7 ipython flake8 numpy scipy nose matplotlib pandas=0.20.3 statsmodels seaborn

I tried checking out prior commits and got the error when testing on every commit going back to the e444e8da74a524ce42b172f9931c62597508a749 commit. The commit before that did not generate the error.

This might need to be listed as a separate issue, but I'm not sure why it would just be showing up now on the unittests, and not on prior commits.

Please let me know if I seem to be off track here.

EDIT: Some additional info: When I change line 912 in utils to:

actual_days  = np.busday_count(np.array(start).astype('datetime64[D]'), np.array(end).astype('datetime64[D]'),  weekmask, holidays)

The code ran without error.

twiecki commented 5 years ago

Thanks for tracking this done @MichaelJMath, I bet you are right. Want to do another PR with your fix?

luca-s commented 5 years ago

@MichaelJMath I believe the issue has arisen because we don't enforce a particular version of each module Alphalens uses. Currently we only force few pandas and python versions but the other modules might change with time, when new module versions are released. This issue might be due to a new numpy version then.

luca-s commented 5 years ago

And the previous comment makes me think we might want to test few specific numpy versions too, as we do we pandas. So that we are sure Alphalens works fine with a specific subset of numpy/pandas/python versions, currently we are only sure about pandas/python versions.

twiecki commented 5 years ago

@luca-s Sounds like a good idea, as long as we also test with the most recent version to detect regressions like these.

luca-s commented 5 years ago

@twiecki I checked both pyfolio and zipline and they do not force a particular numpy version, so let's keep it this way. As you said, in this way we are able to detect regressions in more recent version of numpy.

MichaelJMath commented 5 years ago

@twiecki, I'll work on submitting another pull request this weekend.

twiecki commented 5 years ago

@MichaelJMath Appreciate it!

luca-s commented 5 years ago

@MichaelJMath Could you please rebase on master? We should see the tests not failing anymore.

luca-s commented 5 years ago

@MichaelJMath beautiful, thank you! Is there any chance you could create a test case to avoid future regressions? I have just noticed the test cases don't cover perf.mean_return_by_quantile and this is probably why we hit the bug in the first place.

MichaelJMath commented 5 years ago

I'll give it a shot. I probably won't get to it this weekend, but I'll put it on my to-do list for next weekend.

luca-s commented 5 years ago

@MichaelJMath thank you!

luca-s commented 5 years ago

@twiecki we can still merge this PR, can't we?

twiecki commented 5 years ago

Thanks guys!

luca-s commented 5 years ago

@MichaelJMath This weekend I have some time to work on the missing test, so don't worry, I will do it.