quantopian / alphalens

Performance analysis of predictive (alpha) stock factors
http://quantopian.github.io/alphalens
Apache License 2.0

Perf improvements #361

Closed jmccorriston closed 4 years ago

jmccorriston commented 4 years ago

This PR addresses the performance issues highlighted in https://github.com/quantopian/alphalens/issues/357. It simplifies the cumulative_returns computation, which significantly speeds up the create_returns_tear_sheet function (and, as a result, create_full_tear_sheet).
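For illustration only (this is not the code in this PR): the profile from the original issue shows most of the time going to repeated Series.__setitem__ calls inside DataFrame.apply. A vectorized computation avoids that per-element assignment. The sketch below assumes one-period simple returns and treats missing values as a zero return. The helper name simple_cumulative_returns is hypothetical and is not part of alphalens.

```python
import numpy as np
import pandas as pd


def simple_cumulative_returns(returns: pd.Series) -> pd.Series:
    """Compound one-period simple returns into a cumulative growth series.

    Hypothetical helper for illustration; not the alphalens implementation.
    Assumption: a missing return is treated as a zero return.
    """
    # (1 + r_1) * (1 + r_2) * ... computed in one vectorized pass.
    return returns.fillna(0.0).add(1.0).cumprod()


dates = pd.date_range("2020-01-01", periods=4, freq="D")
rets = pd.Series([0.01, -0.02, np.nan, 0.03], index=dates)

cum = simple_cumulative_returns(rets)
print(cum)
```

The result starts from a notional value of 1.0, so the last entry is the total growth factor over the period.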

The changes definitely need input from other folks, including @luca-s and someone from engineering at Quantopian (I'll get someone to take a look), so I added a "do not merge" label. I expect I'll need to make pretty significant changes before we can merge.

The branch also sprawled a bit: it includes a few additions and tweaks to functions in alphalens.utils, some stylistic changes, and some minor functional changes. The description below summarizes the changes made in this PR.

Changes

jmccorriston commented 4 years ago

Here's a profile of the version in this branch:

Wed Feb 19 10:41:50 2020    returns_tearsheet_profile.stats

         7705970 function calls (7550159 primitive calls) in 17.993 seconds

   Ordered by: cumulative time
   List reduced from 3252 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001   17.993   17.993 <ipython-input-3-ca392862d9db>:11(run_returns_tear_sheet)
        1    0.001    0.001   17.992   17.992 /Users/jmccorriston/quant-repos/alphalens/alphalens/plotting.py:38(call_w_context)
        1    0.028    0.028   17.964   17.964 /Users/jmccorriston/quant-repos/alphalens/alphalens/tears.py:165(create_returns_tear_sheet)
        2    0.058    0.029    8.409    4.205 /Users/jmccorriston/quant-repos/alphalens/alphalens/performance.py:454(mean_return_by_quantile)
        2    0.029    0.015    7.395    3.698 /Users/jmccorriston/quant-repos/alphalens/alphalens/utils.py:382(demean_forward_returns)
        2    0.047    0.024    7.022    3.511 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/groupby/generic.py:570(transform)
        2    0.024    0.012    6.974    3.487 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/groupby/generic.py:516(_transform_general)
        1    0.009    0.009    4.829    4.829 /Users/jmccorriston/quant-repos/alphalens/alphalens/performance.py:208(factor_returns)
        1    0.005    0.005    4.210    4.210 /Users/jmccorriston/quant-repos/alphalens/alphalens/performance.py:129(factor_weights)
        1    0.000    0.000    4.084    4.084 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/groupby/generic.py:809(apply)
        1    0.007    0.007    4.084    4.084 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/groupby/groupby.py:695(apply)
        1    0.009    0.009    4.077    4.077 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/groupby/groupby.py:741(_python_apply_general)
        3    0.001    0.000    3.756    1.252 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/IPython/core/display.py:131(display)
        3    0.001    0.000    3.734    1.245 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/IPython/core/formatters.py:89(format)
       36    0.000    0.000    3.733    0.104 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/IPython/core/formatters.py:220(catch_format_error)
        1    0.000    0.000    3.726    3.726 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/matplotlib/pyplot.py:251(show)
        1    0.000    0.000    3.726    3.726 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/ipykernel/pylab/backend_inline.py:23(show)
       27    0.000    0.000    3.718    0.138 </Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/decorator.py:decorator-gen-9>:1(__call__)
       27    0.000    0.000    3.718    0.138 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/IPython/core/formatters.py:331(__call__)
        2    0.000    0.000    3.711    1.855 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/IPython/core/pylabtools.py:244(<lambda>)
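
For readers who want to produce a report in the same format, here is a minimal sketch using the standard-library cProfile and pstats modules. The run_workload function is a hypothetical stand-in for the tear sheet call, and the temporary output path is an assumption; only the file name returns_tearsheet_profile.stats comes from the report header above.

```python
import cProfile
import io
import os
import pstats
import tempfile


def run_workload():
    # Hypothetical stand-in for the tear sheet call being profiled.
    return sum(i * i for i in range(100_000))


# Assumption: write the stats file to a temporary directory.
stats_path = os.path.join(tempfile.mkdtemp(), "returns_tearsheet_profile.stats")

profiler = cProfile.Profile()
profiler.enable()
run_workload()
profiler.disable()
profiler.dump_stats(stats_path)

# Load the saved stats, sort by cumulative time, and keep the top 20 rows.
buffer = io.StringIO()
stats = pstats.Stats(stats_path, stream=buffer)
stats.sort_stats("cumulative").print_stats(20)

report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the outermost calls first, which makes it easy to see which top-level function dominates the run.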

For comparison, here is the profile from the original issue (https://github.com/quantopian/alphalens/issues/357):

Fri Jan 31 10:09:55 2020    returns_tearsheet_profile.stats

         62029996 function calls (61402118 primitive calls) in 130.569 seconds

   Ordered by: cumulative time
   List reduced from 3735 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  130.587  130.587 <ipython-input-1-eb0a21d53746>:10(run_returns_tear_sheet)
        1    0.001    0.001  130.587  130.587 /Users/jmccorriston/quant-repos/alphalens/alphalens/plotting.py:38(call_w_context)
        1    0.038    0.038  130.566  130.566 /Users/jmccorriston/quant-repos/alphalens/alphalens/tears.py:165(create_returns_tear_sheet)
        6    0.863    0.144   98.381   16.397 /Users/jmccorriston/quant-repos/alphalens/alphalens/performance.py:332(cumulative_returns)
        1    0.000    0.000   80.395   80.395 /Users/jmccorriston/quant-repos/alphalens/alphalens/plotting.py:757(plot_cumulative_returns_by_quantile)
        7    0.000    0.000   80.307   11.472 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/frame.py:6737(apply)
        7    0.000    0.000   80.298   11.471 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/apply.py:144(get_result)
        7    0.001    0.000   80.297   11.471 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/apply.py:261(apply_standard)
       11    0.000    0.000   80.247    7.295 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/apply.py:111(f)
        7    0.000    0.000   56.973    8.139 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/apply.py:297(apply_series_generator)
    13608    0.093    0.000   47.133    0.003 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/series.py:1188(__setitem__)
    13608    0.069    0.000   46.881    0.003 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/series.py:1191(setitem)
     4536    0.136    0.000   46.233    0.010 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/series.py:1261(_set_with)
     4536    0.469    0.000   45.371    0.010 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/series.py:1303(_set_labels)
22715/18179    0.485    0.000   43.101    0.002 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/indexes/base.py:2957(get_indexer)
     4541    0.082    0.000   31.554    0.007 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/indexes/datetimelike.py:686(astype)
     4536    0.035    0.000   30.160    0.007 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py:706(astype)
     4541    0.054    0.000   29.979    0.007 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/arrays/datetimelike.py:516(astype)
     4541    0.023    0.000   29.849    0.007 /Users/jmccorriston/.virtualenvs/alphalens_env/lib/python3.7/site-packages/pandas/core/arrays/datetimelike.py:346(_box_values)
     4548    3.760    0.001   29.825    0.007 {pandas._libs.lib.map_infer}

luca-s commented 4 years ago

@jmccorriston good job! I will have a look at this PR in the next few days.

dmichalowicz commented 4 years ago

The build is now passing for Python 2.7 and 3.5, which matches the current state of master. I think the last open question concerns https://github.com/quantopian/alphalens/pull/361/files#r415894312; if we believe that's right, then I think this PR is good.

altquant commented 4 years ago

The new call signature of cumulative_returns() will not be compatible with this: https://github.com/quantopian/alphalens/blob/master/alphalens/performance.py#L933