quantopian / alphalens

Performance analysis of predictive (alpha) stock factors
http://quantopian.github.io/alphalens
Apache License 2.0

PERF: computation of cumulative returns is very slow #294

Status: Closed (luca-s closed this issue 6 years ago)

luca-s commented 6 years ago

I noticed a performance regression after the last Alphalens update on Quantopian.

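The slowness comes from utils.diff_custom_calendar_timedeltas, which is called hundreds of thousands of times while computing cumulative returns. See the profiler output below: before the patch, most of its time is spent inside pandas date_range.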
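For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of collecting a profile sorted by cumulative time with the standard library's cProfile and pstats. The helper name profile_call and the stand-in function slow_sum are hypothetical; in practice the profiled call would be something like create_full_tear_sheet on your own factor data.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumtime)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "cumulative" matches the "Ordered by: cumulative time" header in the
    # dumps; print_stats(20) keeps only the 20 most expensive entries.
    stats.sort_stats("cumulative").print_stats(20)
    return result, buffer.getvalue()


def slow_sum(n):
    # Hypothetical stand-in for the real workload being profiled.
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum, 1000)
print(report)
```

Sorting by cumulative time puts the top-level callers first, which is why the tear sheet entry points head both listings while the real hot spot shows up further down.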

Profiler output, pre-patch:


         309336931 function calls (304638500 primitive calls) in 371.387 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  371.404  371.404 tears.py:81(call_w_context)
      4/1    0.000    0.000  371.404  371.404 plotting.py:38(call_w_context)
        1    0.000    0.000  371.401  371.401 tears.py:553(create_full_tear_sheet)
        1    0.000    0.000  352.342  352.342 tears.py:137(call_w_context)
        1    0.000    0.000  352.342  352.342 tears.py:285(create_returns_tear_sheet)
       12    3.431    0.286  342.883   28.574 performance.py:336(cumulative_returns)
     4456    0.033    0.000  289.878    0.065 frame.py:4159(apply)
     4456    0.044    0.000  289.804    0.065 frame.py:4292(_apply_standard)
        2    0.000    0.000  282.991  141.495 plotting.py:752(plot_cumulative_returns_by_quantile)
       22    0.000    0.000  282.841   12.856 frame.py:4238(f)
   275466   19.579    0.000  235.100    0.001 utils.py:880(diff_custom_calendar_timedeltas)
2222644/382556    2.960    0.000  218.976    0.001 _decorators.py:65(wrapper)
1226752/337950    7.431    0.000  218.027    0.001 datetimes.py:260(__new__)
   275466    0.966    0.000  213.144    0.001 datetimes.py:2002(date_range)
   275466    3.497    0.000  210.474    0.001 datetimes.py:410(_generate)
   275466    2.482    0.000  201.804    0.001 datetimes.py:1960(_generate_regular_range)
  4691808    5.912    0.000  178.626    0.000 offsets.py:2864(generate_range)
  4469672   50.360    0.000  109.421    0.000 offsets.py:52(wrapper)
   888821    6.420    0.000   68.618    0.000 datetimes.py:184(to_datetime)
     4456    0.209    0.000   62.794    0.014 {pandas._libs.lib.reduce}
        2    0.000    0.000   60.155   30.078 plotting.py:714(plot_cumulative_returns)
  4469672

Profiler output, post-patch:

         120714327 function calls (119323876 primitive calls) in 159.258 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  159.278  159.278 tears.py:81(call_w_context)
      4/1    0.000    0.000  159.278  159.278 plotting.py:38(call_w_context)
        1    0.000    0.000  159.275  159.275 tears.py:553(create_full_tear_sheet)
        1    0.000    0.000  137.264  137.264 tears.py:137(call_w_context)
        1    0.000    0.000  137.264  137.264 tears.py:285(create_returns_tear_sheet)
       12    3.891    0.324  128.177   10.681 performance.py:336(cumulative_returns)
     4456    0.043    0.000  114.402    0.026 frame.py:4159(apply)
     4456    0.059    0.000  114.312    0.026 frame.py:4292(_apply_standard)
        2    0.000    0.000  106.524   53.262 plotting.py:752(plot_cumulative_returns_by_quantile)
       22    0.000    0.000  106.373    4.835 frame.py:4238(f)
   239922    0.562    0.000   42.612    0.000 series.py:714(__setitem__)
   239922    0.234    0.000   40.947    0.000 series.py:717(setitem)
84856/75968    0.529    0.000   40.554    0.001 base.py:2564(get_indexer)
    17772    0.167    0.000   39.087    0.002 series.py:784(_set_with)
    17772    0.189    0.000   38.228    0.002 series.py:817(_set_labels)
     4456    0.256    0.000   28.021    0.006 {pandas._libs.lib.reduce}
     8926    0.049    0.000   26.372    0.003 datetimelike.py:424(asobject)
     8900    0.019    0.000   26.345    0.003 datetimes.py:846(astype)
     8928    0.017    0.000   25.542    0.003 datetimelike.py:240(_box_values)
     8946    3.424    0.000   25.533    0.003 {pandas._libs.lib.map_infer}
   204378   12.332    0.000   24.479    0.000 utils.py:880(diff_custom_calendar_timedeltas)
 13252310   22.414    0.000   22.414    0.000 datetimes.py:545(<lambda>)
        2    0.000    0.000   21.900   10.950 plotting.py:714(plot_cumulative_returns)
       13    0.018    0.001   19.747    1.519 groupby.py:655(apply)
       14    0.000    0.000   19.727    1.409 groupby.py:718(_python_apply_general)
       14    0.158    0.011   18.673    1.334 groupby.py:1776(apply)
        1    0.000    0.000   15.703   15.703 tears.py:108(call_w_context)
        1    0.000    0.000   15.703   15.703 tears.py:431(create_information_tear_sheet)
        2    0.001    0.001   13.898    6.949 performance.py:27(factor_information_coefficient)
    71244    0.367    0.000   12.980    0.000 series.py:598(__getitem__)
     4443    0.048    0.000   11.908    0.003 performance.py:55(src_ic)
   204378   11.476    0.000   11.476    0.000 {built-in method numpy.core.multiarray.busday_count}
    53470    0.190    0.000   10.134    0.000 series.py:644(_get_with)
    17772    0.229    0.000    9.370    0.001 generic.py:6010(pct_change)
        1    0.000    0.000    9.234    9.234 performance.py:76(mean_information_coefficient)
    40318    0.106    0.000    8.571    0.000 series.py:2424(reindex)
    40332    0.399    0.000    8.472    0.000 generic.py:2480(reindex)
    93710    0.841    0.000    8.033    0.000 internals.py:3013(apply)
294150/107002    0.381    0.000    6.904    0.000 _decorators.py:65(wrapper)
   224706    1.285    0.000    6.532    0.000 series.py:139(__init__)
124792/62436    1.237    0.000    6.519    0.000 datetimes.py:260(__new__)
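
The two profiles suggest where the gain comes from: pre-patch, diff_custom_calendar_timedeltas spends its time in date_range and generate_range, while post-patch it calls numpy's busday_count instead. The sketch below illustrates that general idea only; it is not the Alphalens implementation. The function names are hypothetical, and it assumes timezone-naive midnight timestamps and a plain Monday to Friday week with no holidays.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import BDay


def bdays_via_date_range(start, end):
    """Count business days in [start, end) by building the whole range."""
    # One DatetimeIndex is generated per call; this is the expensive
    # pattern (date_range -> generate_range) seen in the pre-patch profile.
    last = end - pd.Timedelta(days=1)
    return len(pd.date_range(start, last, freq=BDay()))


def bdays_via_busday_count(starts, ends):
    """Count business days in [start, end) for many pairs at once."""
    # busday_count works on day-resolution datetime64 arrays and never
    # materialises the intermediate dates.
    starts_d = starts.values.astype("datetime64[D]")
    ends_d = ends.values.astype("datetime64[D]")
    return np.busday_count(starts_d, ends_d)


starts = pd.DatetimeIndex(["2018-01-01", "2018-01-05", "2018-01-08"])
ends = pd.DatetimeIndex(["2018-01-08", "2018-01-08", "2018-01-12"])

slow = [bdays_via_date_range(s, e) for s, e in zip(starts, ends)]
fast = bdays_via_busday_count(starts, ends)
print(slow, fast.tolist())
```

A custom trading calendar would need the weekmask and holidays arguments of np.busday_count (or a prebuilt np.busdaycalendar), which this sketch leaves at their defaults.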