quantopian / pyfolio

Portfolio and risk analytics in Python
https://quantopian.github.io/pyfolio
Apache License 2.0

Plot_bayesian_cone (problem might be in ppc_t) #650

Open Vinayak285 opened 4 years ago

Vinayak285 commented 4 years ago

Problem Description

In bayesian.py, the variable perc in _plot_bayes_cone causes the error shown in the traceback below. It has the wrong shape: its values have the length of the training data, while the index it is given comes from the test data. The culprit seems to be preds, the posterior predictive samples that create_bayesian_tear_sheet in tears.py passes in as ppc_t. These in turn come from bayesian.run_model(model='t', ppc=True, ...), which contains:

    if ppc:
        # pm is pymc3 (import pymc3 as pm)
        ppc_samples = pm.sample_posterior_predictive(trace, samples=samples,
                                                     model=model,
                                                     size=returns_test.shape[0],  # was len(returns_test)
                                                     progressbar=progressbar)
        return trace, ppc_samples['returns']

Clearly (or maybe not), size should be the problem, but it doesn't look like it. I replaced len(returns_test) with returns_test.shape[0] to see whether that made any difference (it didn't). I'm now out of ideas and hope someone else can follow the trail and find out why this is happening (or maybe I'm missing something trivial).

The full traceback:

ValueError                                Traceback (most recent call last)
<ipython-input-8-4a15f56a81e6> in <module>
      5                           cone_std = (1,2,3),
      6                           live_start_date=live_date,
----> 7                           benchmark_rets=benchmark,)

~\.conda\envs\adv_investment\lib\site-packages\pyfolio\tears.py in create_full_tear_sheet(returns, positions, transactions, market_data, benchmark_rets, slippage, live_start_date, sector_mappings, bayesian, round_trips, estimate_intraday, hide_positions, cone_std, bootstrap, unadjusted_returns, style_factor_panel, sectors, caps, shares_held, volumes, percentile, turnover_denom, set_context, factor_returns, factor_loadings, pos_in_dollars, header_rows, factor_partitions)
    256                                    live_start_date=live_start_date,
    257                                    benchmark_rets=benchmark_rets,
--> 258                                    set_context=set_context)
    259 
    260 

~\.conda\envs\adv_investment\lib\site-packages\pyfolio\plotting.py in call_w_context(*args, **kwargs)
     50         if set_context:
     51             with plotting_context(), axes_style():
---> 52                 return func(*args, **kwargs)
     53         else:
     54             return func(*args, **kwargs)

~\.conda\envs\adv_investment\lib\site-packages\pyfolio\tears.py in create_bayesian_tear_sheet(returns, benchmark_rets, live_start_date, samples, return_fig, stoch_vol, progressbar)
   1156     # Plot Bayesian cone
   1157     ax_cone = plt.subplot(gs[row, :])
-> 1158     bayesian.plot_bayes_cone(df_train, df_test, ppc_t, ax=ax_cone)
   1159     previous_time = timer("plotting Bayesian cone", previous_time)
   1160 

~\.conda\envs\adv_investment\lib\site-packages\pyfolio\bayesian.py in plot_bayes_cone(returns_train, returns_test, ppc, plot_train_len, ax)
    624         ppc,
    625         plot_train_len=plot_train_len,
--> 626         ax=ax)
    627     ax.text(
    628         0.40,

~\.conda\envs\adv_investment\lib\site-packages\pyfolio\bayesian.py in _plot_bayes_cone(returns_train, returns_test, preds, plot_train_len, ax)
    496     perc = compute_bayes_cone(preds, starting_value=returns_train_cum.iloc[-1])
    497     # Add indices
--> 498     perc = {k: pd.Series(v, index=returns_test.index) for k, v in perc.items()}
    499 
    500     returns_test_cum_rel = returns_test_cum

~\.conda\envs\adv_investment\lib\site-packages\pyfolio\bayesian.py in <dictcomp>(.0)
    496     perc = compute_bayes_cone(preds, starting_value=returns_train_cum.iloc[-1])
    497     # Add indices
--> 498     perc = {k: pd.Series(v, index=returns_test.index) for k, v in perc.items()}
    499 
    500     returns_test_cum_rel = returns_test_cum

~\.conda\envs\adv_investment\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    300                         raise ValueError(
    301                             "Length of passed values is {val}, "
--> 302                             "index implies {ind}".format(val=len(data), ind=len(index))
    303                         )
    304                 except TypeError:

ValueError: Length of passed values is 1105, index implies 101
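The last frame is only pandas' length check in the Series constructor. A minimal reproduction, independent of pyfolio, uses the lengths from this report (1105 in-sample, 101 out-of-sample days) and an arbitrary business-day index; the dates are made up, and the exact wording of the message differs between pandas versions:

```python
import numpy as np
import pandas as pd

# Lengths taken from the report: 1105 in-sample days, 101 out-of-sample days.
n_train, n_test = 1105, 101

# A percentile path with one value per *training* day, as in the failing run.
perc_path = np.ones(n_train)
# Arbitrary dates standing in for returns_test.index.
test_index = pd.date_range("2020-01-01", periods=n_test, freq="B")

try:
    pd.Series(perc_path, index=test_index)
    error_message = None
except ValueError as exc:
    # pandas refuses to pair 1105 values with a 101-entry index.
    error_message = str(exc)

print(error_message)
```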

Additional information: there are 1105 in-sample and 101 out-of-sample observations. If I change the index of perc and of the fill_between calls to returns_train.index, the sheet is generated. So I think ppc_t needs to be generated for the out-of-sample data, rather than for the in-sample data as it is now.
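This matches how a Bayesian cone is typically built: each percentile path has one point per column of the predictive array, so its length is fixed by the second dimension of ppc_t, not by returns_test. A minimal numpy sketch of that idea follows; it is an illustration, not pyfolio's own compute_bayes_cone, and the name bayes_cone_sketch, the percentile set and the simulated returns are assumptions:

```python
import numpy as np

def bayes_cone_sketch(preds, starting_value=1.0, percentiles=(5, 25, 75, 95)):
    """Percentile paths of cumulative returns (hypothetical helper).

    preds: array of shape (n_samples, n_days) of simulated daily returns.
    Returns a dict {percentile: array of length n_days}.
    """
    # Compound each simulated path along the time axis.
    cum_preds = np.cumprod(preds + 1, axis=1) * starting_value
    # For every day, take percentiles across the samples (axis 0).
    return {p: np.percentile(cum_preds, p, axis=0) for p in percentiles}

rng = np.random.default_rng(0)
n_samples = 500

# Predictions shaped like the training window (the failing case) ...
bad = bayes_cone_sketch(rng.normal(0, 0.01, size=(n_samples, 1105)))
# ... versus predictions shaped like the test window (what the plot needs).
good = bayes_cone_sketch(rng.normal(0, 0.01, size=(n_samples, 101)))

print(len(bad[5]), len(good[5]))  # 1105 101
```

Only the second array can be indexed with the 101 test dates, which is consistent with the workaround of switching to returns_train.index.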


jlindbloom commented 3 years ago

I think this error has to do with a change made to PyMC3's sample_posterior_predictive method (formerly sample_ppc; there is currently an open PR to change this). This discourse article describes how the shape of the returned array now depends on the shape of the training data. My fix was to define a data container (see this notebook) for the training data, and then swap the test data into the container before calling sample_posterior_predictive.
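Stripped of PyMC3, the goal is a predictive array with one row per posterior sample and one column per test day, regardless of how long the training data is. A rough numpy sketch of that, which swaps PyMC3's sampler for direct Student-t draws: predictive_draws is a hypothetical helper, and the "posterior" arrays are made-up placeholders standing in for values from the trace:

```python
import numpy as np

def predictive_draws(mu, sigma, nu_minus_two, n_days, rng):
    """Draw Student-t daily returns for n_days per posterior sample.

    mu, sigma, nu_minus_two: 1-D arrays of posterior samples, mirroring the
    'mean returns', 'volatility' and 'nu_minus_two' variables of the t model.
    Returns an array of shape (n_samples, n_days).
    """
    n_samples = mu.shape[0]
    # Degrees of freedom nu = nu_minus_two + 2, broadcast across the days.
    t = rng.standard_t(nu_minus_two[:, None] + 2, size=(n_samples, n_days))
    # Location-scale transform: returns = mu + sigma * t.
    return mu[:, None] + sigma[:, None] * t

rng = np.random.default_rng(42)
n_samples = 500
# Placeholder posterior samples; in practice these would come from the trace.
mu = rng.normal(0.0005, 0.0001, n_samples)
sigma = np.abs(rng.normal(0.01, 0.001, n_samples))
nu_minus_two = rng.exponential(10.0, n_samples)

ppc = predictive_draws(mu, sigma, nu_minus_two, n_days=101, rng=rng)
print(ppc.shape)  # (500, 101)
```

The number of columns comes only from n_days (here the 101 test days). The data-container approach aims for the same result inside PyMC3.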

Here's what I did specifically (all edits in bayesian.py):

  1. In the function for each model (e.g., model_returns_t), define a data container.

    with pm.Model() as model:
        # Wrap the observed returns in a pm.Data container so they can be
        # replaced later with pm.set_data (see step 2).
        data_container = pm.Data("data", data)
        mu = pm.Normal('mean returns', mu=0, sd=.01, testval=data.mean())
        sigma = pm.HalfCauchy('volatility', beta=1, testval=data.std())
        nu = pm.Exponential('nu_minus_two', 1. / 10., testval=3.)

        returns = pm.StudentT('returns', nu=nu + 2, mu=mu, sd=sigma,
                              observed=data_container)
        pm.Deterministic('annual volatility',
                         returns.distribution.variance**.5 * np.sqrt(252))

        pm.Deterministic('sharpe', returns.distribution.mean /
                         returns.distribution.variance**.5 *
                         np.sqrt(252))

        trace = pm.sample(samples)
  2. In run_model, change
    if ppc:
        ppc_samples = pm.sample_ppc(trace, samples=samples,
                                    model=model, size=len(returns_test))
        return trace, ppc_samples['returns']

    to

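    if ppc:
        with model:
            # Swap the training returns for the test returns so that the
            # predictive samples have one column per test day.
            pm.set_data({"data": returns_test})
            ppc_samples = pm.sample_posterior_predictive(trace, samples=samples,
                                                         model=model)
        return trace, ppc_samples['returns']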

    I'm not sure if this is the best way to fix this, but it works for me!
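Whichever fix is used, a cheap sanity check before plotting would turn the opaque pandas error into an explicit message about the predictive samples. A sketch of such a guard: check_ppc_shape is a hypothetical helper, not part of pyfolio, and it assumes ppc_t is a 2-D array of shape (n_samples, n_days):

```python
import numpy as np

def check_ppc_shape(ppc, n_test_days):
    """Raise if posterior predictive samples do not cover the test window."""
    ppc = np.asarray(ppc)
    if ppc.ndim != 2:
        raise ValueError(f"expected 2-D ppc samples, got shape {ppc.shape}")
    if ppc.shape[1] != n_test_days:
        raise ValueError(
            f"ppc has {ppc.shape[1]} days per sample but the test window has "
            f"{n_test_days}; the samples were probably drawn for the training data"
        )
    return ppc
```

For example, calling check_ppc_shape(ppc_t, len(df_test)) just before bayesian.plot_bayes_cone(df_train, df_test, ppc_t, ax=ax_cone) in create_bayesian_tear_sheet would report the 1105 vs 101 mismatch directly.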