robertmartin8 / PyPortfolioOpt

Financial portfolio optimisation in python, including classical efficient frontier, Black-Litterman, Hierarchical Risk Parity
https://pyportfolioopt.readthedocs.io/
MIT License

Hierarchical Risk Parity with Covariance matrix input #328

Closed ghost closed 3 years ago

ghost commented 3 years ago

I am trying to understand why the HRP optimizer doesn't give different results with different covariance matrices.

Here is a simple example:


import yfinance as yf
import pandas as pd

from functools import reduce
from pypfopt import risk_models, expected_returns, HRPOpt

tickers = ["MSFT", "AMZN", "KO", "MA", "COST"]
prices = yf.download(tickers, period="max")["Adj Close"].dropna(how="all")

rets = expected_returns.returns_from_prices(prices)

cov_options = ["sample_cov", "semicovariance", "exp_cov", "ledoit_wolf",
               "ledoit_wolf_constant_variance",
               "ledoit_wolf_single_factor",
               "ledoit_wolf_constant_correlation",
               "oracle_approximating"]

# Estimate one covariance matrix per method
S_list = [risk_models.risk_matrix(rets, method=meth, returns_data=True)
          for meth in cov_options]

# HRP: optimise once per covariance matrix and collect the weights
portfolio_weight_list = []
for meth, S in zip(cov_options, S_list):
    hrp = HRPOpt(rets, cov_matrix=S)
    hrp.optimize()
    cleaned_weights = pd.DataFrame(hrp.clean_weights(), index=[0])
    cleaned_weights = pd.melt(cleaned_weights, var_name="fund",
                              value_name="weight_" + meth)
    portfolio_weight_list.append(cleaned_weights)

portfolio_HRP = reduce(lambda left, right: pd.merge(left, right, on=["fund"], how="left"),
                       portfolio_weight_list)

I get the same portfolios for all possible covariance options.

robertmartin8 commented 3 years ago

Loosely speaking, HRP works in two steps. 1) hierarchically cluster the assets, using a distance matrix calculated from the covariance matrix. 2) build a bottom-up portfolio by locally minimising variance.

In step 1), while the distances will be different depending on which cov matrix you use, the overall clustering might not be different. This is because the clustering is based on the relative distance between assets – when you use a different cov matrix, the covariances change, but the relative distance between assets might not change significantly enough for the clustering to be affected.
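One way to check this directly is to compute the correlation-distance matrix that HRP clusters on and compare the resulting trees. A minimal sketch (toy covariance matrix with made-up numbers, scipy for the clustering; this is not PyPortfolioOpt's internal code): shrinking the matrix towards the identity changes every entry, yet the hierarchical clustering is unchanged.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def corr_distance(cov):
    # The distance HRP clusters on: d_ij = sqrt(0.5 * (1 - rho_ij))
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    return np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, 1.0))

# Toy 4-asset covariance matrix (hypothetical numbers): assets 0-1 and
# 2-3 form two correlated blocks
S = np.array([[0.0400, 0.0180, 0.0060, 0.0040],
              [0.0180, 0.0900, 0.0090, 0.0060],
              [0.0060, 0.0090, 0.0625, 0.0300],
              [0.0040, 0.0060, 0.0300, 0.0256]])

# Shrinking towards a scaled identity changes every covariance entry...
alpha = 0.3
S_shrunk = (1 - alpha) * S + alpha * (np.trace(S) / 4) * np.eye(4)

# ...but the single-linkage clustering of the assets is identical
leaf_orders = []
for M in (S, S_shrunk):
    D = squareform(corr_distance(M), checks=False)
    leaf_orders.append(list(leaves_list(linkage(D, method="single"))))
print(leaf_orders)
```

The absolute distances differ between the two matrices, but the relative ordering (and hence the tree) does not.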

Step 2) does not depend on the full cov matrix – only its diagonal.
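To illustrate, the within-cluster allocation HRP uses is inverse-variance weighting, which reads only the diagonal of the covariance matrix. A minimal sketch (toy numbers, not the library's internal code):

```python
import numpy as np

def inverse_variance_weights(cov):
    # Weights proportional to 1/variance: only diag(cov) is used
    ivp = 1.0 / np.diag(cov)
    return ivp / ivp.sum()

# Two covariance matrices with identical diagonals but very different
# off-diagonal entries...
A = np.array([[0.04, 0.015], [0.015, 0.09]])
B = np.array([[0.04, -0.020], [-0.020, 0.09]])

# ...produce identical inverse-variance weights
w_A = inverse_variance_weights(A)
w_B = inverse_variance_weights(B)
print(w_A, w_B)  # both [0.69230769 0.30769231]
```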

Hope this helps!

ghost commented 3 years ago

Then this shows a limitation of HRP if the distance matrix is "covariance invariant" -- though I cannot think what the implications might be, or whether it really matters.

However, I think it is worth flagging this in your documentation. A lot of users might have the same impression as me.

robertmartin8 commented 3 years ago

I don't think it's a limitation – if you constructed a covariance matrix that was suitably different, the clustering would probably be different too. In some sense, it's a strength that different calculation methods give the same result – it speaks of robustness to the inputs.

ghost commented 3 years ago

Thinking a little bit more about it: it looks like HRP is robust to backward-looking covariance matrices, which primarily capture linear dependence. A cool feature, then, might be a Risk Models method that also captures non-linear dependence, or a more forward-looking covariance like the L-B approach. Is it possible to extract the covariance matrix from L-B with beliefs and pass it to the HRP?

phschiele commented 3 years ago

@msh855 This paper might be interesting to you: Estimation of Theory-Implied Correlation Matrices, Marcos Lopez de Prado (2019).

ghost commented 3 years ago

@phschiele Thanks. Now that I have gone through the paper, I remember that de Prado has a variation with a tree-based method to construct the covariance matrix. So my question to @robertmartin8 is: is this the default method here as well?

phschiele commented 3 years ago

@msh855 In the paper, additional information in the form of GICS classifications is used. This might make it hard to use as a default. Perhaps (if licensing allows it), one could add the TIC algorithm to help users derive covariance estimates from custom knowledge graphs.

robertmartin8 commented 3 years ago

@msh855

forward-looking covariance like the L-B approach.

What is L-B? Could you share a paper/reference?

TIC is a very cool algorithm. I'm definitely open to accepting PRs on it, but it's not one of my priorities at the moment – I think there are still many "classical" methods that should be implemented, like risk parity and factor modelling, before building some of the more novel methods.

ghost commented 3 years ago

@robertmartin8 Sorry, by L-B I actually meant the Black-Litterman allocation. I was wondering if it is possible to get the covariance matrix updated with the views, to use in the HRP optimisation.

robertmartin8 commented 3 years ago

@msh855 You can use B-L to provide views and get a posterior expected returns vector or cov matrix. The documentation for the B-L covariance matrix is here.

However, as you've seen with your initial experiments, this may not affect the tree structure.

ghost commented 3 years ago

Thanks, I will check it out. But yes, you are right: since B-L still assumes linear dependence (correct?), the results shouldn't change.