hudson-and-thames / mlfinlab

MlFinLab helps portfolio managers and traders who want to leverage the power of machine learning by providing reproducible, interpretable, and easy to use tools.
Other
3.95k stars 1.15k forks source link

HCAA - If I include covariance and expected_asset_returns, should I need to include asset_returns / asset_prices? #321

Closed kchima closed 4 years ago

kchima commented 4 years ago

Describe the bug

I attempt to call the HCAA (HierarchicalClusteringAssetAllocation) function like this: hcaa.allocate( asset_names=self.assets.columns.values, allocation_metric='sharpe_ratio', expected_asset_returns=mu, covariance_matrix=S )

I get this error: File "/Users/keithchima/venv/qstrader3/lib/python3.7/site-packages/mlfinlab/portfolio_optimization/hcaa.py", line 99, in allocate asset_returns = self.returns_estimator.calculate_returns(asset_prices=asset_prices, resample_by=resample_by) File "/Users/keithchima/venv/qstrader3/lib/python3.7/site-packages/mlfinlab/portfolio_optimization/returns_estimators.py", line 64, in calculate_returns asset_returns = asset_prices.pct_change() AttributeError: 'NoneType' object has no attribute 'pct_change'

because I didn't include asset_returns or asset_prices (I want it to use the covariance matrix for clustering and use my predicted prices in conjunction with clusters to determine allocation). Should I need to? Based on the self._perform_checks at the start of the function, it seems like this should be a valid way to call the function.

On a side note, some example code would be helpful. There is not much documentation on the proper ways to invoke the functions without diving into the source code.

I really appreciate the library, though, if I can get it working, I think it will be exactly the thing I need!

Thank you for your help.

aditya1702 commented 4 years ago

@kchima Thank you for opening this issue/potential bug. Yes, I intended to the parameters to be used the way you have done i.e. users can pass their own covariance matrices and asset returns or just pass prices.

And yes, the _perform_checks() checks this same thing. I will check this out and push a fix ASAP if this is a bug.

Regarding documentation - I know its lacking currently especially the HRP and HCAA parts of the docs. I am actively working on adding improved documentation for it (along with the description on how to use it). Plus, we are working on adding some tutorial jupyter notebooks and blog posts on how to use the different functionalities in the HCAA class and other portfolio optimisation features.

Thank you for appreciation!

aditya1702 commented 4 years ago

@kchima So I figured out why you are getting the error. Sharpe ratio calculation requires expected_asset_returns and not asset_returns. There is a little difference between the two

asset_returns: This parameter refers to the returns dataframe/matrix for each asset indexed by date. This parameter is required for calculating the optimal number of clusters in the hierarchical clustering tree. Or, you can just specify the number of clusters if you do not want the algo to run the optimal clustering algo.

expected_asset_returns: This parameter refers to the expected asset return values (parameter mu used in portfolio optimisation lingo). Thus, for 10 assets in your portfolio, you will have one value per asset for the expected returns. This is why it is throwing an error. Note that this parameter is only required if you are using the sharpe_ratio as your allocation_metric.

Hope this makes sense. I understand this needs to be specified in the documentation. I will update it properly soon. Stay tuned for the updates. In the meantime, feel free to post the required questions here :)

kchima commented 4 years ago

So my main goal is to provide 1) a single point of data for expected_asset_returns and 2) the covariance matrix in order to get an asset allocation. I could provide the full asset returns, but I am using this piece of code in a daily backtester and it becomes prohibitively slow if I run allocate for every historical data point.

If I specify a number of clusters or pre-calculate it, can I get away without passing the asset_returns? Maybe I could also use a different metric than Sharpe Ratio (although that wouldn't be ideal)?

Thank you again!

EDIT: As an update, and following your suggestion, it seems that if I include the "optimal_num_clusters" as a param, it goes significantly faster, and I could probably use it to backtest. What do you think about adding "optimal_num_clusters" to self, so you can instantiate HCAA once, train it so that "self.optimal_num_clusters" is calculated, and then you can pass it back in for future iterations of the algo? Although it does seem to change from time to time...

aditya1702 commented 4 years ago

@kchima Yes, you can just specify the optimal number of clusters. However, the calculation of the optimal number of clusters is part of the steps of the HCAA algorithm and since it can change with data, I do not run it beforehand (when the class object gets defined). All inputs to the algo are through the allocate() method, so it always runs when allocate() is called. This is also why I provided the option to specify the value as a parameter, if you know that for your data, the number of clusters will be in a certain range then you just specify it as a parameter.

Currently, the optimal number of clusters is calculated using Gap index which is a slow algorithm and that is why it takes time. I will be adding an option to use other methods to arrive at the number of clusters which are faster than the Gap method (Silhouette Index).

The asset returns dataframe is only required for "expected_shortfall" and "conditional_drawdown_risk". However, there was a bug in the code due to which it was always required. I have already fixed this in my PR which will be merged soon! After it gets merged, you will be able to just use your expected_asset_returns list and covariance matrix and you should be fine to go!

kchima commented 4 years ago

Thank you very much for your feedback! I might be able to get away with clustering only once every few times or so, but I look forward to the fast clustering algorithms! :)

aditya1702 commented 4 years ago

@kchima Awesome! Glad to know it is resolved. I will merge the fix soon and then you can use the allocate() method without passing asset_returns (for sharpe ratio metric)

kchima commented 4 years ago

Hey, not to resurrect this thread, but you mentioned:

I have already fixed this in my PR which will be merged soon! After it gets merged, you will be able to just use your expected_asset_returns list and covariance matrix and you should be fine to go!

And that you would also be adding the silhouette index, which is faster than the Gap index.

Have these been implemented / merged in yet? I had some thoughts and was planning on revisiting HRP / HCAA soon!

Thanks very much :) -Keith

aditya1702 commented 4 years ago

@kchima Currently, we are focussing on improving the documentation for portfolio optimisation. As soon as that is finished, I will add the new methods.

kchima commented 4 years ago

Thank you! What is the best way for me to know if/when they've been added?