Hi Filipe,
Thanks for reaching out! Could you provide a sample of the code you are using? Also, please let me know if you have added any constraints or objectives.
Best, Robert
On Thu, 9 Apr 2020, 00:47 Filipe Mendonça, notifications@github.com wrote:
I've been using PyPortfolioOpt for a couple of days now and I'm finding it very interesting. However, it seems that for some combinations of assets in my dataset, the max_sharpe method will not work. The error comes from cvxpy:
DCPError: Problem does not follow DCP rules. Specifically: The objective is not DCP. Its following subexpressions are not: QuadForm(var20890, [[1.28992571 0.05675357 0.23213086 0.04030597] [0.05675357 0.39647776 0.3670981 0.09351321] [0.23213086 0.3670981 0.37204403 0.19053016] [0.04030597 0.09351321 0.19053016 0.17553035]])
Any ideas on why this error would occur?
oops, accidentally closed.
I was able to trigger this error with the example code in the readme:
import pandas as pd
from pypfopt import EfficientFrontier
from pypfopt import risk_models
from pypfopt import expected_returns
# Read in price data
df = pd.read_csv("stonks.csv", parse_dates=True, index_col="Date")
# Calculate expected returns and sample covariance
mu = expected_returns.mean_historical_return(df)
S = risk_models.sample_cov(df)
# Optimise for maximal Sharpe ratio
ef = EfficientFrontier(mu, S)
raw_weights = ef.max_sharpe()
cleaned_weights = ef.clean_weights()
ef.save_weights_to_file("weights.csv") # saves to file
print(cleaned_weights)
ef.portfolio_performance(verbose=True)
And this CSV file: stonks.zip
@as-com
Thanks so much! I'm working on it now. One of my preliminary (strange) discoveries is that the covariance matrix is not actually positive semidefinite, which explains why the optimiser is failing (it also fails for min_volatility!).
import numpy as np

# Covariance matrix computed from stonks.csv
S = np.array([[0.03818144, 0.04182824],
              [0.04182824, 0.04149209]])
S.trace()         # gives 0.0797
np.linalg.det(S)  # gives -0.00016
For a symmetric 2x2 matrix, positive semidefiniteness is equivalent to both the trace and the determinant being ≥ 0; here the determinant is negative, so the matrix is not PSD.
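For larger matrices the trace/determinant shortcut doesn't carry over directly, but checking the eigenvalues always works. A minimal check:

import numpy as np

S = np.array([[0.03818144, 0.04182824],
              [0.04182824, 0.04149209]])
np.linalg.eigvalsh(S)  # array([-0.00202422, 0.08169775]): one negative eigenvalue, so not PSD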
For numerical stability, you should add a small amount of noise to the diagonal.
@jonathanng
Thanks for the suggestion! I'll look into it. But surely the noise isn't guaranteed to make it positive semidefinite?
@jonathanng
Thanks for the link, I'll give it a read. I'm also considering a simpler solution: clipping the negative eigenvalues at zero and then reconstructing the matrix (as described here). @schneiderfelipe do you have any thoughts on this? Your linear algebra skills are much better than mine 😂
Anyway, @filipeteotonio, I think the best solution for the time being is to use a different estimator of the covariance matrix. I'd suggest the following:
S = risk_models.CovarianceShrinkage(df).ledoit_wolf()
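Shrinkage estimators form a convex combination of the sample covariance and a well-conditioned structured target, which pulls the offending eigenvalue back towards the positive spectrum. As a quick sanity check (just a sketch, assuming df is the price DataFrame from the example above):

import numpy as np
from pypfopt import risk_models

S_shrunk = risk_models.CovarianceShrinkage(df).ledoit_wolf()
print(np.linalg.eigvalsh(S_shrunk))  # the smallest eigenvalue should no longer be negative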
The following is a long reply.
> I was able to trigger this error with the example code in the readme, and this CSV file: stonks.zip
I was able to run this code successfully with PyPortfolioOpt version 0.5.1. The calculated covariance matrix is the same:
➜ ~ ipython3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pandas as pd
...: from pypfopt.efficient_frontier import EfficientFrontier
...: from pypfopt import risk_models
...: from pypfopt import expected_returns
...:
...: # Read in price data
...: df = pd.read_csv("stonks.csv", parse_dates=True, index_col="Date")
...:
...: # Calculate expected returns and sample covariance
...: mu = expected_returns.mean_historical_return(df)
...: S_ = risk_models.sample_cov(df)
...:
...: # Optimise for maximal Sharpe ratio
...: ef = EfficientFrontier(mu, S_)
...: raw_weights = ef.max_sharpe()
...: cleaned_weights = ef.clean_weights()
...: ef.save_weights_to_file("weights.csv") # saves to file
...: print(cleaned_weights)
...: ef.portfolio_performance(verbose=True)
{'VTI': 0.0, 'MGK': 1.0}
Expected annual return: 11.4%
Annual volatility: 20.4%
Sharpe Ratio: 0.46
Out[1]: (0.11443438934256847, 0.20369607211761417, 0.46360437077079236)
In [2]: S_
Out[2]:
VTI MGK
VTI 0.038181 0.041828
MGK 0.041828 0.041492
And it is not positive semidefinite, as @robertmartin8 observed (the first eigenvalue, -0.00202422, is less than zero):
In [3]: import numpy as np
In [4]: np.linalg.eigh(S_)
Out[4]:
(array([-0.00202422, 0.08169775]), array([[-0.72095193, 0.69298507],
[ 0.69298507, 0.72095193]]))
I could reproduce the error with version 1.0.2 as well. I did all this to have a baseline for comparison.
risk_models.sample_cov might produce a matrix with negative eigenvalues if one of them is close to zero (so that it randomly flips sign) or, as suggested here, if some data are missing. There are two ways we might get around this:

1. Add a small amount of noise to the diagonal of S_ (of the order of the smallest eigenvalue), as suggested by @jonathanng. That is the same as adding a (diagonal) matrix proportional to the identity (method="diag").
2. Clip the negative eigenvalues at zero and reconstruct the matrix from its eigendecomposition (method="spectral").

I implemented both methods in the function below:
import numpy as np

from pypfopt import expected_returns


def sample_cov(prices, frequency=252, method=None):
    returns = expected_returns.returns_from_prices(prices)
    S = returns.cov()
    if method is not None:
        w, U = np.linalg.eigh(S)  # eigenvalues (ascending) and eigenvectors
        if method == "diag":
            # shift the whole spectrum up by the magnitude of the most negative eigenvalue
            mw = np.min(w)
            if mw < 0:
                S -= mw * np.eye(len(S))
        elif method == "spectral":
            # clip negative eigenvalues at zero and reconstruct the matrix
            w = np.where(w > 0, w, 0)
            S = U @ np.diag(w) @ U.T
    return S * frequency
method="diag"
In [5]: import pandas as pd
...: from pypfopt.efficient_frontier import EfficientFrontier
...: from pypfopt import risk_models
...: from pypfopt import expected_returns
...:
...: # Read in price data
...: df = pd.read_csv("stonks.csv", parse_dates=True, index_col="Date")
...:
...: # Calculate expected returns and sample covariance
...: mu = expected_returns.mean_historical_return(df)
...: S = sample_cov(df, method="diag")
...:
...: # Optimise for maximal Sharpe ratio
...: ef = EfficientFrontier(mu, S)
...: raw_weights = ef.max_sharpe()
...: cleaned_weights = ef.clean_weights()
...: ef.save_weights_to_file("weights.csv") # saves to file
...: print(cleaned_weights)
...: ef.portfolio_performance(verbose=True)
{'VTI': 0.0, 'MGK': 1.0}
Expected annual return: 11.4%
Annual volatility: 20.9%
Sharpe Ratio: 0.45
Out[5]: (0.11443438934256848, 0.20860562383796466, 0.4526934010941182)
In [6]: S
Out[6]:
VTI MGK
VTI 0.040206 0.041828
MGK 0.041828 0.043516
In [7]: S - S_
Out[7]:
VTI MGK
VTI 0.002024 0.000000
MGK 0.000000 0.002024
In [8]: np.linalg.norm(S - S_)
Out[8]: 0.0028626744284440757
Some remarks:

- method="diag" adds a small value to each individual variance but doesn't change the covariances. Each asset is treated equally.
- method="diag" is not "optimal" in the sense that the result is not the positive (semi)definite matrix closest to S_.

With method="spectral":
In [9]: import pandas as pd
...: from pypfopt.efficient_frontier import EfficientFrontier
...: from pypfopt import risk_models
...: from pypfopt import expected_returns
...:
...: # Read in price data
...: df = pd.read_csv("stonks.csv", parse_dates=True, index_col="Date")
...:
...: # Calculate expected returns and sample covariance
...: mu = expected_returns.mean_historical_return(df)
...: S = sample_cov(df, method="spectral")
...:
...: # Optimise for maximal Sharpe ratio
...: ef = EfficientFrontier(mu, S)
...: raw_weights = ef.max_sharpe()
...: cleaned_weights = ef.clean_weights()
...: ef.save_weights_to_file("weights.csv") # saves to file
...: print(cleaned_weights)
...: ef.portfolio_performance(verbose=True)
{'VTI': 0.0, 'MGK': 1.0}
Expected annual return: 11.4%
Annual volatility: 20.6%
Sharpe Ratio: 0.46
Out[9]: (0.11443438934256848, 0.20606837668700312, 0.4582672550772053)
In [10]: S
Out[10]:
array([[0.03923357, 0.04081692],
[0.04081692, 0.04246418]])
In [11]: S - S_
Out[11]:
VTI MGK
VTI 0.001052 -0.001011
MGK -0.001011 0.000972
In [12]: np.linalg.norm(S - S_)
Out[12]: 0.002024216500682136
Some remarks:

- Unlike method="diag", method="spectral" adds a smaller value to each individual variance and subtracts from each covariance. The assets are potentially treated unequally.
- Unlike method="diag", method="spectral" is "optimal" in the sense that the result is the positive (semi)definite matrix closest to S_, as seen from the Frobenius norm (we can't get closer than the difference in the traces, which is the difference in the eigenvalues, hence optimal).

I think both methods deserve attention: method="diag" is conservative (it homogeneously increases risk only), which is interesting from an investment point of view, but method="spectral" is mathematically optimal (and, in my opinion, the way to go if you trust your data).
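To see the optimality numerically: the Frobenius distance between the clipped matrix and S_ is exactly the magnitude of the clipped eigenvalue. A small verification sketch, using the matrix from earlier in the thread:

import numpy as np

S_ = np.array([[0.03818144, 0.04182824],
               [0.04182824, 0.04149209]])
w, U = np.linalg.eigh(S_)                   # w[0] is approximately -0.00202422
S = U @ np.diag(np.clip(w, 0, None)) @ U.T  # spectral clipping
print(np.linalg.norm(S - S_))               # ~0.0020242, equal to abs(w[0]) (cf. In [12])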
@schneiderfelipe
Thanks for the excellent work! My thoughts are as follows; let me know what you think!
Firstly, I think this is quite an advanced topic so I don't want the user to have to understand too much about what's going on. Thus in most cases, I think this fixing process should happen automatically on the backend, without the user having to explicitly choose a fixing method.
My plan is to define a new function fix_nonpositive_semidefinite(matrix, fix_method="spectral") in risk_models, which checks whether the input matrix is positive semidefinite and fixes it if not. This function will be called at the end of all the other risk models (e.g. return fix_nonpositive_semidefinite(cov)), though I will raise a warning if the original matrix isn't PSD, just so the user is aware. I will probably have to provide a kwarg in all the functions so the choice of method can be passed down.
As for the default method, based on your analysis I think spectral makes more sense. In particular, I don't think it matters whether assets are treated equally or unequally: e.g. if I have two assets with variances 0.05 and 0.15 and subtract 0.04 from both ('equal' treatment), the result is variances of 0.01 and 0.11, which somehow feels very imbalanced anyway.
On another note, I've decided to adopt your suggestion from issue #3, i.e. having a risk_matrix(method="sample_cov") master function. I don't plan on deprecating the other API just yet.
To get the diag method to work, I had to subtract a little bit more than the minimum eigenvalue (otherwise I think there are floating-point issues around zero):

fixed_matrix = matrix - 1.1 * min_eig * np.eye(len(matrix))
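Roughly, the helper might look like the following (just a sketch of the plan, not final code):

import warnings

import numpy as np


def fix_nonpositive_semidefinite(matrix, fix_method="spectral"):
    q, V = np.linalg.eigh(matrix)
    min_eig = np.min(q)
    if min_eig >= 0:
        return matrix  # already positive semidefinite
    warnings.warn("The covariance matrix is not positive semidefinite. Amending eigenvalues.")
    if fix_method == "spectral":
        # clip negative eigenvalues at zero and reconstruct
        q = np.where(q > 0, q, 0)
        return V @ np.diag(q) @ V.T
    elif fix_method == "diag":
        # shift the spectrum up by slightly more than the minimum eigenvalue
        return matrix - 1.1 * min_eig * np.eye(len(matrix))
    raise NotImplementedError("Method {} not implemented".format(fix_method))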
> To get the diag method to work, I had to subtract a little bit more than the minimum eigenvalue (otherwise I think there are floating-point issues around zero): fixed_matrix = matrix - 1.1 * min_eig * np.eye(len(matrix))

Interesting, does not doing that break cvxpy? I tested this, and indeed the corrected matrix ends up having a small negative eigenvalue, but it is so small (~1e-18) that cvxpy didn't complain.
@schneiderfelipe
I didn't test it with cvxpy, but my _is_positive_semidefinite method (which attempts a Cholesky factorisation) returns False, so it messes up the error-checking workflow. I figure multiplying by 1.1 is a relatively harmless way of amending this.
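The check is essentially of the following shape (a sketch, not the exact library code):

import numpy as np


def _is_positive_semidefinite(matrix):
    try:
        # Cholesky succeeds only for positive definite matrices
        np.linalg.cholesky(matrix)
        return True
    except np.linalg.LinAlgError:
        return False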
@robertmartin8

> I didn't test it with cvxpy, but my _is_positive_semidefinite method (which attempts a Cholesky factorisation) returns False, so it messes up the error-checking workflow. I figure multiplying by 1.1 is a relatively harmless way of amending this.
Cholesky decomposition works for positive definite matrices only (all eigenvalues strictly greater than zero), so it fails for a matrix with a zero eigenvalue:
In [1]: import numpy as np
In [2]: np.linalg.cholesky(np.diag([0, 1, 2]))
---------------------------------------------------------------------------
LinAlgError Traceback (most recent call last)
<ipython-input-2-9739668864ab> in <module>
----> 1 np.linalg.cholesky(np.diag([0, 1, 2]))
<__array_function__ internals> in cholesky(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/numpy/linalg/linalg.py in cholesky(a)
753 t, result_t = _commonType(a)
754 signature = 'D->D' if isComplexType(t) else 'd->d'
--> 755 r = gufunc(a, signature=signature, extobj=extobj)
756 return wrap(r.astype(result_t, copy=False))
757
/usr/local/lib/python3.6/dist-packages/numpy/linalg/linalg.py in _raise_linalgerror_nonposdef(err, flag)
98
99 def _raise_linalgerror_nonposdef(err, flag):
--> 100 raise LinAlgError("Matrix is not positive definite")
101
102 def _raise_linalgerror_eigenvalues_nonconvergence(err, flag):
LinAlgError: Matrix is not positive definite
So, attempting a Cholesky decomposition actually checks for strict positive definiteness. You could check for positive semidefiniteness as follows:

import numpy as np


def _is_positive_semidefinite(a):
    # the smallest eigenvalue must be non-negative, up to floating-point tolerance
    w = np.linalg.eigvalsh(a)
    min_eig = np.min(w)
    return min_eig > 0 or np.isclose(min_eig, 0)
This ensures really small negative numbers are considered zero:
In [4]: _is_positive_semidefinite(np.diag([0, 1, 2]))
Out[4]: True
@robertmartin8 I've been inactive for the last week, which is why I didn't reply, but it's so nice to see that the discussion moved forward and the fix is already merged! Thank you, guys.