robertmartin8 / PyPortfolioOpt

Financial portfolio optimisation in python, including classical efficient frontier, Black-Litterman, Hierarchical Risk Parity
https://pyportfolioopt.readthedocs.io/
MIT License

Issue with small numbers/Scientific Notations from yfinance. #387

Closed. Originn closed this issue 2 years ago

Originn commented 2 years ago

Among the symbols I pass to yf.download there is one (SHIB-USD) that is worth about 0.00005 USD.

yfinance returns its data as a df that displays in scientific notation, so I am converting the display with pd.set_option('display.float_format', lambda x: '%.7f' % x)
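A minimal sketch of what that option does, using made-up SHIB-like prices rather than real yfinance output: display.float_format changes only how pandas prints floats, so the values handed to any later calculation are the same before and after.

```python
import pandas as pd

# Hypothetical tiny prices, similar in magnitude to SHIB-USD
prices = pd.Series([3.6e-05, 4.2e-05, 4.8e-05], name="SHIB-USD")
before = prices.copy()

# Change how floats are printed, not how they are stored
pd.set_option("display.float_format", lambda x: "%.7f" % x)
formatted = repr(prices)
pd.reset_option("display.float_format")

print(formatted)              # fixed-point text instead of scientific notation
print(prices.equals(before))  # the underlying data is unchanged
```

So the formatting step cannot be the cause of, or a fix for, numerical warnings further down the pipeline.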

Despite the step above, I get the following output:

/home/ubuntu/.local/lib/python3.9/site-packages/sklearn/covariance/_shrunk_covariance.py:243: RuntimeWarning: overflow encountered in square
  X2 = X ** 2
/home/ubuntu/.local/lib/python3.9/site-packages/sklearn/covariance/_shrunk_covariance.py:263: RuntimeWarning: overflow encountered in square
  np.dot(X.T[block_size * n_splits :], X[:, block_size * n_splits :]) ** 2
/home/ubuntu/.local/lib/python3.9/site-packages/sklearn/covariance/_shrunk_covariance.py:270: RuntimeWarning: invalid value encountered in double_scalars
  beta = 1.0 / (n_features * n_samples) * (beta_ / n_samples - delta_)
/home/ubuntu/.local/lib/python3.9/site-packages/sklearn/covariance/_shrunk_covariance.py:272: RuntimeWarning: invalid value encountered in double_scalars
  delta = delta_ - 2.0 * mu * emp_cov_trace.sum() + n_features * mu ** 2
DEBUG: locator: <matplotlib.colorbar._ColorbarAutoLocator object at 0x7f4c1ae1ebb0>
DEBUG: Using auto colorbar locator <matplotlib.colorbar._ColorbarAutoLocator object at 0x7f4c1ae1ebb0> on colorbar
DEBUG: Setting pcolormesh
/home/ubuntu/.local/lib/python3.9/site-packages/sklearn/covariance/_shrunk_covariance.py:243: RuntimeWarning: overflow encountered in square
  X2 = X ** 2
/home/ubuntu/.local/lib/python3.9/site-packages/sklearn/covariance/_shrunk_covariance.py:263: RuntimeWarning: overflow encountered in square
  np.dot(X.T[block_size * n_splits :], X[:, block_size * n_splits :]) ** 2
/home/ubuntu/.local/lib/python3.9/site-packages/sklearn/covariance/_shrunk_covariance.py:270: RuntimeWarning: invalid value encountered in double_scalars
  beta = 1.0 / (n_features * n_samples) * (beta_ / n_samples - delta_)
/home/ubuntu/.local/lib/python3.9/site-packages/sklearn/covariance/_shrunk_covariance.py:272: RuntimeWarning: invalid value encountered in double_scalars
  delta = delta_ - 2.0 * mu * emp_cov_trace.sum() + n_features * mu ** 2

I then get ValueError: P must be symmetric/Hermitian.

I experience this issue only with this specific symbol, I guess because its price is such a small number. Is there a way to include this symbol in the df without issues, or will I have to remove it?

I would appreciate your help!

robertmartin8 commented 2 years ago

Could you compute the returns and check that they're OK (i.e. no infinities)?
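One way to run that check, as a sketch on hypothetical data (the prices below are invented, including the zero in the SHIB-USD column). A zero price followed by a non-zero one is a common source of an infinite return, so the sketch counts both infinities in the returns and zeros in the prices.

```python
import numpy as np
import pandas as pd

# Hypothetical prices; the leading 0.0 is there to provoke an infinite return
prices = pd.DataFrame(
    {
        "SHIB-USD": [0.0, 0.000036, 0.000042],
        "STORJ-USD": [1.28, 1.33, 1.30],
    },
    index=pd.date_range("2021-10-24", periods=3),
)

# Simple daily returns without forward-filling gaps; drop the all-NaN first row
returns = prices.pct_change(fill_method=None).dropna(how="all")

inf_counts = np.isinf(returns).sum()   # infinities per ticker
zero_counts = (prices == 0).sum()      # zero prices per ticker

print(inf_counts)
print(zero_counts)
```

Any ticker with a non-zero count in either result is worth inspecting before the data goes into a covariance estimator.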

Originn commented 2 years ago

I checked by running expected_returns.capm_return(prices), and indeed I get an inf result for all tickers. This does not happen if SHIB-USD is not in the symbol list.

robertmartin8 commented 2 years ago

Are there any zeroes in prices["SHIB-USD"] or is it just that the numbers are tiny?

If it's the latter, you could try multiplying that column by a large number before passing it to PyPortfolioOpt (this will not affect the expected_returns), e.g.

prices["SHIB-USD"] *= 1e5

Originn commented 2 years ago

My current df:

             HEX-USD  SHIB-USD  STORJ-USD   UNI3-USD
Date
2017-07-02       NaN       NaN   0.548470        NaN
2017-07-03       NaN       NaN   0.813070        NaN
2017-07-04       NaN       NaN   0.794665        NaN
2017-07-05       NaN       NaN   0.659326        NaN
2017-07-06       NaN       NaN   0.651197        NaN
...              ...       ...        ...        ...
2021-10-24  0.325211  0.000036   1.280212  25.837254
2021-10-25  0.300179  0.000042   1.325330  26.705355
2021-10-26  0.279826  0.000048   1.304164  26.557507
2021-10-27  0.253309  0.000080   1.137098  24.154032
2021-10-28  0.247258  0.000067   1.131854  24.233309

Now the df no longer shows scientific notation, but I still get the same error.

Won't multiplying that column by a large number change the allocation output? I am trying to avoid that if possible :)

Originn commented 2 years ago

I have checked sklearn.covariance.ShrunkCovariance https://scikit-learn.org/stable/modules/generated/sklearn.covariance.ShrunkCovariance.html

I wanted to test whether assume_centered=True makes any difference, but I'm not sure where or how to set it.
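For reference, a minimal sketch of where assume_centered can be set when calling scikit-learn directly. It uses synthetic returns and the ledoit_wolf function from sklearn.covariance (the estimator the warnings above come from). Whether PyPortfolioOpt's CovarianceShrinkage forwards this flag is not established in this thread, so the sketch bypasses it. Centering is also unlikely to help if the input already contains infinities.

```python
import numpy as np
from sklearn.covariance import ledoit_wolf

# Synthetic daily returns for three assets (illustration only, not market data)
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.02, size=(250, 3))

# Default: the sample mean is subtracted before estimating the covariance
cov_default, shrinkage_default = ledoit_wolf(returns)

# assume_centered=True: the data is treated as already having zero mean
cov_centered, shrinkage_centered = ledoit_wolf(returns, assume_centered=True)

print(shrinkage_default, shrinkage_centered)
print(np.abs(cov_default - cov_centered).max())
```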

robertmartin8 commented 2 years ago

Multiplying prices shouldn’t affect the allocation - the allocation only depends on returns.
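A quick way to convince yourself of this, sketched with the last five SHIB-USD and STORJ-USD rows quoted earlier in the thread: percentage returns are ratios of consecutive prices, so a constant factor on one column cancels out.

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame(
    {
        "SHIB-USD": [0.000036, 0.000042, 0.000048, 0.000080, 0.000067],
        "STORJ-USD": [1.280212, 1.325330, 1.304164, 1.137098, 1.131854],
    },
    index=pd.date_range("2021-10-24", periods=5),
)

# Rescale only the tiny-priced column
scaled = prices.copy()
scaled["SHIB-USD"] *= 1e5

returns = prices.pct_change(fill_method=None).dropna()
scaled_returns = scaled.pct_change(fill_method=None).dropna()

# Returns agree up to floating-point rounding
same = np.allclose(returns, scaled_returns)
print(same)
```

Since the returns match, anything computed purely from returns (such as a covariance matrix of returns) matches as well.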

Do you get any infinities if you call pct_change() on the SHIB data?

Originn commented 2 years ago

Yes, after applying pct_change() to the SHIB data I still get inf values on all tickers. The error occurs on:

S = risk_models.CovarianceShrinkage(prices).ledoit_wolf()
plotting.plot_covariance(S, plot_correlation=True);

/usr/local/lib/python3.7/dist-packages/numpy/core/_methods.py:160: RuntimeWarning: overflow encountered in reduce
  ret = umr_sum(arr, axis, dtype, out, keepdims)
/usr/local/lib/python3.7/dist-packages/numpy/core/_methods.py:160: RuntimeWarning: invalid value encountered in reduce
  ret = umr_sum(arr, axis, dtype, out, keepdims)

Originn commented 2 years ago

I did some other tests, and the reason for the issue was that a chunk of the df returned by yf.download was missing values (between certain dates). Starting the download from the date where there is consistent price data solved it for me. Thanks for the help!
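The same idea can be applied after downloading, as a rough sketch on invented data: find the last date on which any ticker is missing a price and keep only the rows after it. The frame below is hypothetical and stands in for the yf.download result.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with a gap in SHIB-USD between two dates
prices = pd.DataFrame(
    {
        "SHIB-USD": [0.00001, np.nan, np.nan, 0.000036, 0.000042, 0.000048, 0.000080],
        "STORJ-USD": [1.20, 1.22, 1.25, 1.28, 1.33, 1.30, 1.14],
    },
    index=pd.date_range("2021-10-20", periods=7),
)

# True on dates where every ticker has a price
complete = prices.notna().all(axis=1)

# Last date with at least one missing price (NaT if there is none)
last_gap = complete[~complete].index.max()

# Keep only the rows after the last gap
trimmed = prices.loc[prices.index > last_gap] if pd.notna(last_gap) else prices

print(trimmed.index[0])
print(trimmed)
```

This discards earlier history for every ticker, so it trades sample length for a gap-free window; dropping the affected ticker is the alternative if the remaining window is too short.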

BradKML commented 1 year ago

A bit of a side question for this issue and https://github.com/robertmartin8/PyPortfolioOpt/issues/445: is there any chance of a tutorial on using yfinance with PyPortfolioOpt?