statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

MarkovRegression gives SVD did not converge #8416

Open ARBQuant opened 2 years ago

ARBQuant commented 2 years ago

Describe the bug

Hi, I am using the Markov regime-switching models implemented in statsmodels to make inferences about the regimes in my data. I have worked through the MarkovRegression and MarkovAutoregression examples, and I have also read Hamilton's 1989 and 1990 papers. I now have 2 time series and have taken the log of both. Each series has 100 observations with no NaN or inf values. I can fit the model on the differenced data but not on the original log data: without differencing, the log data always gives me the "SVD did not converge" error.

Code Sample, a copy-pastable example if possible

My data frame, which is already in log form, prints as follows. I have also attached a data file with the same numbers for convenience.

            VariableA  VariableB
07/02/1999   7.248433   7.773384
07/09/1999   7.253966   7.790489
07/16/1999   7.263680   7.813592
07/23/1999   7.217297   7.744938
07/30/1999   7.194287   7.730394
08/06/1999   7.175490   7.706163
08/13/1999   7.195112   7.748460
08/20/1999   7.200574   7.753409
08/27/1999   7.209118   7.785721
09/03/1999   7.185576   7.783224
09/10/1999   7.222274   7.849909
09/17/1999   7.208970   7.852828
09/24/1999   7.161234   7.793587
10/01/1999   7.165107   7.797497
10/08/1999   7.205264   7.855738
10/15/1999   7.136722   7.791006
10/22/1999   7.175643   7.822645
10/29/1999   7.227118   7.883823
11/05/1999   7.233094   7.931285
11/12/1999   7.245298   7.975393
11/19/1999   7.263680   8.021749
11/26/1999   7.254531   8.045909
12/03/1999   7.271356   8.062905
12/10/1999   7.268920   8.087948
12/17/1999   7.275519   8.135640
12/23/1999   7.284478   8.194091
12/31/1999   7.302665   8.230577
01/07/2000   7.286534   8.176813
01/14/2000   7.298445   8.229911
01/21/2000   7.281902   8.262688
01/28/2000   7.220008   8.153350
02/04/2000   7.266478   8.270397
02/11/2000   7.241903   8.292173
02/18/2000   7.210264   8.286017
02/25/2000   7.198931   8.341410
03/03/2000   7.251877   8.401670
03/10/2000   7.256826   8.448272
03/17/2000   7.305860   8.415493
03/24/2000   7.349552   8.477204
03/31/2000   7.323336   8.402343
04/07/2000   7.332369   8.374938
04/14/2000   7.220740   8.076360
04/20/2000   7.277420   8.170469
04/28/2000   7.286192   8.245384
05/05/2000   7.271704   8.214736
05/12/2000   7.265919   8.136665
05/19/2000   7.254885   8.095599
05/26/2000   7.230744   8.048788
06/02/2000   7.297091   8.229911
06/09/2000   7.302833   8.246434
06/16/2000   7.305188   8.249967
06/23/2000   7.288586   8.223627
06/30/2000   7.291656   8.247220
07/07/2000   7.309714   8.262946
07/14/2000   7.329094   8.313362
07/21/2000   7.306196   8.278682
07/28/2000   7.266827   8.162231
08/04/2000   7.294207   8.197539
08/11/2000   7.298783   8.206174
08/18/2000   7.313720   8.252055
08/25/2000   7.322180   8.283747
09/01/2000   7.328601   8.323123
09/08/2000   7.325808   8.261139
09/15/2000   7.302159   8.221076
09/22/2000   7.291997   8.236421
09/29/2000   7.281902   8.194506
10/06/2000   7.263154   8.118952
10/13/2000   7.233816   8.094073
10/20/2000   7.252054   8.160375
10/27/2000   7.244942   8.077913
11/03/2000   7.269791   8.117611
11/10/2000   7.224389   7.970395
11/17/2000   7.222931   7.985995
11/24/2000   7.205264   7.947679
12/01/2000   7.185766   7.852633
12/08/2000   7.212663   7.909306
12/15/2000   7.189922   7.864228
12/22/2000   7.190488   7.811568
12/29/2000   7.196687   7.772542
01/05/2001   7.173575   7.737834
01/12/2001   7.193122   7.839919
01/19/2001   7.206748   7.895995
01/26/2001   7.217627   7.880615
02/02/2001   7.210264   7.819435
02/09/2001   7.186523   7.730614
02/16/2001   7.174341   7.708635
02/23/2001   7.128897   7.629247
03/02/2001   7.119029   7.537164
03/09/2001   7.126087   7.516161
03/16/2001   7.057683   7.422374
03/23/2001   7.050556   7.458186
03/30/2001   7.064118   7.371489
04/06/2001   7.039003   7.293018
04/12/2001   7.083598   7.455877
04/20/2001   7.131499   7.577378
04/27/2001   7.137676   7.508239
05/04/2001   7.149328   7.570701
05/11/2001   7.140057   7.518879
05/18/2001   7.166652   7.567346
05/25/2001   7.153834   7.585027
06/01/2001   7.142827   7.523751

And the code I am using to run this:

import pandas as pd
import statsmodels.api as sm

the above data frame is named as logdata

model = sm.tsa.MarkovRegression(
    endog=logdata['VariableA'],
    k_regimes=2,
    exog=logdata['VariableB'],
    switching_exog=True,
    switching_variance=True,
)
res = model.fit(maxiter=100000)

Expected Output

If I instead use logreturns = (logdata - logdata.shift(1)).dropna(), with logreturns['VariableA'] as the endogenous variable and logreturns['VariableB'] as the exogenous variable, the fit usually succeeds: I have tested many more series like this from my full data set, and most of the time I get the fitted model and all its properties. However, when using logdata, I always get this error:

  File "C:..\markovSwitching.py", line 14, in MSDR
    res = model.fit(maxiter=maxiter)
  File "C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\statsmodels\tsa\regime_switching\markov_switching.py", line 1113, in fit
    start_params = self._fit_em(start_params, transformed=transformed,
  File "C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\statsmodels\tsa\regime_switching\markov_switching.py", line 1205, in _fit_em
    out = self._em_iteration(params[-1])
  File "C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\statsmodels\tsa\regime_switching\markov_regression.py", line 214, in _em_iteration
    coeffs = self._em_exog(result, self.endog, self.exog,
  File "C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\statsmodels\tsa\regime_switching\markov_regression.py", line 250, in _em_exog
    np.dot(np.linalg.pinv(tmp_exog), tmp_endog))
  File "<__array_function__ internals>", line 180, in pinv
  File "C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\linalg\linalg.py", line 1998, in pinv
    u, s, vt = svd(a, full_matrices=False, hermitian=hermitian)
  File "<__array_function__ internals>", line 180, in svd
  File "C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\linalg\linalg.py", line 1657, in svd
    u, s, vh = gufunc(a, signature=signature, extobj=extobj)
  File "C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\linalg\linalg.py", line 98, in _raise_linalgerror_svd_nonconvergence
    raise LinAlgError("SVD did not converge")
numpy.linalg.LinAlgError: SVD did not converge

Output of import statsmodels.api as sm; sm.show_versions()

INSTALLED VERSIONS

Python: 3.10.5.final.0

statsmodels

Installed: 0.13.2 (C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\statsmodels)

Required Dependencies

cython: Not installed
numpy: 1.23.0 (C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy)
scipy: 1.8.1 (C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\scipy)
pandas: 1.4.3 (C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas)
dateutil: 2.8.2 (C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\dateutil)
patsy: 0.5.2 (C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\patsy)

Optional Dependencies

matplotlib: 3.5.2 (C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib)
backend: TkAgg
cvxopt: Not installed
joblib: Not installed

Log.xlsx

Developer Tools

IPython: Not installed
jinja2: 3.1.2 (C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\jinja2)
sphinx: Not installed
pygments: Not installed
pytest: 7.1.3 (C:\Users..\AppData\Local\Programs\Python\Python310\lib\site-packages\pytest)
virtualenv: Not installed

Thank you very much in advance for all of your help!!

ARBQuant commented 2 years ago

Do the Markov regression functions here assume that my input data must at least be stationary within each of the potential regimes? Is this the reason why I get a well-fitted model with the differenced data but not with the log prices?

Lucidatrix commented 2 years ago

Did you find a solution? I am experiencing a similar issue with MarkovRegression.

ARBQuant commented 2 years ago

Did you find a solution? I am experiencing a similar issue with MarkovRegression.

Unfortunately, no. I tried different types of input data, and I usually do not run into the issue when the data itself is stationary or, in the endog-plus-exog case, when the two series are cointegrated. However, that is rather different from the cases where I had intended to apply this algorithm.

mboldin commented 2 years ago

The problem may be related to a non-stationary issue with the data and model design, but errors such as

raise LinAlgError("SVD did not converge")
numpy.linalg.LinAlgError: SVD did not converge

are typically due to redundant or unidentified parameters (at the point in parameter space where the optimization procedure was trying to converge).
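One way to look for this kind of redundancy is to inspect the matrix that reaches `pinv`. The sketch below assumes, from the traceback rather than from a check of the statsmodels source, that this matrix is the regressor matrix scaled row by row with regime weights. The helper `diagnose_design`, the weight vectors, and the regressor values are all hypothetical and synthetic.

```python
import numpy as np

def diagnose_design(matrix):
    """Report whether a design matrix is finite, plus its rank and condition number."""
    matrix = np.asarray(matrix, dtype=float)
    if not np.isfinite(matrix).all():
        # SVD-based routines are not meaningful on NaN/inf input, so stop here.
        return {"finite": False, "rank": None, "cond": None}
    return {
        "finite": True,
        "rank": int(np.linalg.matrix_rank(matrix)),
        "cond": float(np.linalg.cond(matrix)),
    }

rng = np.random.default_rng(0)
n = 100

# Constant plus one log-level-like regressor (synthetic values).
exog = np.column_stack([np.ones(n), 7.8 + rng.normal(scale=0.2, size=n)])

# Hypothetical per-observation regime weights.
weights = {
    "healthy": np.sqrt(rng.uniform(0.1, 0.9, size=n)),  # both regimes used
    "empty": np.zeros(n),                               # regime with no observations
    "broken": np.full(n, np.nan),                       # probabilities turned NaN
}

reports = {}
for name, w in weights.items():
    # Scale each row of the regressors by that observation's weight.
    reports[name] = diagnose_design(w[:, np.newaxis] * exog)
    print(name, reports[name])
```

In this sketch, a rank below the number of columns corresponds to an unidentified regime, and a non-finite matrix is a separate failure that would be worth ruling out before blaming the model design.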

You could try different starting values, or a different optimization method that does not have an SVD step.

It is most likely a case where the log level model design makes it hard to find 2 distinct regimes. This can come from a case where one regime fits the data almost perfectly, or from a case where neither regime fits the data well, or just a bad choice in starting values. Probably not a software bug.

