lindermanlab / ssm

Bayesian learning and inference for state space models
MIT License
559 stars 197 forks source link

Regime Change Calculation - An example #85

Closed andrewczgithub closed 4 years ago

andrewczgithub commented 4 years ago

Hi @slinderman !

Really love the library. I would like to try and use to calculate different regimes.

The below is what I have tried so far -

import yfinance as yf
data = yf.download("SPY", start="2017-01-01", end="2019-04-30")
data
return_target=data['Close'].pct_change().dropna()
return_target

import autograd.numpy as np
import autograd.numpy.random as npr
from scipy.stats import nbinom
import matplotlib.pyplot as plt
import ssm
from ssm.util import rle, find_permutation

npr.seed(0)

# Set the parameters of the HMM
#T = 5000    # number of time bins
K = 4       # number of discrete states
D = 1       # number of observed dimensions

print("Fitting Gaussian HSMM with EM")
hsmm = ssm.HSMM(return_target, D, observations="gaussian")
hsmm_em_lls = hsmm.fit(return_target, method="em", num_iters=N_em_iters)

I have read that I can use a gaussian mixture model to initizlise the model.

I was wondering if you could help with the above.

Kind regards, Andrew

bantin commented 4 years ago

Hi Andrew, since you're using an HSMM with Gaussian observations, by default the observation means will be initialized using K-Means, and the covariance matrices will be calculated as the empirical covariance of all the points assigned to a given cluster. This tends to work pretty well in practice, and should give quite similar results to initializing using a Gaussian Mixture Model.

If it's important for your application to initialize using a Gaussian Mixture Model, I can walk you through that. It would involve writing a short routine using Scikit-Learn's Gaussian Mixture class (https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture).

You can look at the initialization routine for Gaussian observations at line 107 of observations.py

andrewczgithub commented 4 years ago

Hi @bantin !

Thank you so much!! what a champion!! I could really use the hlep.

K means is fine to initialize the regimes. I have just trying to get an idea of how to use the framework on an example I breifly understand.

Below is what I have tried but it has come up with an error


import yfinance as yf
data = yf.download("SPY", start="2017-01-01", end="2019-04-30")
data
return_target=data['Close'].pct_change().dropna()
return_target
[*********************100%***********************]  1 of 1 completed
Date
2017-01-04    0.005949
2017-01-05   -0.000794
2017-01-06    0.003578
2017-01-09   -0.003301
2017-01-10    0.000000
                ...   
2019-04-23    0.008992
2019-04-24   -0.002219
2019-04-25   -0.000616
2019-04-26    0.004657
2019-04-29    0.001568
Name: Close, Length: 582, dtype: float64
import autograd.numpy as np
import autograd.numpy.random as npr
from scipy.stats import nbinom
import matplotlib.pyplot as plt
import ssm
from ssm.util import rle, find_permutation
​
npr.seed(0)
​
# Set the parameters of the HMM
#T = 5000    # number of time bins
K = 4       # number of discrete states
D = 1       # number of observed dimensions
​
# # Make an HMM with the true parameters
# true_hsmm = ssm.HSMM(K, D, observations="gaussian")
# true_hsmm.transitions.rs
# z, y = true_hsmm.sample(T)
# z_test, y_test = true_hsmm.sample(T)
# true_ll = true_hsmm.log_probability(y)
​
# # Fit an HSMM
# N_em_iters = 100
​
# print("Fitting Gaussian HSMM with EM")
# hsmm = ssm.HSMM(K, D, observations="gaussian")
# hsmm_em_lls = hsmm.fit(y, method="em", num_iters=N_em_iters)
​
print("Fitting Gaussian HSMM with EM")
hsmm = ssm.HSMM(return_target, D, observations="gaussian")
hsmm_em_lls = hsmm.fit(return_target, method="em", num_iters=N_em_iters)
Fitting Gaussian HSMM with EM
/home/andrewcz/tutorial-env2/lib/python3.7/site-packages/pandas/core/series.py:856: RuntimeWarning: divide by zero encountered in log
  result = getattr(ufunc, method)(*inputs, **kwargs)
/home/andrewcz/tutorial-env2/lib/python3.7/site-packages/pandas/core/series.py:856: RuntimeWarning: invalid value encountered in log
  result = getattr(ufunc, method)(*inputs, **kwargs)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-905b9bd47bfb> in <module>
      1 print("Fitting Gaussian HSMM with EM")
----> 2 hsmm = ssm.HSMM(return_target, D, observations="gaussian")
      3 hsmm_em_lls = hsmm.fit(return_target, method="em", num_iters=N_em_iters)

~/ssm-master/ssm/hmm.py in __init__(self, K, D, M, init_state_distn, transitions, transition_kwargs, observations, observation_kwargs, **kwargs)
    503 
    504         if init_state_distn is None:
--> 505             init_state_distn = isd.InitialStateDistribution(K, D, M=M)
    506         if not isinstance(init_state_distn, isd.InitialStateDistribution):
    507             raise TypeError("'init_state_distn' must be a subclass of"

~/ssm-master/ssm/init_state_distns.py in __init__(self, K, D, M)
     13     def __init__(self, K, D, M=0):
     14         self.K, self.D, self.M = K, D, M
---> 15         self.log_pi0 = -np.log(K) * np.ones(K)
     16 
     17     @property

~/tutorial-env2/lib/python3.7/site-packages/autograd/tracer.py in f_wrapped(*args, **kwargs)
     46             return new_box(ans, trace, node)
     47         else:
---> 48             return f_raw(*args, **kwargs)
     49     f_wrapped.fun = f_raw
     50     f_wrapped._is_autograd_primitive = True

~/tutorial-env2/lib/python3.7/site-packages/numpy/core/numeric.py in ones(shape, dtype, order)
    205 
    206     """
--> 207     a = empty(shape, dtype, order)
    208     multiarray.copyto(a, 1, casting='unsafe')
    209     return a

ValueError: maximum supported dimension for an ndarray is 32, found 582
bantin commented 4 years ago

Hi @andrewczgithub . I'd guess that your data matrix has the dimensions mixed up. return_target should be a numpy array of size [Time_steps x Num_observations]. So if you are looking at a time series with 100 stocks over 300 days, your data array would be 300 x 100.

SSM also supports passing in a list of multiple datas corresponding to multiple observations of your time series. In that case, you would pass a list where each element is time_steps x num_observations

andrewczgithub commented 4 years ago

Hi @bantin !!

Hope your well! thank you for your help

I tried the below -

import yfinance as yf
data = yf.download("SPY", start="2017-01-01", end="2019-04-30")
data
return_target=data['Close'].pct_change().dropna()

#turn pandas dataframe.
return_target_numpy= return_target.values

import autograd.numpy as np
import autograd.numpy.random as npr
from scipy.stats import nbinom
import matplotlib.pyplot as plt
import ssm
from ssm.util import rle, find_permutation

npr.seed(0)

# Set the parameters of the HMM
#T = 5000    # number of time bins
K = 4       # number of discrete states
D = 1       # number of observed dimensions

print("Fitting Gaussian HSMM with EM")
hsmm = ssm.HSMM(return_target_numpy, D, observations="gaussian")
hsmm_em_lls = hsmm.fit(return_target_numpy, method="em", num_iters=N_em_iters)
andrewczgithub commented 4 years ago

but I am still getting the error -


[*********************100%***********************]  1 of 1 completed
Fitting Gaussian HSMM with EM
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-aefe39955af7> in <module>
     22 
     23 print("Fitting Gaussian HSMM with EM")
---> 24 hsmm = ssm.HSMM(return_target_numpy, D, observations="gaussian")
     25 hsmm_em_lls = hsmm.fit(return_target_numpy, method="em", num_iters=N_em_iters)

~/ssm/ssm/hmm.py in __init__(self, K, D, M, init_state_distn, transitions, transition_kwargs, observations, observation_kwargs, **kwargs)
    503 
    504         if init_state_distn is None:
--> 505             init_state_distn = isd.InitialStateDistribution(K, D, M=M)
    506         if not isinstance(init_state_distn, isd.InitialStateDistribution):
    507             raise TypeError("'init_state_distn' must be a subclass of"

~/ssm/ssm/init_state_distns.py in __init__(self, K, D, M)
     13     def __init__(self, K, D, M=0):
     14         self.K, self.D, self.M = K, D, M
---> 15         self.log_pi0 = -np.log(K) * np.ones(K)
     16 
     17     @property

~/tutorial-env/lib/python3.8/site-packages/autograd/tracer.py in f_wrapped(*args, **kwargs)
     46             return new_box(ans, trace, node)
     47         else:
---> 48             return f_raw(*args, **kwargs)
     49     f_wrapped.fun = f_raw
     50     f_wrapped._is_autograd_primitive = True

~/tutorial-env/lib/python3.8/site-packages/numpy/core/numeric.py in ones(shape, dtype, order)
    205 
    206     """
--> 207     a = empty(shape, dtype, order)
    208     multiarray.copyto(a, 1, casting='unsafe')
    209     return a

ValueError: maximum supported dimension for an ndarray is 32, found 582

[*********************100%***********************]  1 of 1 completed
Fitting Gaussian HSMM with EM
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-aefe39955af7> in <module>
     22 
     23 print("Fitting Gaussian HSMM with EM")
---> 24 hsmm = ssm.HSMM(return_target_numpy, D, observations="gaussian")
     25 hsmm_em_lls = hsmm.fit(return_target_numpy, method="em", num_iters=N_em_iters)

~/ssm/ssm/hmm.py in __init__(self, K, D, M, init_state_distn, transitions, transition_kwargs, observations, observation_kwargs, **kwargs)
    503 
    504         if init_state_distn is None:
--> 505             init_state_distn = isd.InitialStateDistribution(K, D, M=M)
    506         if not isinstance(init_state_distn, isd.InitialStateDistribution):
    507             raise TypeError("'init_state_distn' must be a subclass of"

~/ssm/ssm/init_state_distns.py in __init__(self, K, D, M)
     13     def __init__(self, K, D, M=0):
     14         self.K, self.D, self.M = K, D, M
---> 15         self.log_pi0 = -np.log(K) * np.ones(K)
     16 
     17     @property

~/tutorial-env/lib/python3.8/site-packages/autograd/tracer.py in f_wrapped(*args, **kwargs)
     46             return new_box(ans, trace, node)
     47         else:
---> 48             return f_raw(*args, **kwargs)
     49     f_wrapped.fun = f_raw
     50     f_wrapped._is_autograd_primitive = True

~/tutorial-env/lib/python3.8/site-packages/numpy/core/numeric.py in ones(shape, dtype, order)
    205 
    206     """
--> 207     a = empty(shape, dtype, order)
    208     multiarray.copyto(a, 1, casting='unsafe')
    209     return a

ValueError: maximum supported dimension for an ndarray is 32, found 582
andrewczgithub commented 4 years ago

i am just not sure what I am doing wrong.

I am trying to fit just a single variable model, to test for regimes.

I was wondering if you could assist.

Kind regards, Andrew

bantin commented 4 years ago

Hi Andrew,

It looks like you are having an issue when you create the HSMM object. The syntax for creating an HSMM object is: hsmm = ssm.HSMM(K, D, observations="gaussian")

Also, you need to pass in a data matrix with two dimensions. The shape of your data is (583,), and it needs to be (583,1). You can usereturn_target_numpy.reshape(-1,1) to fix this. After that things should work.

You can take a look through the examples in the notebooks folder for some more examples of how to use the library. I changed a few lines in your example, so please give this a try:


import yfinance as yf
data = yf.download("SPY", start="2017-01-01", end="2019-04-30")
data
return_target=data['Close'].pct_change().dropna()

#turn pandas dataframe.
return_target_numpy= (return_target.values).reshape(-1,1)

import autograd.numpy as np
import autograd.numpy.random as npr
from scipy.stats import nbinom
import matplotlib.pyplot as plt
import ssm
from ssm.util import rle, find_permutation

npr.seed(0)

# Set the parameters of the HMM
#T = 5000    # number of time bins
K = 4       # number of discrete states
D = 1       # number of observed dimensions
num_iters=100

print("Fitting Gaussian HSMM with EM")
hsmm = ssm.HSMM(K, D, observations="gaussian")
hsmm_em_lls = hsmm.fit(return_target_numpy, method="em", num_iters=num_iters)
bantin commented 4 years ago

@andrewczgithub I hope that helps! I'm going to close this issue for now, but please feel free to send me an email if you need more help. My email is bantin@stanford.edu