Closed andrewczgithub closed 4 years ago
Hi Andrew, since you're using an HSMM with Gaussian observations, by default the observation means will be initialized using K-Means, and the covariance matrices will be calculated as the empirical covariance of all the points assigned to a given cluster. This tends to work pretty well in practice, and should give quite similar results to initializing using a Gaussian Mixture Model.
If it's important for your application to initialize using a Gaussian Mixture Model, I can walk you through that. It would involve writing a short routine using Scikit-Learn's Gaussian Mixture class (https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture).
You can look at the initialization routine for Gaussian observations at line 107 of observations.py
Hi @bantin !
Thank you so much!! what a champion!! I could really use the hlep.
K means is fine to initialize the regimes. I have just trying to get an idea of how to use the framework on an example I breifly understand.
Below is what I have tried but it has come up with an error
import yfinance as yf
data = yf.download("SPY", start="2017-01-01", end="2019-04-30")
data
return_target=data['Close'].pct_change().dropna()
return_target
[*********************100%***********************] 1 of 1 completed
Date
2017-01-04 0.005949
2017-01-05 -0.000794
2017-01-06 0.003578
2017-01-09 -0.003301
2017-01-10 0.000000
...
2019-04-23 0.008992
2019-04-24 -0.002219
2019-04-25 -0.000616
2019-04-26 0.004657
2019-04-29 0.001568
Name: Close, Length: 582, dtype: float64
import autograd.numpy as np
import autograd.numpy.random as npr
from scipy.stats import nbinom
import matplotlib.pyplot as plt
import ssm
from ssm.util import rle, find_permutation
npr.seed(0)
# Set the parameters of the HMM
#T = 5000 # number of time bins
K = 4 # number of discrete states
D = 1 # number of observed dimensions
# # Make an HMM with the true parameters
# true_hsmm = ssm.HSMM(K, D, observations="gaussian")
# true_hsmm.transitions.rs
# z, y = true_hsmm.sample(T)
# z_test, y_test = true_hsmm.sample(T)
# true_ll = true_hsmm.log_probability(y)
# # Fit an HSMM
# N_em_iters = 100
# print("Fitting Gaussian HSMM with EM")
# hsmm = ssm.HSMM(K, D, observations="gaussian")
# hsmm_em_lls = hsmm.fit(y, method="em", num_iters=N_em_iters)
print("Fitting Gaussian HSMM with EM")
hsmm = ssm.HSMM(return_target, D, observations="gaussian")
hsmm_em_lls = hsmm.fit(return_target, method="em", num_iters=N_em_iters)
Fitting Gaussian HSMM with EM
/home/andrewcz/tutorial-env2/lib/python3.7/site-packages/pandas/core/series.py:856: RuntimeWarning: divide by zero encountered in log
result = getattr(ufunc, method)(*inputs, **kwargs)
/home/andrewcz/tutorial-env2/lib/python3.7/site-packages/pandas/core/series.py:856: RuntimeWarning: invalid value encountered in log
result = getattr(ufunc, method)(*inputs, **kwargs)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-905b9bd47bfb> in <module>
1 print("Fitting Gaussian HSMM with EM")
----> 2 hsmm = ssm.HSMM(return_target, D, observations="gaussian")
3 hsmm_em_lls = hsmm.fit(return_target, method="em", num_iters=N_em_iters)
~/ssm-master/ssm/hmm.py in __init__(self, K, D, M, init_state_distn, transitions, transition_kwargs, observations, observation_kwargs, **kwargs)
503
504 if init_state_distn is None:
--> 505 init_state_distn = isd.InitialStateDistribution(K, D, M=M)
506 if not isinstance(init_state_distn, isd.InitialStateDistribution):
507 raise TypeError("'init_state_distn' must be a subclass of"
~/ssm-master/ssm/init_state_distns.py in __init__(self, K, D, M)
13 def __init__(self, K, D, M=0):
14 self.K, self.D, self.M = K, D, M
---> 15 self.log_pi0 = -np.log(K) * np.ones(K)
16
17 @property
~/tutorial-env2/lib/python3.7/site-packages/autograd/tracer.py in f_wrapped(*args, **kwargs)
46 return new_box(ans, trace, node)
47 else:
---> 48 return f_raw(*args, **kwargs)
49 f_wrapped.fun = f_raw
50 f_wrapped._is_autograd_primitive = True
~/tutorial-env2/lib/python3.7/site-packages/numpy/core/numeric.py in ones(shape, dtype, order)
205
206 """
--> 207 a = empty(shape, dtype, order)
208 multiarray.copyto(a, 1, casting='unsafe')
209 return a
ValueError: maximum supported dimension for an ndarray is 32, found 582
Hi @andrewczgithub . I'd guess that your data matrix has the dimensions mixed up. return_target
should be a numpy array of size [Time_steps x Num_observations]. So if you are looking at a time series with 100 stocks over 300 days, your data array would be 300 x 100.
SSM also supports passing in a list of multiple datas
corresponding to multiple observations of your time series. In that case, you would pass a list where each element is time_steps x num_observations
Hi @bantin !!
Hope your well! thank you for your help
I tried the below -
import yfinance as yf
data = yf.download("SPY", start="2017-01-01", end="2019-04-30")
data
return_target=data['Close'].pct_change().dropna()
#turn pandas dataframe.
return_target_numpy= return_target.values
import autograd.numpy as np
import autograd.numpy.random as npr
from scipy.stats import nbinom
import matplotlib.pyplot as plt
import ssm
from ssm.util import rle, find_permutation
npr.seed(0)
# Set the parameters of the HMM
#T = 5000 # number of time bins
K = 4 # number of discrete states
D = 1 # number of observed dimensions
print("Fitting Gaussian HSMM with EM")
hsmm = ssm.HSMM(return_target_numpy, D, observations="gaussian")
hsmm_em_lls = hsmm.fit(return_target_numpy, method="em", num_iters=N_em_iters)
but I am still getting the error -
[*********************100%***********************] 1 of 1 completed
Fitting Gaussian HSMM with EM
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-aefe39955af7> in <module>
22
23 print("Fitting Gaussian HSMM with EM")
---> 24 hsmm = ssm.HSMM(return_target_numpy, D, observations="gaussian")
25 hsmm_em_lls = hsmm.fit(return_target_numpy, method="em", num_iters=N_em_iters)
~/ssm/ssm/hmm.py in __init__(self, K, D, M, init_state_distn, transitions, transition_kwargs, observations, observation_kwargs, **kwargs)
503
504 if init_state_distn is None:
--> 505 init_state_distn = isd.InitialStateDistribution(K, D, M=M)
506 if not isinstance(init_state_distn, isd.InitialStateDistribution):
507 raise TypeError("'init_state_distn' must be a subclass of"
~/ssm/ssm/init_state_distns.py in __init__(self, K, D, M)
13 def __init__(self, K, D, M=0):
14 self.K, self.D, self.M = K, D, M
---> 15 self.log_pi0 = -np.log(K) * np.ones(K)
16
17 @property
~/tutorial-env/lib/python3.8/site-packages/autograd/tracer.py in f_wrapped(*args, **kwargs)
46 return new_box(ans, trace, node)
47 else:
---> 48 return f_raw(*args, **kwargs)
49 f_wrapped.fun = f_raw
50 f_wrapped._is_autograd_primitive = True
~/tutorial-env/lib/python3.8/site-packages/numpy/core/numeric.py in ones(shape, dtype, order)
205
206 """
--> 207 a = empty(shape, dtype, order)
208 multiarray.copyto(a, 1, casting='unsafe')
209 return a
ValueError: maximum supported dimension for an ndarray is 32, found 582
[*********************100%***********************] 1 of 1 completed
Fitting Gaussian HSMM with EM
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-aefe39955af7> in <module>
22
23 print("Fitting Gaussian HSMM with EM")
---> 24 hsmm = ssm.HSMM(return_target_numpy, D, observations="gaussian")
25 hsmm_em_lls = hsmm.fit(return_target_numpy, method="em", num_iters=N_em_iters)
~/ssm/ssm/hmm.py in __init__(self, K, D, M, init_state_distn, transitions, transition_kwargs, observations, observation_kwargs, **kwargs)
503
504 if init_state_distn is None:
--> 505 init_state_distn = isd.InitialStateDistribution(K, D, M=M)
506 if not isinstance(init_state_distn, isd.InitialStateDistribution):
507 raise TypeError("'init_state_distn' must be a subclass of"
~/ssm/ssm/init_state_distns.py in __init__(self, K, D, M)
13 def __init__(self, K, D, M=0):
14 self.K, self.D, self.M = K, D, M
---> 15 self.log_pi0 = -np.log(K) * np.ones(K)
16
17 @property
~/tutorial-env/lib/python3.8/site-packages/autograd/tracer.py in f_wrapped(*args, **kwargs)
46 return new_box(ans, trace, node)
47 else:
---> 48 return f_raw(*args, **kwargs)
49 f_wrapped.fun = f_raw
50 f_wrapped._is_autograd_primitive = True
~/tutorial-env/lib/python3.8/site-packages/numpy/core/numeric.py in ones(shape, dtype, order)
205
206 """
--> 207 a = empty(shape, dtype, order)
208 multiarray.copyto(a, 1, casting='unsafe')
209 return a
ValueError: maximum supported dimension for an ndarray is 32, found 582
i am just not sure what I am doing wrong.
I am trying to fit just a single variable model, to test for regimes.
I was wondering if you could assist.
Kind regards, Andrew
Hi Andrew,
It looks like you are having an issue when you create the HSMM object. The syntax for creating an HSMM object is:
hsmm = ssm.HSMM(K, D, observations="gaussian")
Also, you need to pass in a data matrix with two dimensions. The shape of your data is (583,), and it needs to be (583,1). You can usereturn_target_numpy.reshape(-1,1)
to fix this. After that things should work.
You can take a look through the examples in the notebooks folder for some more examples of how to use the library. I changed a few lines in your example, so please give this a try:
import yfinance as yf
data = yf.download("SPY", start="2017-01-01", end="2019-04-30")
data
return_target=data['Close'].pct_change().dropna()
#turn pandas dataframe.
return_target_numpy= (return_target.values).reshape(-1,1)
import autograd.numpy as np
import autograd.numpy.random as npr
from scipy.stats import nbinom
import matplotlib.pyplot as plt
import ssm
from ssm.util import rle, find_permutation
npr.seed(0)
# Set the parameters of the HMM
#T = 5000 # number of time bins
K = 4 # number of discrete states
D = 1 # number of observed dimensions
num_iters=100
print("Fitting Gaussian HSMM with EM")
hsmm = ssm.HSMM(K, D, observations="gaussian")
hsmm_em_lls = hsmm.fit(return_target_numpy, method="em", num_iters=num_iters)
@andrewczgithub I hope that helps! I'm going to close this issue for now, but please feel free to send me an email if you need more help. My email is bantin@stanford.edu
Hi @slinderman !
Really love the library. I would like to try and use to calculate different regimes.
The below is what I have tried so far -
I have read that I can use a gaussian mixture model to initizlise the model.
I was wondering if you could help with the above.
Kind regards, Andrew