I just found pyhsmm last week. First, a minor issue:
It looks like future is not installed by default in Anaconda's distribution of Python, so pip install pyhsmm fails unless I first install future manually.
Second, I was looking at the paper "The Infinite Hidden Markov Model" by Beal et al. The first example they give is the string 30*'ABCDEFEDCB', and they remark that the most parsimonious HMM modeling it has 10 states. I'm assuming that WeakLimitHDPHMM will essentially run their algorithm (of course I have to give a maximum number of states). I defined the following function:
import numpy as np
import pyhsmm
from pyhsmm.util.text import progprint_xrange

def trainHDPHMM(data, Nmax, iterations=100, alpha_0=1.0, alpha_size=None):
    # Infer the alphabet size from the data unless given explicitly.
    if alpha_size is None:
        alpha_size = 1 + max(data)
    print("Alphabet size=%d" % alpha_size)
    obs_hyperparams = {
        'alpha_0': alpha_0,
        'K': alpha_size,
    }
    obs_distns = [pyhsmm.distributions.Categorical(**obs_hyperparams)
                  for state in range(Nmax)]
    posteriormodel = pyhsmm.models.WeakLimitHDPHMM(
        alpha_a_0=1., alpha_b_0=0.25,
        gamma_a_0=1., gamma_b_0=0.25,
        init_state_concentration=1.,
        obs_distns=obs_distns)
    posteriormodel.add_data(data)
    for idx in progprint_xrange(iterations):
        posteriormodel.resample_model()
    return posteriormodel
When I call this using
model = trainHDPHMM(np.array(map(lambda _: ord(_) - ord('A'), 30*'ABCDEFEDCB')), 40)
it sometimes fails (depending on the random seed) with a traceback like the one below, even though I've done np.seterr(divide='ignore'):
File "", line 1, in
File "test_hmm.py", line 24, in trainHDPHMM
posteriormodel.resample_model()
File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/models.py", line 441, in resample_model
self.resample_parameters()
File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/models.py", line 447, in resample_parameters
self.resample_trans_distn()
File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/models.py", line 457, in resample_trans_distn
self.trans_distn.resample([s.stateseq for s in self.states_list])
File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/internals/transitions.py", line 378, in resample
stateseqs=stateseqs,trans_counts=trans_counts)
File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/internals/transitions.py", line 321, in resample
stateseqs=stateseqs,trans_counts=trans_counts)
File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/internals/transitions.py", line 85, in resample
distn.resample(counts)
File "/u/victor/.local/lib/python2.7/site-packages/pybasicbayes/distributions/multinomial.py", line 107, in resample
self.weights = np.random.dirichlet(self.alphav_0 + counts)
File "mtrand.pyx", line 4744, in mtrand.RandomState.dirichlet (numpy/random/mtrand/mtrand.c:38703)
File "mtrand.pyx", line 4751, in mtrand.RandomState.dirichlet (numpy/random/mtrand/mtrand.c:38610)
ZeroDivisionError: float division
I see that this issue was reported in https://github.com/numpy/numpy/issues/5851 over two years ago, but it still doesn't seem to have been fixed (I'm running numpy version 1.13.1). Is there a workaround (or fix) for this?
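For what it's worth, np.seterr only controls numpy's floating-point error state, and this ZeroDivisionError is an ordinary Python exception raised inside mtrand, so seterr won't suppress it. Until numpy fixes it, one workaround is to sample the Dirichlet manually via Gamma draws and guard the all-zero underflow case that trips np.random.dirichlet when the concentration parameters get very small. A minimal, stdlib-only sketch (safe_dirichlet is a hypothetical helper name, not pyhsmm or numpy API):

```python
import random

def safe_dirichlet(alphas, rng=random):
    """Sample from Dirichlet(alphas) via independent Gamma draws,
    guarding against the all-zero underflow that makes
    np.random.dirichlet raise ZeroDivisionError for very small
    concentration parameters (numpy issue #5851)."""
    draws = [rng.gammavariate(a, 1.0) if a > 0.0 else 0.0 for a in alphas]
    total = sum(draws)
    if total > 0.0:
        return [d / total for d in draws]
    # All Gamma draws underflowed to zero. In the small-alpha limit the
    # Dirichlet concentrates on a single vertex chosen with probability
    # proportional to alpha_i, so fall back to that.
    out = [0.0] * len(alphas)
    r = rng.random() * sum(alphas)
    acc = 0.0
    for i, a in enumerate(alphas):
        acc += a
        if r <= acc:
            out[i] = 1.0
            return out
    out[-1] = 1.0  # numerical safety net for float round-off
    return out
```

One could catch the ZeroDivisionError in pybasicbayes's Categorical.resample and substitute a draw like this, rather than patching numpy itself.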
But now to what I'm really interested in:
In the paper "Particle Gibbs for Infinite Hidden Markov Models" by Tripuraneni et al. (NIPS 2015), they describe a method that uses particle Gibbs to resample the state trajectory, which they say greatly speeds up apparent convergence. On one model I'm running, I find that I'm still improving, very gradually, even after 20,000 sampling iterations, so I'm interested in trying it. What changes would I need to make (and more importantly, where) to pyhsmm and/or pybasicbayes to add particle Gibbs as an option?
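For context on what such a change would involve: the core of that paper's sampler is a conditional SMC sweep over the state sequence that keeps the previous trajectory as a retained reference particle, so the natural hook point is probably the states objects' resampling step in pyhsmm/internals (next to the existing message-passing resampler), not pybasicbayes. A toy, stdlib-only sketch of conditional SMC for a finite HMM with a bootstrap proposal and multinomial resampling follows; all argument names are hypothetical illustrations, not pyhsmm API:

```python
import random

def conditional_smc(obs, pi0, trans, emit, ref_path, n_particles=10, rng=random):
    """One conditional-SMC sweep (the inner step of particle Gibbs) for a
    finite HMM. trans[i][j] and emit[i][y] are row-stochastic lists;
    ref_path is the trajectory retained from the previous sweep, pinned
    to particle 0 so the kernel leaves the posterior invariant.
    Returns a freshly sampled state trajectory."""
    T = len(obs)

    def categorical(weights):
        # Draw an index with probability proportional to weights.
        r = rng.random() * sum(weights)
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                return i
        return len(weights) - 1

    # t = 0: particle 0 is pinned to the reference path.
    particles = [[ref_path[0]]] + \
                [[categorical(pi0)] for _ in range(n_particles - 1)]
    weights = [emit[p[0]][obs[0]] for p in particles]

    for t in range(1, T):
        # Resample ancestors for particles 1..N-1; particle 0 always
        # keeps the reference lineage.
        ancestors = [0] + [categorical(weights) for _ in range(n_particles - 1)]
        new_particles = []
        for n in range(n_particles):
            path = particles[ancestors[n]][:]
            state = ref_path[t] if n == 0 else categorical(trans[path[-1]])
            path.append(state)
            new_particles.append(path)
        particles = new_particles
        # Bootstrap proposal: incremental weight is the emission likelihood.
        weights = [emit[p[-1]][obs[t]] for p in particles]

    # Draw the output trajectory in proportion to the final weights.
    return particles[categorical(weights)]
```

In pyhsmm this would replace (or sit alongside) the forward-backward trajectory resampling for a single data sequence, reusing the current transition matrix and observation likelihoods; the weak-limit truncation keeps the state space finite, which is what makes this direct translation possible.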