mattjj / pyhsmm

MIT License

Using particle Gibbs #81

Open algebravic opened 7 years ago

algebravic commented 7 years ago

I just found pyhsmm last week. First, a minor issue:

It looks like future is not installed by default in Anaconda's distribution of Python, so pip install pyhsmm fails unless I first install future manually.

Second, I was looking at the paper "The Infinite Hidden Markov Model" by Beal et al. The first example they give is the string 30*'ABCDEFEDCB', and they remark that the most parsimonious HMM modeling it has 10 states. I'm assuming that the WeakLimitHDPHMM will essentially run their algorithm (of course I have to give a maximum number of states). I defined the following function:

import numpy as np
import pyhsmm
from pyhsmm.util.text import progprint_xrange

def trainHDPHMM(data, Nmax, iterations=100, alpha_0=1.0, alpha_size=None):
    if alpha_size is None:
        alpha_size = 1 + max(data)
    print("Alphabet size=%d" % alpha_size)
    # symmetric Dirichlet prior on each state's emission distribution
    obs_hyperparams = {
        'alpha_0': alpha_0,
        'K': alpha_size
        }
    obs_distns = [pyhsmm.distributions.Categorical(**obs_hyperparams) for state in range(Nmax)]
    # weak-limit approximation with Nmax states; the concentration
    # parameters get gamma priors and are resampled along with the model
    posteriormodel = pyhsmm.models.WeakLimitHDPHMM(
        alpha_a_0=1., alpha_b_0=0.25,
        gamma_a_0=1., gamma_b_0=0.25,
        init_state_concentration=1.,
        obs_distns=obs_distns)
    posteriormodel.add_data(data)
    for idx in progprint_xrange(iterations):
        posteriormodel.resample_model()
    return posteriormodel

When I call this using

model = trainHDPHMM(np.array([ord(c) - ord('A') for c in 30*'ABCDEFEDCB']), 40)

Depending on the random seed, it sometimes dies with a traceback like the following (even though I've done np.seterr(divide='ignore')):

File "", line 1, in File "test_hmm.py", line 24, in trainHDPHMM posteriormodel.resample_model() File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/models.py", line 441, in resample_model self.resample_parameters() File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/models.py", line 447, in resample_parameters self.resample_trans_distn() File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/models.py", line 457, in resample_trans_distn self.trans_distn.resample([s.stateseq for s in self.states_list]) File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/internals/transitions.py", line 378, in resample stateseqs=stateseqs,trans_counts=trans_counts) File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/internals/transitions.py", line 321, in resample stateseqs=stateseqs,trans_counts=trans_counts) File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/internals/transitions.py", line 85, in resample distn.resample(counts) File "/u/victor/.local/lib/python2.7/site-packages/pybasicbayes/distributions/multinomial.py", line 107, in resample self.weights = np.random.dirichlet(self.alphav_0 + counts) File "mtrand.pyx", line 4744, in mtrand.RandomState.dirichlet (numpy/random/mtrand/mtrand.c:38703) File "mtrand.pyx", line 4751, in mtrand.RandomState.dirichlet (numpy/random/mtrand/mtrand.c:38610) ZeroDivisionError: float division

I see that this issue was raised in https://github.com/numpy/numpy/issues/5851 over two years ago, but it still doesn't seem to have been fixed (I'm running numpy 1.13.1). Is there a workaround (or fix) for this?
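In case it's useful, the stopgap I've been experimenting with is to draw the Dirichlet sample by hand from gamma variates and catch the case where every draw underflows to zero, which (as far as I can tell) is what triggers the ZeroDivisionError. This is only a sketch: safe_dirichlet and eps are my own names and choices, not anything from pyhsmm, pybasicbayes, or numpy, and it would have to be monkey-patched over the np.random.dirichlet call in pybasicbayes/distributions/multinomial.py:

import numpy as np

def safe_dirichlet(alpha, eps=1e-300):
    # Dirichlet sampling via normalized gamma draws; the same construction
    # np.random.dirichlet uses, except it guards against every gamma draw
    # underflowing to 0.0 for tiny concentrations (numpy issue #5851).
    alpha = np.asarray(alpha, dtype=np.float64)
    gammas = np.random.gamma(alpha, 1.0)
    total = gammas.sum()
    if total <= 0.0:
        # all draws underflowed: fall back to normalizing alpha + eps,
        # which is a crude approximation but keeps the sampler running
        gammas = alpha + eps
        total = gammas.sum()
    return gammas / total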

But now to what I'm really interested in:

In the paper "Particle Gibbs for Infinite Hidden Markov Models" by Tripuraneni et. al. (in NIPS 2015) they describe a method using particle Gibbs to resample the state trajectory which, they say, greatly sped up apparent convergence. On one model that I'm running I find that I'm still improving, very gradually, even after 20,000 sampling iterations, so I'm interested in trying it. So, what changes would I need to make (more importantly, where) to pyshmm and/or pybasicbayes to add using particle gibbs as an option?