odd predictions possibly caused by proliferation of hidden states?

yarden commented 8 years ago

I've been getting somewhat odd results from pyhsmm's predict. Not sure if it's a feature of the HDP-HMM (or my hyperparameters) or if it's a software issue. I ran pyhsmm on discrete (binary) outputs and fed it strongly periodic data, i.e.: 0,1,0,1,0,1.... and asked it to predict next 10 outputs.

In a 2-hidden state (parametric) Bayesian HMM, there would be two hypotheses: (1) the data were produced by a single hidden state that emits 1 with probability 0.5, or (2) the data were produced by a periodic transitioning between two hidden states, one which emits 1 with probability ~1.0, and another that emits 0 with probability ~1.0. If you had a sparse prior on the emission matrix, but less sparse prior on the transition matrix, you'd favor hypothesis 2 and your posterior would predict periodic observations.

pyhsmm doesn't predict the periodic trend even with a larger set of observations, as far as I can tell. It also uses the max number of states (~25) which seems too large for this data. Is because of my hyperparameter settings? Is it possible to get better inferences by forcing sparsity on the observation matrix and using more concentrated priors for the global HDP?

I attach plots of predicted posteriors from two independent runs (same settings). My code is below. Thanks!

import pyhsmm
import pyhsmm.basic.distributions as distributions
import pyhsmm.util.text
from pyhsmm.util.text import progprint_xrange 

Nmax = 25

# observations
obs_hypparams = {"K": 2,
                 "alpha_0": 1.}
# gamma prior on duration 
dur_hypparams = {'alpha_0': 1.,
                 'beta_0': 1.}
obs_distns = [distributions.Categorical(**obs_hypparams) for state in range(Nmax)]
dur_distns = [distributions.PoissonDuration(**dur_hypparams) for state in range(Nmax)]

def run_model(data):
    posteriormodel = pyhsmm.models.WeakLimitHDPHSMM(
            alpha=6.,gamma=6.,
            init_state_concentration=6.,
            obs_distns=obs_distns,
            dur_distns=dur_distns)
    posteriormodel.add_data(small_data)
    models = []
    for idx in progprint_xrange(200):
        posteriormodel.resample_model()
        if (idx+1) % 10 == 0:
            models.append(copy.deepcopy(posteriormodel))
    return models

# strongly periodic data, except for first four observations
small_data = np.array([0, 0]*2 + [0, 1]*10)
big_data = np.array([0, 0]*2 + [0, 1]*50)

# inference
small_models = run_model(small_data)
big_models = run_model(big_data)

def summarize_pred(models, num_preds=10):
    all_preds = []
    max_state = 1
    for m in models:
        prediction, hidden_states = m.predict(np.array([]), num_preds)
        max_state = max(max_state, hidden_states.max())
        all_preds.append(prediction)
    all_preds = np.array(all_preds)
    mean_preds = all_preds.mean(axis=0)
    return mean_preds, max_state

print "prediction posterior for small data: "
small_summary, small_max_state = summarize_pred(small_models)
print small_summary
print " - max # states: %d" %(small_max_state)
print "prediction posterior for big data: "
big_summary, big_max_state = summarize_pred(big_models)
print big_summary
print " - max # states: %d" %(big_max_state)

pyhsmm_periodic pyhsmm_periodic2

mattjj commented 8 years ago

Some quick thoughts:

There's a bug in your run_data function, since it calls add_data(small_data) rather than add_data(data). The trouble with globals!

Using Poisson durations is probably not a good idea here. While it could learn a Poisson duration distribution that looks almost deterministically 1 (given enough data), the prior settings really discourage that.

In a similar vein, if you want to use the HDP for this kind of model selection (choosing the number of states), the concentration parameters really matter. In particular, think of alpha and gamma as something like the expected number of states, or more precisely think of gamma as controlling the uniformity of the average transition matrix row and alpha as controlling the sparsity of each row. More generally, keep in mind that even with the right concentration hyperparameters effectively regularizing things (or, even better, resampling those concentration hyperparameterts), the HDP will only "select the right number of states" in the sense that some states are overwhelmingly more popular; there will still be a few states used only a time or two, and in fact for that reason the DP is not consistent for model selection in the traditional sense. In any case, pyhsmm doesn't label the state numbers consecutively and so the max state index doesn't mean anything; if you want to count the number of used states, you want to do something like len(np.unique(stateseq)) or look at np.bincount(stateseq).

Here's some updated code which makes the HDP-HMM results look more sensible on your toy problem. I'm sure we could tweak the parameters so that the output is perfect, but I just set things to be a bit more sensible.

I'll leave this issue open in case you have more questions on this, or in case I missed some things in this response.

import numpy as np
import copy

import pyhsmm
import pyhsmm.basic.distributions as distributions
import pyhsmm.util.text
from pyhsmm.util.text import progprint_xrange 

Nmax = 25

# observations
obs_hypparams = {"K": 2,
                 "alpha_0": 0.1}
obs_distns = [distributions.Categorical(**obs_hypparams) for state in range(Nmax)]

def run_model(data):
    posteriormodel = pyhsmm.models.WeakLimitHDPHMM(
            alpha=0.1, gamma=2., init_state_concentration=1.,
            obs_distns=obs_distns)
    posteriormodel.add_data(data)
    for _ in progprint_xrange(500):
        posteriormodel.resample_model()
    return posteriormodel

# strongly periodic data, except for first four observations
small_data = np.array([0, 0]*2 + [0, 1]*10)
big_data = np.array([0, 0]*2 + [0, 1]*50)

# inference
small_model = run_model(small_data)
big_model = run_model(big_data)

def summarize_pred(model, num_preds=10):
    all_preds = []
    for _ in xrange(10):
        prediction, hidden_states = model.predict(np.array([]), num_preds)
        num_states = len(np.unique(hidden_states))
        print "prediction: ", prediction
        print "prediction: ", len(prediction)
        all_preds.append(prediction)
    all_preds = np.array(all_preds)
    mean_preds = all_preds.mean(axis=0)
    print "mean preds: ", mean_preds, len(mean_preds)
    return mean_preds, num_states

print "prediction posterior for small data: "
small_summary, small_max_state = summarize_pred(small_model)
print small_summary
print " - # states: %d" %(small_max_state)
print "prediction posterior for big data: "
big_summary, big_max_state = summarize_pred(big_model)
print big_summary
print " - # states: %d" %(big_max_state)

mattjj commented 8 years ago

Btw, similar settings work with the HSMM classes, but one of the sampler steps doesn't work so well with very small values of alpha (like less than 1). Sometimes updates can take a long time. I need a different sampler for that case. Anyway, it seems to work like this:

# observations
obs_distns = [distributions.Categorical(K=2, alpha_0=0.1) for state in range(Nmax)]

dur_distns = [distributions.PoissonDuration(alpha_0=1., beta_0=1.) for state in range(Nmax)]

def run_model(data):
    posteriormodel = pyhsmm.models.WeakLimitHDPHSMM(
            alpha=1., gamma=5., init_state_concentration=1.,
            obs_distns=obs_distns, dur_distns=dur_distns)
    posteriormodel.add_data(data, trunc=10)
    for _ in progprint_xrange(500):
        posteriormodel.resample_model()
    return posteriormodel

yarden commented 8 years ago

Thanks a lot! The parameter explanation makes sense. With your revised model (non-Poisson durations), I still get variable results. On some runs, I get reasonable results on small data but not on big:

small:
[ 1.   0.8  0.9  0.1  0.9  0.1  0.9  0.1  0.9  0.1]
 - # states: 4
big:
[ 0.   0.   0.6  0.4  0.6  0.4  0.6  0.4  0.6  0.4]
 - # states: 3

on many other runs I get very weird results for both, e.g.:

small: 
[ 0.4  0.3  0.   0.   0.   0.   0.   0.   0.   0. ]
 - # states: 3
big:
[ 0.   0.   0.1  0.1  0.4  0.3  0.6  0.3  0.6  0.3]
 - # states: 5

here's another run:

small:
[ 0.   0.8  0.8  0.7  0.7  0.7  0.5  0.7  0.4  0.4]
 - # states: 2
big:
[ 0.2  0.   0.5  0.1  0.5  0.1  0.5  0.1  0.5  0.1]
 - # states: 2

have you run into this? Increasing number of samples (e.g. from 500 to 5000) doesn't stabilize it. Thanks!

mattjj commented 8 years ago

I think those first results on big data are probably good. The phase is random (esp in the HMM) because of the prefix, so getting samples that alternate but average out to something more uniform is not surprising. Maybe look at the samples instead of the mean.

As for the zeros, it could just be bad model parameter samples. Are you averaging over many models? To get the best point estimates you should probably use mean field over Gibbs.

yarden commented 8 years ago

Thanks, I'll keep tweaking. Final question on this: can you explain what you mean by "mean field over Gibbs" to get point estimates? I was simply taking posterior mean over many samples.

mattjj commented 8 years ago

I meant that if you want a single point estimate of your model parameters, a good way to get one is to run mean field as your inference algorithm and use the mean of the variational approximation. Taking many samples (both of the model parameters and also of the states and predicted observations) is More Bayesian™ but ensuring you have enough good samples is harder than checking your mean field algorithm converged to something decent.

tiddlerhxy commented 5 years ago

Hello! I changed the data to random, I don't know what the predicted results mean, why 0 or 1? The code is as follows:

`import numpy as np import pyhsmm import pyhsmm.basic.distributions as distributions import pyhsmm.util.text import random

Nmax = 25

obs_hypparams = {"K": 2, "alpha_0": 1} dur_hypparams = {'alpha_0': 1., 'beta_0': 1.} obs_distns = [distributions.Categorical(obs_hypparams) for state in range(Nmax)] dur_distns = [distributions.PoissonDuration(dur_hypparams) for state in range(Nmax)]

def run_model(data): posteriormodel = pyhsmm.models.WeakLimitHDPHSMMPossibleChangepoints( alpha=6, gamma=6., init_state_concentration=6., obs_distns=obs_distns, dur_distns=dur_distns ) posteriormodel.add_data(data) return posteriormodel

small_data = np.array([random.randint(0,10) for i in range(24)]) print("small_data:",small_data)

small_model = run_model(small_data)

prediction, hidden_states = small_model.predict(np.array([]), 10) print("prediction:",prediction)`

result: small_data: [5 2 3 5 5 2 1 3 5 6 3 0 7 0 3 3 2 8 6 4 7 9 0 2] prediction: [0. 0. 0. 0. 0. 1. 0. 0. 0. 1.]

mattjj / pyhsmm

odd predictions possibly caused by proliferation of hidden states? #60