mattjj / pyhsmm

MIT License
547 stars 173 forks

problem running examples #3

Closed inti closed 11 years ago

inti commented 11 years ago

Hi, I have the following question. Could you give me some advice on how to use your code to identify change points in a one-dimensional sequence of Poisson counts? The counts are large (e.g., > 100) and show overdispersion, and I would like to identify change points without specifying the number of different states the segments can adopt. If possible, I would like to try both the HDP-HMM and the HDP-HSMM and compare the results.

I have looked at your change point example and found the line "TODO estimate the changepoints instead", so I decided to write to get some advice on how to use your code. Attached is an example of what the actual data looks like. Deviations from the central line of points are potential segments in which we want to identify the change points.

I look forward to your comments.

Inti Pedroso

Screenshot-1

mattjj commented 11 years ago

Ah, you probably aren't interested in the possiblechangepoints.py demo file. The models like HSMMPossibleChangepoints are for when you have a changepoint detector on the side (outside of pyhsmm) that you want to use to speed up the HSMM inference.

If I understand your task correctly, you can probably follow the example in basic.py pretty closely. As you run the resampling loop (at the bottom of that file) you'll want to save the samples every few iterations; you can either copy the model object using copy.deepcopy or you can just pull out the part you care about, which is probably the state sequences. To get the state sequence arrays, you can do something like

saved_stateseqs.append(posteriormodel.states_list[0].stateseq.copy())

Those state sequences will be arrays of integers, where the integer at index i labels the state at time i. So the sampled changepoints correspond to wherever there are changes in that index array.
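As a rough sketch of that last step, the snippet below pulls changepoint indices out of saved state sequences using only NumPy. The helper names `changepoints` and `changepoint_frequency` and the toy `saved_stateseqs` list are made up for illustration and are not part of pyhsmm; in practice the list would be filled inside the resampling loop as described above. Because state labels can differ from one sample to the next, the sketch compares only where the labels change, not which labels appear.

```python
import numpy as np

def changepoints(stateseq):
    """Return the indices i where stateseq[i] differs from stateseq[i - 1]."""
    stateseq = np.asarray(stateseq)
    return np.flatnonzero(np.diff(stateseq) != 0) + 1

def changepoint_frequency(saved_stateseqs):
    """Fraction of saved samples that place a changepoint at each time index."""
    T = len(saved_stateseqs[0])
    counts = np.zeros(T)
    for seq in saved_stateseqs:
        counts[changepoints(seq)] += 1
    return counts / len(saved_stateseqs)

# Toy stand-ins for sampled state sequences (hypothetical data, not real output).
saved_stateseqs = [
    np.array([0, 0, 0, 2, 2, 2, 1, 1]),
    np.array([3, 3, 3, 3, 5, 5, 4, 4]),
    np.array([0, 0, 0, 1, 1, 1, 1, 1]),
]

first_cps = changepoints(saved_stateseqs[0])   # changepoints of one sample
freq = changepoint_frequency(saved_stateseqs)  # per-index changepoint frequency
```

Time indices where `freq` is close to 1 are changepoints that most saved samples agree on; thresholding it is one simple way to summarize the samples, though not the only one.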

How long are your sequences? The HSMM inference in pyhsmm won't work well for very long sequences unless you can specify a maximum duration truncation level ("trunc"). HMM inference will scale much better in that regard (it'll go like O(T) instead of O(T^2) where T is the length of the sequence), though you should probably try to use the HMMStatesEigen object instead of the HMMStatesPython to speed things up. You'll also probably need a Sticky HDP-HMM. All the code is there but I'm not sure if it's uniformly easy to access without delving into the code a bit.
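To get a feel for why the truncation level matters, here is a back-of-the-envelope count of how many (end time, duration) pairs an HSMM-style pass over the sequence has to consider. This is only a counting illustration under simplifying assumptions: it is not pyhsmm's implementation, the function name `duration_terms` is made up, and it ignores the factor from the number of states.

```python
def duration_terms(T, trunc=None):
    """Count (end time, duration) pairs for a sequence of length T.

    With no truncation, a segment ending at time t may have any duration
    from 1 to t, giving roughly T^2 / 2 pairs. With a truncation level,
    durations are capped at `trunc`, giving roughly T * trunc pairs.
    """
    max_dur = T if trunc is None else min(trunc, T)
    return sum(min(t, max_dur) for t in range(1, T + 1))

full = duration_terms(1000)                 # untruncated: grows like T^2
truncated = duration_terms(1000, trunc=50)  # truncated: grows like T * trunc
```

For T = 1000 the untruncated count is 500500, while a truncation level of 50 brings it down to 48775, about a tenfold reduction that widens as T grows.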

Let me know if you'd like more advice on that, or if I misunderstood your question!

mattjj commented 11 years ago

Oh and I'd be happy to take a look at an example data file if you want some guidance on getting stuff running.