mattjj / pyhsmm

MIT License

Use of prior information for segmentation #10

Closed: inti closed this issue 11 years ago

inti commented 11 years ago

Hi Matt, I hope everything is going well with you. I was wondering what would be an effective way of including prior information for change-point detection/segmentation using the Sticky-HMM. In particular, I have a set of sequences for which there is a previously generated, high-quality segmentation. I would like to use that information to speed up and help with segmenting new data. What I thought might be possible was:

Does this sound like a good approach? Or perhaps it would be better to remove the pre-segmented data before loading and analyzing the new data. I imagine that if I keep the pre-segmented data in the model, the results will be pulled towards that segmentation (to some extent), but if I remove it then I would only have a smart start for the distribution and transition parameters.

What do you think?

BW, Inti

mattjj commented 11 years ago

That sounds like a very good idea. I think there are some things you can do with the code right now, and I'd love to hear if you need some more capability. (Also, let me know if anything doesn't work as it should!)

One thing you can do is pass a stateseq argument to add_data, which will initialize the state sequence with that argument. Something like this, where my_stateseq_array is a 1-dimensional, length-T array of integers in the range {0,1,...,number_of_states-1}:

hsmm_model.add_data(data,stateseq=my_stateseq_array)
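If the existing segmentation is stored as labelled intervals rather than per-timestep labels, it has to be expanded into the length-T integer sequence described above before it can be passed as stateseq. A minimal sketch of that conversion, assuming segments come as (start, end, label) tuples with an exclusive end index; the helper name segments_to_stateseq and the tuple layout are hypothetical and not part of pyhsmm:

```python
def segments_to_stateseq(segments, T, num_states):
    """Expand (start, end, label) segments (end exclusive) into T labels.

    Hypothetical helper for illustration; not part of pyhsmm.
    """
    stateseq = [None] * T
    for start, end, label in segments:
        # Each segment must lie inside the sequence and be non-empty.
        if not (0 <= start < end <= T):
            raise ValueError("segment (%r, %r) is outside [0, %d)" % (start, end, T))
        # Labels must be integers in {0, 1, ..., num_states - 1}.
        if not (isinstance(label, int) and 0 <= label < num_states):
            raise ValueError("label %r is not in range(%d)" % (label, num_states))
        for t in range(start, end):
            if stateseq[t] is not None:
                raise ValueError("segments overlap at index %d" % t)
            stateseq[t] = label
    # Every timestep needs a label, since the sequence must have length T.
    if any(s is None for s in stateseq):
        raise ValueError("segments do not cover the whole sequence")
    return stateseq


# Example: three segments covering a sequence of length 8.
example_stateseq = segments_to_stateseq([(0, 3, 0), (3, 5, 2), (5, 8, 1)], T=8, num_states=3)
```

The result is a plain Python list; converting it with something like numpy.asarray(example_stateseq, dtype=int) before passing it as stateseq would give the 1-dimensional integer array mentioned above.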

To add multiple data sequences to a model, you can call add_data once per sequence, passing the stateseq initialization argument only for your special pre-segmented data.

As you said, you also may want to avoid resampling the state sequence for that special data, so here's some surgery you can perform. Assuming you added the special sequence to the model first, you can do

hsmm_model.states_list[0].resample = lambda *args, **kwargs: None

That will replace the special sequence's states object's resample method, which gets called in hsmm_model.resample_model, with a no-op! Of course, if you want some other control mechanism, feel free to add that to the code as well.
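To see the freezing trick in isolation, here is a small sketch that uses toy stand-in classes rather than pyhsmm itself. ToyStates and ToyModel are hypothetical and only mimic the add_data / states_list / resample_model shape described above; the "sampler" just draws random labels. One Python detail worth noting: a function assigned to an instance attribute is not bound to that instance, so the replacement is called with no self argument and should accept whatever arguments the caller passes.

```python
import random


class ToyStates:
    """Stand-in for a per-sequence states object (not pyhsmm's class)."""

    def __init__(self, T, num_states, stateseq=None):
        self.T = T
        self.num_states = num_states
        if stateseq is not None:
            # Initialize from the supplied segmentation.
            self.stateseq = list(stateseq)
        else:
            # Otherwise start from random labels.
            self.stateseq = [random.randrange(num_states) for _ in range(T)]

    def resample(self):
        # Placeholder "sampler": overwrite the labels with fresh random draws.
        self.stateseq = [random.randrange(self.num_states) for _ in range(self.T)]


class ToyModel:
    """Stand-in model holding one states object per data sequence."""

    def __init__(self, num_states):
        self.num_states = num_states
        self.states_list = []

    def add_data(self, data, stateseq=None):
        self.states_list.append(ToyStates(len(data), self.num_states, stateseq))

    def resample_model(self):
        for s in self.states_list:
            s.resample()


prior_segmentation = [0, 0, 1, 1, 2, 2]

model = ToyModel(num_states=3)
model.add_data([0.1] * 6, stateseq=prior_segmentation)  # special sequence, added first
model.add_data([0.2] * 50)                              # new, unsegmented data

# Shadow the method on this one instance with a no-op; the lambda takes no
# 'self' because instance attributes are not bound like class methods are.
model.states_list[0].resample = lambda *args, **kwargs: None

for _ in range(10):
    model.resample_model()

frozen_stateseq = model.states_list[0].stateseq
```

Only the first states object is affected; the other sequences keep the class's original resample method, so they continue to be resampled while sharing the model with the frozen sequence.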

I'm going to close the issue because I think that covers everything you brought up, but if there are any problems or if I missed things, feel free to re-open!