Closed Wusir2018 closed 4 years ago
@poypoyan
Thank you for your reply!
It means that Duration (D) in HSMM is a probability distribution, so the smaller segment can transit to the next state.
Poisson distribution is used to denote the duration distribution.
I have a question about how to choose the parameter of the Poisson distribution ($\lambda$)?
Best wishes!
> It means that Duration (D) in HSMM is a probability distribution, so the smaller segment can transit to the next state.
Yeah, there is a duration distribution for each state in (explicit-duration) HSMM. Please take note that there are many other variants of HSMM, and those probably have more (or different) parameters than explicit-duration HSMM.
> Poisson distribution is used to denote the duration distribution. I have a question about how to choose the parameter of the Poisson distribution ($\lambda$)?
As of now, my code doesn't support the Poisson distribution as a duration distribution (yet).

For now, the duration distribution for each state `j` is stored as a row of the 2D array in my reply above. I call it "non-parametric" because there is no assumed "formula" for the distribution: just probability values for `d in range(n_durations)`. As for choosing the $\lambda$, I think it should be learned during the `fit` function, but I am still studying how to do it (along with other distributions like Negative Binomial, Geometric, etc.).
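
To illustrate what "learning $\lambda$" could look like, here is a minimal sketch in plain Python. It is not the library's code: the function names `poisson_duration_row` and `estimate_lambda` are hypothetical, and it assumes that column `i` of a duration row holds duration `d = i + 1` with `d - 1` following a Poisson distribution (so a zero-length stay is impossible). The estimate is the probability-weighted mean of `d - 1`, which is the usual maximum-likelihood estimate for a Poisson rate if the truncation at `n_durations` is ignored.

```python
import math


def poisson_duration_row(lam, n_durations):
    """Poisson pmf truncated to durations 1..n_durations and renormalized.

    Assumption: index i stores duration d = i + 1, and d - 1 ~ Poisson(lam).
    """
    # Work in log space so large durations do not overflow the factorial.
    log_pmf = [k * math.log(lam) - lam - math.lgamma(k + 1)
               for k in range(n_durations)]
    pmf = [math.exp(v) for v in log_pmf]
    total = sum(pmf)
    return [p / total for p in pmf]


def estimate_lambda(row):
    """Weighted mean of (d - 1) under a duration row.

    This matches the Poisson maximum-likelihood estimate when the
    truncation at n_durations is ignored.
    """
    return sum(i * p for i, p in enumerate(row))  # i == d - 1


# Build one row for a state with lam = 4, then recover lam from that row.
row = poisson_duration_row(lam=4.0, n_durations=30)
lam_hat = estimate_lambda(row)
print(round(lam_hat, 6))
```

In an EM setting, `row` would be the re-estimated duration probabilities for one state rather than a synthetic pmf, and the result would only be reliable if `n_durations` is large enough that the truncated tail is negligible.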
Feel free to ask if anything needs clarification. Thanks!
@poypoyan Thank you very much! :)
Oh sorry for late reply huhuhu
In my code, "maximum segment length" is the `n_durations` variable.

The duration probabilities are stored in a 2D array with `n_states` rows and `n_durations` columns. Each entry is `p_j(d)`, the probability that state `j` lasts for duration `d`. During the parameter estimation (the `fit` function), for a state that corresponds to some segment smaller than `n_durations`, we should expect to see `p_j(d*)` peaking at some `d*` < `n_durations` (and practically zero at durations near `n_durations`). That's why we could learn states of smaller segments.

Hence, in the code, treat the "maximum segment length"/`n_durations` as a "memory limit" variable. Ideally, we should detect segments of any length, but for the EM algorithm at least, that means looping up to infinity. So there's a trade-off: you may set `n_durations` as high as you want (maybe equal to the length of the training data), but of course it will be slower and take a lot of memory.

My code is still under development. I am still studying some functionalities to add (as written in the readme). Thanks!
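
The layout described above can be pictured with a small, made-up example. This is only a sketch in plain Python, not the library's internal representation: the values are synthetic, `peaked_row` is a hypothetical helper, and it assumes that column `i` stands for duration `d = i + 1`.

```python
import math

n_states, n_durations = 2, 10


def peaked_row(peak, n_durations, width=1.0):
    """Synthetic duration row: a bell-shaped bump around `peak`, normalized to sum to 1."""
    weights = [math.exp(-((d - peak) ** 2) / (2 * width ** 2))
               for d in range(1, n_durations + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# One row per state, one column per duration (n_states x n_durations).
dur_table = [peaked_row(3, n_durations), peaked_row(7, n_durations)]

d_stars = []
for j, row in enumerate(dur_table):
    # d* is the duration with the highest probability for state j.
    d_star = max(range(n_durations), key=row.__getitem__) + 1
    d_stars.append(d_star)
    print(f"state {j}: d* = {d_star}, p_j(n_durations) = {row[-1]:.2e}")
```

Both states peak well below `n_durations`, and the last column is close to zero, which is the pattern to expect after fitting when the true segments are shorter than `n_durations`. The table has `n_states * n_durations` entries, so raising `n_durations` grows memory and per-iteration work accordingly.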