poypoyan / edhsmm

An(other) implementation of Explicit Duration Hidden Semi-Markov Models in Python 3
MIT License

Duration Distributions #7

Closed poypoyan closed 2 years ago

poypoyan commented 3 years ago

Currently, the duration probabilities per state are stored in a 2D array of shape (n_states, n_durations). In some situations these "non-parametric" per-state duration PMFs are not enough, and the durations need to be modeled with parametric distributions instead.
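For concreteness, here is a minimal sketch of what such an array could look like. The name `dur_pmf` and the convention that column 0 corresponds to a duration of 1 are assumptions made for illustration, not necessarily what edhsmm uses internally.

```python
import numpy as np

n_states, n_durations = 3, 5  # illustrative sizes only

# Hypothetical layout: row i is the duration PMF of state i,
# and column d - 1 holds P(duration = d) for d = 1..n_durations.
rng = np.random.default_rng(0)
dur_pmf = rng.random((n_states, n_durations))
dur_pmf /= dur_pmf.sum(axis=1, keepdims=True)  # each row sums to 1

# Mean duration of each state under its non-parametric PMF.
durations = np.arange(1, n_durations + 1)
mean_dur = dur_pmf @ durations
```

Summary statistics such as `mean_dur` are what a parametric fit would typically be computed from.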

The 'hsmm' R package offers 4 parametric distributions for duration:

I need help with the math behind estimating the parameters from a non-parametric duration PMF, especially for these 4 distributions. Please suggest some resources. I prefer clear algorithms, but anything relevant is welcome.

Thanks!

poypoyan commented 2 years ago

There seem to be 2 ways:

1) Parametric estimators (e.g. Maximum Likelihood Estimation (MLE))
Sample: https://www.statsmodels.org/devel/dev/generated/statsmodels.base.model.GenericLikelihoodModel.html
Pros: If the correct prob. distribution is chosen, the estimate approaches the minimum achievable variance as the sample size grows (see Cramér–Rao lower bound).
Cons: You have to guess the "correct" prob. distribution. If an incorrect prob. distribution is chosen, the estimate can be very inaccurate.
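As a rough sketch of the parametric route, a non-parametric duration PMF can be treated as weights in a weighted log-likelihood. For some distributions the maximizer then has a closed form. The two distributions below (geometric on d = 1, 2, ... and a Poisson shifted so that d - 1 ~ Poisson) are chosen only for illustration, the function names are hypothetical, and truncation at the maximum duration is ignored.

```python
import numpy as np

def fit_geometric(pmf):
    """Weighted MLE of p for P(d) = (1 - p)**(d - 1) * p, d = 1, 2, ..."""
    w = np.asarray(pmf, dtype=float)
    d = np.arange(1, len(w) + 1)
    mean = np.sum(w * d) / np.sum(w)
    return 1.0 / mean  # closed-form solution: p = 1 / (weighted mean)

def fit_shifted_poisson(pmf):
    """Weighted MLE of lam, assuming (d - 1) ~ Poisson(lam)."""
    w = np.asarray(pmf, dtype=float)
    d = np.arange(1, len(w) + 1)
    return np.sum(w * (d - 1)) / np.sum(w)  # weighted mean of d - 1

# Example: one state's duration PMF over d = 1..4.
pmf = np.array([0.5, 0.25, 0.125, 0.125])
p_hat = fit_geometric(pmf)
lam_hat = fit_shifted_poisson(pmf)
```

Distributions without a closed-form estimate would need a numerical optimizer over the same weighted log-likelihood.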

2) Non-parametric estimators (e.g. Kernel Density Estimation (KDE))
Sample: https://github.com/tommyod/KDEpy
Pros: You don't have to guess the "correct" prob. distribution.
Cons: Less interpretable than MLE. May need a larger sample size than MLE.
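To illustrate the non-parametric route, here is a minimal hand-rolled Gaussian KDE in plain NumPy. It is a stand-in for illustration, not the KDEpy API. The function name `kde_duration_pmf`, the fixed bandwidth, and the renormalization onto the integer grid 1..n_durations are all assumptions of this sketch.

```python
import numpy as np

def kde_duration_pmf(samples, n_durations, bandwidth=1.0):
    """Smooth observed durations into a PMF over d = 1..n_durations."""
    samples = np.asarray(samples, dtype=float)
    grid = np.arange(1, n_durations + 1)
    # Gaussian kernel centred on each sample, evaluated at each grid point.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    density = np.exp(-0.5 * z**2).sum(axis=1)
    # Renormalize so the values on the integer grid sum to 1.
    return density / density.sum()

samples = [2, 2, 3, 3, 3, 4, 7]  # made-up observed durations
pmf = kde_duration_pmf(samples, n_durations=8)
```

The bandwidth controls the trade-off: a small value reproduces the raw histogram, while a large value flattens the PMF.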

I need to study this more. Closing this for now.