helske / seqHMM

Multivariate and Multichannel Discrete Hidden Markov Models for Categorical Sequences
96 stars 30 forks source link

standard errors for HMM parameters #61

Open tbeason opened 2 years ago

tbeason commented 2 years ago

I see that it is possible get standard errors for covariates in MHMM models.

I am wondering if it is possible to get standard errors for the transition probabilities and emission probabilities in simple HMM models? Or even some type of confidence interval?

helske commented 2 years ago

In theory, you could get some asymptotic standard error estimates from the Hessian used in the numerical optimization (local_step = TRUE). However, this is at least currently not supported and would need some work as the nloptr used for the numerical optimization does not return the Hessian. Of course, "manually" estimating the model with optim and logLik functions with hessian=TRUE is possible.

If you have a reasonable amount of sequences, you could compute nonparametric bootstrap estimates though.

tbeason commented 2 years ago

I do have a lot of sequences. In the lowest case, around 50,000. In the largest case, around 1 million. How would I go about doing the bootstrap? I am familiar with bootstrap methods in general, but this is my first trip into HMM methods so I am open to suggestions. My first thought would be to randomly select entire sequences (with replacement), so that the original sequence remains intact but the sample composition becomes random.

Sidenote: The parallelization works well! Estimation is not too slow even with a large number of sequences when using 64 cores. Thanks for that!

helske commented 2 years ago

Yes, your strategy of sampling randomly entire sequences sounds right. In order to avoid potential issues with multiple (local) optima (as well as in order to speed the bootstrap), I suggest you use the estimated parameters as initial values in the bootstrap loop, i.e. you have your estimated model based on the original data, say mod, and then you define boot_model <- build_hmm(boot_sequences, transition_probs = mod$transition_probs, emission_probs = mod$emission_probs, initial_probs = mod$initial_probs).