Closed charlesm93 closed 3 years ago
Tagging @betanalpha.
What is the best way of testing we get the right density? I can work it out analytically, but that would be for a simple case or we can benchmark against another software.
Checks to implement:
- log_omegas ~~are all positive~~
- Gamma entries are all positive
- Gamma is a square matrix
- Gamma columns sum to 1 [is this too expensive to check?]
- rho entries are positive
- n_states = log_omega.rows() matches rho.size() and Gamma.cols().

Why is omega on the log scale but not Gamma (stochastic matrix) or rho (initial distribution)?

Will there be a way to set the initial distribution to be the stationary distribution of Gamma?
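The checks above are simple enough to prototype. Below is a hedged Python sketch (the validate_hmm_inputs helper is hypothetical, not the actual Stan Math code), using the column-simplex convention of the original proposal:

```python
def validate_hmm_inputs(log_omegas, Gamma, rho, tol=1e-12):
    # log_omegas: n_states rows, one column per observation. They are log
    # densities, so there is no sign constraint (hence the struck-out check).
    n_states = len(log_omegas)                      # log_omega.rows()
    if len(Gamma) != n_states or any(len(row) != n_states for row in Gamma):
        raise ValueError("Gamma must be a square n_states x n_states matrix")
    if len(rho) != n_states:
        raise ValueError("rho.size() must match log_omega.rows()")
    if any(g < 0 for row in Gamma for g in row):
        raise ValueError("Gamma entries must be positive")
    if any(abs(sum(col) - 1.0) > tol for col in zip(*Gamma)):
        raise ValueError("each column of Gamma must sum to 1")
    if any(r < 0 for r in rho) or abs(sum(rho) - 1.0) > tol:
        raise ValueError("rho must be a simplex")

# Example: a valid 2-state input passes silently.
validate_hmm_inputs(
    log_omegas=[[-0.5, -1.2], [-0.3, -2.0]],
    Gamma=[[0.9, 0.2], [0.1, 0.8]],   # columns sum to 1
    rho=[0.5, 0.5],
)
```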
No, it won't be too expensive to check that Gamma
is a stochastic matrix and rho
is a simplex; that time will be dwarfed by the rest of the computation of the HMM.
I've always seen the rows of Gamma
be the simplexes, but it appears that there are two conventions for stochastic matrices in the literature (right stochastic and left stochastic). Conventionally HMMs have used the one where rows are simplexes so that Gamma[i, j]
is the probability of transitioning from state i
to state j
. If you're going to do this the other way you'll have to make it very clear that's the convention. And you'll definitely need tests either way.
What is the best way of testing we get the right density?
SBC? Or an approximate form thereof?
You could also write a second implementation of the forward algorithm out for double values to make sure you get the right answer. The problem with external software is that we won't be able to turn it into a unit test other than for fixed HMMs.
Why is omega on the log scale but not Gamma (stochastic matrix) or rho (initial distribution?).
Because the Stan language exposes log density functions, whereas Gamma and rho will be simplices on the linear scale for simpler models.
Will there be a way to set the initial distribution to be the stationary distribution of Gamma?
Because Gamma and rho are customizable the user can compute the eigenvectors of Gamma and set rho accordingly external to the HMM density.
No, it won't be too expensive to check that Gamma is a stochastic matrix and rho is a simplex; that time will be dwarfed by the rest of the computation of the HMM.

rho will be cheap, but for time-dependent Gamma the checks could become substantial. For an initial implementation, however, we should be safe and worry about optimizations later.
I've always seen the rows of Gamma be the simplexes, but it appears that there are two conventions for stochastic matrices in the literature (right stochastic and left stochastic). Conventionally HMMs have used the one where rows are simplexes so that Gamma[i, j] is the probability of transitioning from state i to state j. If you're going to do this the other way you'll have to make it very clear that's the convention. And you'll definitely need tests either way.
We were working off of stats/machine learning textbooks that defined Gamma with simplex columns mirroring the linear algebraic action of the transition matrix on the previous state probabilities.
On Jan 30, 2020, at 4:49 PM, Michael Betancourt notifications@github.com wrote:
Why is omega on the log scale but not Gamma (stochastic matrix) or rho (initial distribution?).
Because the Stan language exposes log density functions, whereas Gamma and rho will be simplices on the linear scale for simpler models.
OK. That makes sense. I made the same decision with log_mix, but have been uncomfortable with it ever since, since the implementation for log_mix(alpha, lp1, lp2) is just log_sum_exp(log(alpha) + lp1, log1m(alpha) + lp2), so you always wind up taking the logs anyway. The problem with the linear scale is that it doesn't give you as much dynamic range. But then that's probably not such a big deal if there are vanishingly small mixture components.
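The identity above is easy to check numerically. Here is a sketch with Python stand-ins for log_mix, log_sum_exp, and log1m (not Stan's actual implementations), including the dynamic-range point:

```python
import math

def log1m(x):
    # log(1 - x), stable for small x.
    return math.log1p(-x)

def log_sum_exp(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_mix(alpha, lp1, lp2):
    # log(alpha * exp(lp1) + (1 - alpha) * exp(lp2)), computed stably:
    # the linear-scale alpha gets logged internally anyway.
    return log_sum_exp(math.log(alpha) + lp1, log1m(alpha) + lp2)

# Matches the direct linear-scale computation on easy inputs.
alpha, lp1, lp2 = 0.3, -1.0, -2.0
direct = math.log(alpha * math.exp(lp1) + (1 - alpha) * math.exp(lp2))

# With very negative lp's, the log-scale version keeps its dynamic range,
# while exp(-1000) underflows to 0 on the linear scale.
stable = log_mix(0.5, -1000.0, -1001.0)
```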
Will there be a way to set the initial distribution to be the stationary distribution of Gamma?
Because Gamma and rho are customizable the user can compute the eigenvectors of Gamma and set rho accordingly external to the HMM density.
I think eventually it'd be nice to have a version with one fewer argument that does that automatically. I can do that once the first one's built.
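As a sketch of "compute the eigenvectors of Gamma and set rho accordingly": for a row-stochastic Gamma, the stationary distribution is the leading left eigenvector normalized to a simplex, which power iteration recovers without a linear-algebra library. This is a hypothetical helper assuming an ergodic chain, not the eventual implementation:

```python
def stationary_distribution(Gamma, iters=10_000):
    """Find pi with pi = pi * Gamma for a row-stochastic Gamma
    (the leading left eigenvector, normalized to sum to 1)."""
    n = len(Gamma)
    pi = [1.0 / n] * n
    for _ in range(iters):
        # One step of pi' = pi @ Gamma.
        pi = [sum(pi[i] * Gamma[i][j] for i in range(n)) for j in range(n)]
    total = sum(pi)
    return [p / total for p in pi]

# Example: rho = stationary distribution, approx. [5/6, 1/6] here.
rho = stationary_distribution([[0.9, 0.1], [0.5, 0.5]])
```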
No, it won't be too expensive to check that Gamma is a stochastic matrix and rho is a simplex; that time will be dwarfed by the rest of the computation of the HMM.

rho will be cheap, but for time-dependent Gamma the checks could become substantial. For an initial implementation, however, we should be safe and worry about optimizations later.
I completely agree we should check. The plan's to rewrite checks so that they can be turned off wholesale.
I've always seen the rows of Gamma be the simplexes, but it appears that there are two conventions for stochastic matrices in the literature (right stochastic and left stochastic). Conventionally HMMs have used the one where rows are simplexes so that Gamma[i, j] is the probability of transitioning from state i to state j. If you're going to do this the other way you'll have to make it very clear that's the convention. And you'll definitely need tests either way.
We were working off of stats/machine learning textbooks that defined Gamma with simplex columns mirroring the linear algebraic action of the transition matrix on the previous state probabilities.
I'm not sure which texts you're looking at, but Bishop sets them up with rows being simplexes (p. 610). The Qin et al. paper on chemistry I really liked also made the rows simplexes. The Langrock et al. paper "Flexible and practical modeling of animal telemetry data" also sets up the stochastic matrix row-wise (that paper's the basis of the popular moveHMM package in R). It's also the way I set it up in the manual, although that says more about me than the external world. Who sets up the transition matrix with columns as simplexes?
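The two conventions differ only by a transpose in how probability mass propagates. A minimal illustration with hypothetical helper names:

```python
def step_row_convention(pi, Gamma):
    # Rows are simplexes: Gamma[i][j] = P(j | i); propagate pi' = pi @ Gamma.
    n = len(pi)
    return [sum(pi[i] * Gamma[i][j] for i in range(n)) for j in range(n)]

def step_col_convention(pi, Gamma_T):
    # Columns are simplexes: Gamma_T[i][j] = P(i | j); propagate pi' = Gamma_T @ pi.
    n = len(pi)
    return [sum(Gamma_T[i][j] * pi[j] for j in range(n)) for i in range(n)]

# The two conventions agree when one matrix is the transpose of the other.
pi = [0.25, 0.75]
G_row = [[0.9, 0.1], [0.2, 0.8]]   # right stochastic (rows sum to 1)
G_col = [[0.9, 0.2], [0.1, 0.8]]   # its transpose, left stochastic
```

Whichever convention lands in the signature, the docs need to state it explicitly, since users coming from either literature will guess differently.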
I'm fine changing the conventions for the gammas, and having the rows as simplexes.
For the unit test for the log density evaluation, I'll construct a simple case where there are only two states. It's straightforward to do this calculation analytically.
The more general thing to do would be to test 1-state, 2-state, and 3-state examples to make sure all the boundary conditions on the loops work.
Then test inputs of size 0, 1, 2, and 3.
You can at least test that finite diffs match autodiff without any analytic work.
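The finite-difference check needs no analytic work on the real model. As a sketch, here a central difference is compared against a hand-derived gradient (standing in for the autodiff result) on a toy one-observation, two-state marginal; the toy functions are illustrative, not the actual test fixtures:

```python
import math

def central_diff(f, x, eps=1e-6):
    # Central finite difference: O(eps^2) approximation to df/dx.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Toy marginal: lp(theta) = log(sum_k rho_k * exp(log_omega_k(theta))),
# with Gaussian-like emission log densities depending on theta.
def toy_log_marginal(theta):
    rho = [0.4, 0.6]
    log_omegas = [-theta ** 2 / 2, -(theta - 1.0) ** 2 / 2]
    m = max(log_omegas)
    return m + math.log(sum(r * math.exp(lo - m) for r, lo in zip(rho, log_omegas)))

# Analytic gradient (playing the role of autodiff):
# d lp / d theta = sum_k w_k * dlo_k / sum_k w_k, with w_k = rho_k * exp(lo_k).
def toy_grad(theta):
    rho = [0.4, 0.6]
    log_omegas = [-theta ** 2 / 2, -(theta - 1.0) ** 2 / 2]
    dlo = [-theta, -(theta - 1.0)]
    w = [r * math.exp(lo) for r, lo in zip(rho, log_omegas)]
    return sum(wi * di for wi, di in zip(w, dlo)) / sum(w)
```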
I agree with Bob. Having the rows as simplexes for the transition probability matrix is definitely how I'd also prefer to have it set up.
Closed in #1778
Description

Consider a hidden Markov model with observations y, a discrete latent state x, and model parameters psi. To fit this model in Stan, we need to integrate x out and then compute gradients for the marginal density p(y | psi). Doing this by hand is lengthy and doesn't guarantee optimal computation of the gradients. Deriving an algorithm for the optimal derivatives is not trivial: it can be done as a byproduct of the EM algorithm or by constructing an adjoint method, a discretized version of what we do with ODEs.

I propose to implement an hmm_marginal_density_lpdf() function with optimal gradient calculations. In addition, there will be two complementary functions: hmm_hidden_state_probs(), which computes the marginal probabilities of the hidden states, and hmm_hidden_state_rng(), which simulates these hidden states.

Example
x is discrete and takes values over a finite set with K elements. Suppose we have N observations y, corresponding to N transitions. The user then defines:
- Gamma, with (i, j)th entry being the probability that x(n) = i given x(n - 1) = j
- omega, with, for each observation, the probability density p(y | x) over each possible value of x
- rho, the initial distribution of the hidden state.

The user then calls hmm_marginal_density_lpdf().
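To make the intended semantics of hmm_hidden_state_probs() concrete, here is a hedged Python sketch of forward-backward smoothing; this is my reconstruction, not the actual implementation, and it uses the column convention from the description above (Gamma[i][j] = P(x(n) = i | x(n-1) = j)):

```python
import math

def hidden_state_probs(log_omegas, Gamma, rho):
    """Smoothed marginals p(x_n = k | y, psi) via forward-backward,
    with per-step rescaling to avoid underflow."""
    K, N = len(rho), len(log_omegas[0])
    omega = [[math.exp(log_omegas[k][n]) for n in range(N)] for k in range(K)]
    # Scaled forward pass: alpha[n][k] proportional to p(x_n = k, y_1..n).
    a = [rho[k] * omega[k][0] for k in range(K)]
    s = sum(a)
    alpha = [[x / s for x in a]]
    for n in range(1, N):
        a = [omega[i][n] * sum(Gamma[i][j] * alpha[-1][j] for j in range(K))
             for i in range(K)]
        s = sum(a)
        alpha.append([x / s for x in a])
    # Scaled backward pass: beta[n][k] proportional to p(y_n+1..N | x_n = k).
    beta = [[1.0] * K for _ in range(N)]
    for n in range(N - 2, -1, -1):
        b = [sum(Gamma[i][j] * omega[i][n + 1] * beta[n + 1][i] for i in range(K))
             for j in range(K)]
        s = sum(b)
        beta[n] = [x / s for x in b]
    # Combine and renormalize per time step (scalings cancel).
    out = []
    for n in range(N):
        g = [alpha[n][k] * beta[n][k] for k in range(K)]
        s = sum(g)
        out.append([x / s for x in g])
    return out
```

hmm_hidden_state_rng() would then sample a hidden path consistent with these marginals (e.g. by backward sampling from the forward quantities).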
Expected Output
This feature will facilitate and optimize the use of hidden Markov models in Stan.
Current Version:
v3.1.0