maxmouchet / HMMBase.jl

Hidden Markov Models for Julia.

HMM with observations as probabilities #21

Closed. AndreasAZiegler closed this issue 3 years ago.

AndreasAZiegler commented 4 years ago

I would like to solve exercises c) and d) of the attached problem with HMMBase.jl. I'm quite new to HMMs, but as I understand it, HMMBase needs observation values to run the forward, backward, and Viterbi algorithms, whereas the problem only provides probabilities. It therefore seems I can't solve this problem with HMMBase. Please let me know if I misunderstand the problem and/or the usage of HMMBase.jl.

hmm_example.pdf

maxmouchet commented 4 years ago

Hi,

In your case, the emission matrix O gives the probabilities P(Y_t = k | Z_t = i) = O_{ik}. To model this using Distributions.jl, you can use a Categorical distribution (one per state). It is parameterized by a probability vector whose element i is the probability of symbol i.

See the examples below. Hope this helps!

Maxime

julia> using Distributions

help?> Categorical
search: Categorical ncategories

  Categorical(p)

  A Categorical distribution is parameterized by a probability vector p (of length K).

  P(X = k) = p[k]  for k = 1, 2, ..., K
using Distributions
using HMMBase

# Initial Probabilities
a = [0.3, 0.2, 0.5]

# Transition Matrix
A = [0 0.2 0.8; 1 0 0; 0.4 0.4 0.2]

# Observations Distributions (for each state)
B1 = Categorical([0, 0.5, 0.5])
B2 = Categorical([1, 0, 0])
B3 = Categorical([0.1, 0.2, 0.7])

# Example: probability of observing O3 in state 1
@show pdf(B1, 3)
# pdf(B1, 3) = 0.5

# Build the HMM
hmm = HMM(a, A, [B1, B2, B3])

# Observations
Yt = [1, 3, 1]

# Question c)
# `loglikelihood` uses the forward algorithm to compute the sequence probability.
@show exp(loglikelihood(hmm, Yt))
# exp(loglikelihood(hmm, Yt)) = 0.03374000000000003

# Question d)
@show viterbi(hmm, Yt)
# viterbi(hmm, Yt) = [2, 1, 2]
AndreasAZiegler commented 4 years ago

Thanks a lot for your quick response and the example. It would have taken me hours to figure this out on my own.

The Viterbi algorithm gives the same result as my hand calculation (the provided solution got the second state wrong).

However, using forward, backward, and (log)likelihood(s) I don't manage to get the same numbers as in my hand calculation (and here I think the solution made no mistake). Can you guess where the difference comes from?

maxmouchet commented 4 years ago

Sorry, I had not seen the correction in the PDF.

In my example I used the transition matrix provided in question a) without verifying it. But indeed it does not correspond to the transition diagram (P_{12} should be 0.8, not 0.2, there should be no self-transition in S3, etc.).

If we use the transition matrix that corresponds to the diagram (the one provided in the correction), we get the right results :)

I will also document this use case (building an HMM from an emission matrix) to make it easier for users.

using Distributions
using HMMBase

# Initial Probabilities
a = [0.3, 0.2, 0.5]

# Transition Matrix
A = [0 0.8 0.2; 0.6 0 0.4; 0.0 1.0 0.0]
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

# Observations Distributions (for each state)
B1 = Categorical([0, 0.5, 0.5])
B2 = Categorical([1, 0, 0])
B3 = Categorical([0.1, 0.2, 0.7])

# Build the HMM
hmm = HMM(a, A, [B1, B2, B3])

# Observations
Yt = [1, 3, 1]

# Question c)
@show exp(loglikelihood(hmm, Yt))
# exp(loglikelihood(hmm, Yt)) = 0.10519999999999999 ≈ 0.1052

# Question d)
@show viterbi(hmm, Yt)
# viterbi(hmm, Yt) = [2, 3, 2]
maxmouchet commented 4 years ago

Also, if you get different results for the forward/backward algorithm, it is because in HMMBase (and most forward/backward implementations) the values are scaled at each time step so that they sum to 1 (for numerical stability).

forward(hmm, Yt)[1]
# 3×3 Array{Float64,2}:
#  0.0       0.8       0.2
#  0.517241  0.0       0.482759
#  0.0       0.988593  0.0114068
# Correction is unscaled. HMMBase is scaled.
# t = 1
# Unscaled: 0 0.2      0.05      (sum = 0.25)
# Scaled:   0 0.2/0.25 0.05/0.25
#      =    0 0.8      0.2

# t = 2
# Unscaled: 0.06       0 0.056      (sum = 0.116)
# Scaled:   0.06/0.116 0 0.056/0.116
#      =    0.517241   0 0.482759
# ...
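
If you want to check against the unscaled values in the correction, here is a minimal sketch of the plain (unscaled) forward recursion, written from scratch with the corrected parameters above. It is not an HMMBase API, just a hand-rolled version for comparison:

using Distributions

# Same (corrected) parameters as above
a  = [0.3, 0.2, 0.5]
A  = [0 0.8 0.2; 0.6 0 0.4; 0.0 1.0 0.0]
Bs = [Categorical([0, 0.5, 0.5]), Categorical([1, 0, 0]), Categorical([0.1, 0.2, 0.7])]
Yt = [1, 3, 1]

K, T = length(a), length(Yt)
alpha = zeros(T, K)

# alpha[t, j] = P(Y_1, ..., Y_t, Z_t = j), with no per-step normalization
alpha[1, :] = a .* [pdf(Bs[j], Yt[1]) for j in 1:K]
for t in 2:T, j in 1:K
    alpha[t, j] = sum(alpha[t-1, i] * A[i, j] for i in 1:K) * pdf(Bs[j], Yt[t])
end

@show sum(alpha[1, :])  # 0.25   (normalizer at t = 1)
@show sum(alpha[2, :])  # 0.116  (normalizer at t = 2)
@show sum(alpha[3, :])  # 0.1052 = P(Y_1, Y_2, Y_3), i.e. exp(loglikelihood(hmm, Yt))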
AndreasAZiegler commented 4 years ago

Thanks again for the thorough explanation. I also read that scaling is important to avoid numerical issues, especially in larger HMMs. Since HMMBase also scales, I assume there is no way to recover the unscaled values, or is there?

maxmouchet commented 4 years ago

There is currently no way. I'll try to add this possibility in a future version, at least for educational purposes.

maxmouchet commented 4 years ago

I've added a new constructor so that you can build an HMM directly from an emission matrix (instead of building categorical distributions by hand).

Using your example:

using HMMBase

a = [0.3, 0.2, 0.5]
A = [0 0.8 0.2; 0.6 0 0.4; 0.0 1.0 0.0]
B = [0 0.5 0.5; 1.0 0 0; 0.1 0.2 0.7]
hmm = HMM(a, A, B)
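
As a quick check (continuing from the snippet above), the results should match the earlier example, since the rows of B are the same Categorical probability vectors:

Yt = [1, 3, 1]

@show exp(loglikelihood(hmm, Yt))  # should be ≈ 0.1052, as before
@show viterbi(hmm, Yt)             # should be [2, 3, 2], as before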
AndreasAZiegler commented 4 years ago

Wow, pretty cool. That definitely saves some typing.

AndreasAZiegler commented 4 years ago

@maxmouchet I did the Viterbi calculation by hand and don't get the same result at t=2 as your implementation. The implementation takes state 3 with p=0.056 over state 1 with p=0.06. Is there some numerical issue, or do I misunderstand the Viterbi algorithm?

(equation image attached)

maxmouchet commented 4 years ago

In the Viterbi algorithm the state sequence is inferred backwards, starting from the last state. Your computation of V_1 is correct, but for V_2 you also need to record the ancestors, that is, argmax_i P_{ij} V_1(i), or more generally, A[t,j] = argmax_i P_{ij} V_{t-1}(i).

For V_2(1), P_{i1} V_1(i) is maximized when i = 2, so the ancestor is 2. For V_2(2), P_{i2} V_1(i) is maximized when i = 3, so the ancestor is 3. For V_2(3), P_{i3} V_1(i) is maximized when i = 2, so the ancestor is 2. And so on for V_3(.).

At the end you should get the following ancestors:

A[2,1] = 2
A[2,2] = 3
A[2,3] = 2
A[3,1] = _
A[3,2] = 3
A[3,3] = 1

Finally, to find the state sequence, we start from z_T = argmax_j V_T(j), and z_t = A[t+1, z_{t+1}]. Here z_3 = 2, z_2 = A[3,2] = 3 and z_1 = A[2,3] = 2.
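
If it helps, here is a minimal hand-rolled sketch of this recursion (plain Julia with the same parameters as above, not HMMBase internals) that records the ancestors and backtracks. The ancestor array is called anc here to avoid clashing with the transition matrix A:

using Distributions

a  = [0.3, 0.2, 0.5]
A  = [0 0.8 0.2; 0.6 0 0.4; 0.0 1.0 0.0]
Bs = [Categorical([0, 0.5, 0.5]), Categorical([1, 0, 0]), Categorical([0.1, 0.2, 0.7])]
Yt = [1, 3, 1]

K, T = length(a), length(Yt)
V   = zeros(T, K)       # V[t, j]: probability of the best path ending in state j at time t
anc = zeros(Int, T, K)  # anc[t, j] = argmax_i A[i, j] * V[t-1, i]

V[1, :] = a .* [pdf(Bs[j], Yt[1]) for j in 1:K]
for t in 2:T, j in 1:K
    scores    = [A[i, j] * V[t-1, i] for i in 1:K]
    anc[t, j] = argmax(scores)        # arbitrary (1) when all scores are zero
    V[t, j]   = maximum(scores) * pdf(Bs[j], Yt[t])
end

# Backtrack: z_T = argmax_j V[T, j], then z_t = anc[t+1, z_{t+1}]
z = zeros(Int, T)
z[T] = argmax(V[T, :])
for t in T-1:-1:1
    z[t] = anc[t+1, z[t+1]]
end

@show z  # [2, 3, 2], same as viterbi(hmm, Yt)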

AndreasAZiegler commented 4 years ago

Thanks a lot for the explanation. Your way of explaining it is much better than the textbook I have.

maxmouchet commented 4 years ago

A good introduction to HMMs is https://www.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf.

maxmouchet commented 3 years ago

I'm closing this issue for now. Feel free to reopen if you have another question.