[x] Change the title to "Exploring Hidden Markov Model". It is singular, not plural. Apply the same fix to the subtitle text.
Sequential modeling
[x] Correction: Rewrite as: Hidden Markov model (HMM) is a popular machine learning model for time-series data and is often used for time-series applications like the ones mentioned above.
[x] Add footnotes and citations wherever necessary throughout the paper. For example, cite some top papers about HMMs for part-of-speech tagging, HMMs for voice assistants, etc. Cite Rabiner's tutorial paper on HMMs in the introduction. Cite the original HMM paper. Cite Murphy's book wherever necessary.
[x] After the "I like playing football" example is complete, add the rain example that currently sits in "Example of Markov Chain".
Markov model
[x] In the Markov model section, explain the relationship between a Markov model and a Markov chain.
[x] You can use draw.io images to replace the TikZ one I drew if you can make them similarly professional looking and embed all the maths. Otherwise, TikZ is fairly straightforward and can do whatever complicated diagrams you need. I would favour TikZ if possible; especially the trellis diagrams would look prettier. But see if you can draw something similar using draw.io.
[x] After you have finished the equation for the Markov chain, draw a nice illustration showing four examples of Markov chains: (a) Rain-Sunny, with states Rain and Sun; (b) Unfair coin, with states Biased and Fair; (c) Activity, showing transitions between running, walking, and resting; (d) Language, showing different parts of speech as states. Use this image as inspiration for adding multiple examples: http://karpathy.github.io/assets/rnn/diags.jpeg. The idea is to reuse this figure when introducing HMMs, adding the corresponding observations.
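If we stick with TikZ, a minimal skeleton for panel (a) might look like the following; the transition probabilities and styling here are placeholders, not values from the article:

```latex
% Panel (a): Rain-Sunny chain. All probabilities are placeholders.
\begin{tikzpicture}[->, node distance=3.5cm,
    state/.style={circle, draw, minimum size=1.2cm}]
  \node[state]             (R) {Rain};
  \node[state, right of=R] (S) {Sun};
  % Self-loops and cross transitions, each labelled with its probability
  \path (R) edge[loop left]  node {$0.6$} (R)
        (R) edge[bend left]  node[above] {$0.4$} (S)
        (S) edge[bend left]  node[below] {$0.3$} (R)
        (S) edge[loop right] node {$0.7$} (S);
\end{tikzpicture}
```

The other three panels could reuse the same `state` style so the four chains read as one family, as in the Karpathy figure.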
[x] "Parameters of first order Markov chain" should be a subsection of Markov Model.
[x] The first line of each section and each paragraph should set the tone. The first line of "Parameters of first order Markov chain" does not do that. This is an important piece of writing rigour you must add to your skillset.
Instead, write: "Let us now understand the parameters of a Markov model, for which we need to revisit the factorisation of the Markov model." Rewrite the factorisation as $P(x_1, \dots, x_T) = P(x_1) \prod_{t=2}^{T} P(x_t \mid x_{t-1})$. Now mention the assumption of $K$ states. Then state the key insight: we assume $P(x_t \mid x_{t-1})$ is the same irrespective of $t$, i.e. the probability of transitioning is independent of time. Explain in a footnote how this is similar to parameter sharing in RNNs or CNNs. Next, explain the transition matrix and the prior probability in one line each. Also, use $x_t$ everywhere here, not $z_t$.
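To make the time-homogeneity point concrete, the text could sit next to a small sketch along these lines (the two-state weather chain and its numbers are illustrative, not from the article):

```python
import numpy as np

# Illustrative 2-state weather chain: states 0 = Rain, 1 = Sun.
# pi[k]   = P(x_1 = k)                 (prior probability)
# A[i, j] = P(x_t = j | x_{t-1} = i), the SAME matrix for every t
# (time-homogeneity: one shared A, much like weight sharing in RNNs/CNNs).
pi = np.array([0.5, 0.5])
A = np.array([[0.6, 0.4],
              [0.3, 0.7]])

def sequence_probability(xs):
    """P(x_1, ..., x_T) = P(x_1) * prod_t P(x_t | x_{t-1})."""
    p = pi[xs[0]]
    for prev, cur in zip(xs, xs[1:]):
        p *= A[prev, cur]
    return p

print(sequence_probability([0, 0, 1]))  # P(Rain, Rain, Sun) = 0.5*0.6*0.4 = 0.12
```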
Markov model sampling
[x] This should be a subsection of Markov Model
[x] Add a caption for each image that you use
[x] If someone changes the parameters in the interactive plot, call reset()
[x] Label the Transition Matrix and Prior Probability tables clearly in the interactive plot. Also label the generated state sequence and the finite-state-machine view of the transition matrix.
[x] Rewrite the text in Markov model sampling to make it easier to understand.
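The rewritten sampling text might be easier to follow next to a tiny reference implementation; a sketch, reusing the illustrative `pi` and `A` from above:

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])              # illustrative prior
A = np.array([[0.6, 0.4],              # illustrative transition matrix
              [0.3, 0.7]])

def sample_chain(T):
    """Ancestral sampling: draw x_1 from pi, then each x_t from A[x_{t-1}]."""
    xs = [rng.choice(len(pi), p=pi)]
    for _ in range(T - 1):
        xs.append(rng.choice(len(pi), p=A[xs[-1]]))
    return xs

print(sample_chain(10))  # a generated state sequence, e.g. [1, 1, 0, 0, ...]
```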
Text generation using Markov chains
[x] Are we keeping this section? If yes, make it very concise and to the point, and make it a subsection of the Markov Model section.
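If the section stays, the prose could be trimmed to a few lines anchored by a compact sketch like this one (the corpus and tokenisation are placeholders):

```python
import random
from collections import defaultdict

corpus = "i like playing football i like watching football".split()  # placeholder text

# Word-level first-order Markov chain: collect next-word candidates per word.
successors = defaultdict(list)
for prev, cur in zip(corpus, corpus[1:]):
    successors[prev].append(cur)

def generate(start, length=6):
    """Walk the chain: repeatedly sample a successor of the last word."""
    words = [start]
    for _ in range(length - 1):
        words.append(random.choice(successors[words[-1]]))
    return " ".join(words)

print(generate("i"))  # e.g. "i like playing football i like"
```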
HMM section
[x] Why is it labeled "What is hidden in HMM"?
[x] Redraw the four examples you drew earlier, now adding the observations too. For activity modeling the observations are continuous; for the others they are discrete.
[x] Now explain with the diagram you have drawn. What is hidden? What is observed?
[x] Then, draw the general HMM graphical model.
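For the general graphical model, a minimal TikZ sketch of the standard chain-with-emissions layout could work; node styles and shading here are placeholders:

```latex
% General HMM graphical model: hidden chain z_1 -> z_2 -> ... emitting x_t.
\begin{tikzpicture}[->, node distance=1.8cm,
    latent/.style={circle, draw, minimum size=1cm},
    obsv/.style={circle, draw, fill=gray!20, minimum size=1cm}]
  \node[latent]              (z1) {$z_1$};
  \node[latent, right of=z1] (z2) {$z_2$};
  \node[latent, right of=z2] (z3) {$z_3$};
  \node[right of=z3]         (zd) {$\cdots$};
  \node[obsv, below of=z1]   (x1) {$x_1$};
  \node[obsv, below of=z2]   (x2) {$x_2$};
  \node[obsv, below of=z3]   (x3) {$x_3$};
  \path (z1) edge (z2) (z2) edge (z3) (z3) edge (zd)
        (z1) edge (x1) (z2) edge (x2) (z3) edge (x3);
\end{tikzpicture}
```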
[x] Make this section very concise and example-driven.
HMM parameters
[x] This should be a subsection of HMM.
[x] Reduce the text and mention that $A$ and $\pi$ are the same as before. The only new parameter is the emission probability. Explain that it can be discrete, as in three of the four examples you show, and continuous, as in the activity example.
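One way to make the discrete/continuous contrast concrete is a short sketch like this (all numbers are illustrative):

```python
import numpy as np

# Discrete emissions (coin example): B[k, v] = P(observation v | state k).
# States: 0 = Fair, 1 = Biased; observations: 0 = Heads, 1 = Tails.
B = np.array([[0.5, 0.5],
              [0.9, 0.1]])   # biased coin favours heads (illustrative)

# Continuous emissions (activity example): one Gaussian per hidden state,
# e.g. over accelerometer magnitude. Means and stds are placeholders.
means = np.array([0.1, 1.0, 2.5])   # resting, walking, running
stds = np.array([0.05, 0.3, 0.6])

def emission_prob(state, obs):
    """Gaussian density p(obs | state) for the activity example."""
    m, s = means[state], stds[state]
    return np.exp(-0.5 * ((obs - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

print(B[1, 0])                # P(Heads | Biased) = 0.9
print(emission_prob(2, 2.3))  # density of observation 2.3 under "running"
```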
HMM sampling
[x] Same comments as for MM sampling.
[x] Same comments for the interactive plot as for the MM interactive plot.
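As with the Markov-model version, the rewritten text could sit next to a short ancestral-sampling sketch (the coin-example parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])              # states: 0 = Fair, 1 = Biased
A = np.array([[0.8, 0.2],
              [0.2, 0.8]])
B = np.array([[0.5, 0.5],              # observations: 0 = H, 1 = T
              [0.9, 0.1]])

def sample_hmm(T):
    """Draw z_1 ~ pi, then z_t ~ A[z_{t-1}]; emit x_t ~ B[z_t] at each step."""
    z = rng.choice(2, p=pi)
    states, obs = [z], [rng.choice(2, p=B[z])]
    for _ in range(T - 1):
        z = rng.choice(2, p=A[z])
        states.append(z)
        obs.append(rng.choice(2, p=B[z]))
    return states, obs

print(sample_hmm(5))  # (hidden coin sequence, observed H/T sequence)
```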
HMM evidence likelihood
[x] The jump is very sudden. I suggest renaming this section to something like "5 HMM questions". Then write: we now look at five important questions to understand HMM algorithms.
Then, each question is a subsection.
Model evidence
[x] Model evidence is not the same as the most probable set of observations. Please correct. State simply: you want to find the likelihood of an observation sequence given the HMM parameters $\pi$, $A$, $\phi$. You seem to be repeating sentences in this paragraph.
[x] Time complexity does not need to be a full section; this can be just 2-3 lines. Please redraw this diagram; see how neat Andrej Karpathy's RNN diagram looks. Of course, do not make this diagram vertical; let us keep it horizontal. Also, introduce the concept of a trellis diagram here, using a separate diagram before this one. Add a caption for this figure. Add Path #1, #2, etc. in the GIF. Also, please relate the text back to the coin example. Before this plot, give three observations to set the context, like H, H, H, and then ask: how likely is this set of observations?
On the right-hand side of this GIF, put the main question: $P(H, H, H \mid Z_1{=}B, Z_2{=}B, Z_3{=}B) = \dots P(Z_1{=}B) \cdot P(Z_2{=}B \mid Z_1{=}B) \dots$ This way it would be easy to connect the likelihood calculations.
This GIF can be slowed down.
Finally, in the text we need to succinctly say that $P(HHH \mid \theta)$ is the sum of the probabilities of the eight paths.
Then generalize to $K^T$ paths. Simplify the text surrounding $T \cdot K^T$ and maybe just say the cost is exponential in the sequence length $T$.
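To back the sum-over-paths sentence, the text could point to a brute-force computation; a sketch that enumerates all $K^T = 2^3 = 8$ hidden paths for the coin example (parameters illustrative):

```python
import itertools
import numpy as np

pi = np.array([0.5, 0.5])              # states: 0 = Fair, 1 = Biased
A = np.array([[0.8, 0.2],
              [0.2, 0.8]])
B = np.array([[0.5, 0.5],              # observations: 0 = H, 1 = T
              [0.9, 0.1]])

obs = [0, 0, 0]  # the H, H, H question from the text

# P(HHH | theta) = sum over all K^T = 2^3 = 8 hidden paths.
total = 0.0
for path in itertools.product(range(2), repeat=len(obs)):
    p = pi[path[0]] * B[path[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
    total += p
    print(path, p)   # one line per path, mirroring Path #1, #2, ... in the GIF

print("P(HHH | theta) =", total)
```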
[x] We need to do a better job explaining the intuition behind the forward algorithm. I suggest painting this story right after discussing the exponential cost. Draw the trellis showing $Z_2$ and $Z_3$. We basically need to say that if we know $P(HH, Z_2{=}F)$ and $P(HH, Z_2{=}B)$, irrespective of the path followed ($Z_1{=}F, Z_2{=}F$), ($Z_1{=}B, Z_2{=}F$), ..., we can easily compute $P(HHH, Z_3{=}F)$, since we can reach $Z_3{=}F$ from either $Z_2{=}F$ or $Z_2{=}B$. Thus, we need to consider only $K$ incoming transitions per state at each timestep.
Then explain that this is exactly what the forward algorithm does, and introduce the convention. Then see which text can be deleted. Do the same for the backward algorithm. Before explaining backward, explain why we even need it: which HMM problem does it help solve?
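Once that intuition is in place, the forward recursion itself is only a few lines; a sketch using the same illustrative coin parameters:

```python
import numpy as np

pi = np.array([0.5, 0.5])              # same illustrative coin parameters
A = np.array([[0.8, 0.2],
              [0.2, 0.8]])
B = np.array([[0.5, 0.5],
              [0.9, 0.1]])

def forward(obs):
    """alpha[k] at step t is P(x_1..x_t, z_t = k); evidence = sum_k alpha[k].

    Each step reuses alpha from t-1 regardless of the path taken to get
    there, so the cost is O(T * K^2) instead of exponential in T.
    """
    alpha = pi * B[:, obs[0]]
    for x in obs[1:]:
        alpha = (alpha @ A) * B[:, x]
    return alpha.sum()

print(forward([0, 0, 0]))  # matches the brute-force sum over the eight paths
```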
[x] The images before the forward algorithm are currently missing. Maybe they are not properly linked?