[x] In some places we use modeling and in others modelling. Please use American English everywhere.
[x] Add a caption for the Sunny-Rainy time-series plot. If possible, also add a very light translucent gray to the background. The caption could be: Rainy-Sunny time-series data.
Markov chain
[x] The green colour on the present observation is distracting. Can you choose a more sober colour?
[x] Please add a caption for the figure on the Markov chain.
[x] For the air conditioner example, rather talk about the state of the compressor.
Parameters of Markov chain
[x] The first sentence is good but repeated many times. Rather, simply say: Each observation at time $t$, or $x_t$, can take a discrete state. For example, rainy or sunny weather, a biased or fair coin, or an appliance being ON or OFF.
Then mention: let us assume that our observation can take one of $K$ states.
[x] We can write the factorisation --> We can rewrite the factorisation of the above general Markov chain as:
[x] Remove the line: is the same as multiplication of ..
[x] Remove the mention of K states here, as we have already written it above.
[x] There is a jump after "to be common (shared) across all time." At this point, say: To fully specify the Markov chain, we require the following two parameters:
[x] Write the symbols for A and Pi in brackets. Like: Transition matrix (A).
[x] Make the text in Transition matrix denser. Like: The transition matrix stores the probability of transitioning from state $i$ to state $j$. Thus, the transition matrix can be represented as a $K \times K$ matrix where the entry $A_{ij}$ ... (A worked sketch of the factorization and parameters follows this list.)
[x] Try to reduce the text in Markov chain FSMs by 1-2 lines for each of the three examples. Also, try to use exactly the same FSM values for the Fair-Biased coin as we use in the example later.
[x] There is a bug in the Generated Sequence for the FSM (a sampling sketch to check against also follows this list).
[x] In the interactive plot, can you place the text "Generated Sequence" closer to the generated sequence and away from the FSM?
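For reference when rewriting this part, here is a minimal worked version of the factorization and the two parameters; the notation follows the comments above, and the article's exact symbols may differ:

```latex
% Factorization of a first-order Markov chain over T observations
P(x_1, x_2, \ldots, x_T) = P(x_1) \prod_{t=2}^{T} P(x_t \mid x_{t-1})

% The two parameters, assuming each x_t takes one of K discrete states
\pi_i = P(x_1 = i), \qquad
A_{ij} = P(x_t = j \mid x_{t-1} = i), \qquad
\sum_{j=1}^{K} A_{ij} = 1 \ \text{for every } i
```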
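And a minimal sampling sketch that could be used to sanity-check the "Generated Sequence" panel; the transition values below are illustrative placeholders, not the article's actual FSM numbers:

```python
import numpy as np

# Illustrative 2-state Markov chain (Sunny/Rainy); the article's FSM values may differ.
states = ["Sunny", "Rainy"]
pi = np.array([0.6, 0.4])             # initial distribution over the K = 2 states
A = np.array([[0.8, 0.2],             # A[i, j] = P(x_t = j | x_{t-1} = i)
              [0.3, 0.7]])

def sample_markov_chain(pi, A, T, seed=0):
    """Ancestral sampling of T states from a Markov chain with parameters (pi, A)."""
    rng = np.random.default_rng(seed)
    z = [rng.choice(len(pi), p=pi)]                   # first state drawn from pi
    for _ in range(T - 1):
        z.append(rng.choice(A.shape[1], p=A[z[-1]]))  # next state drawn from row of A
    return z

print([states[i] for i in sample_markov_chain(pi, A, T=10)])
```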
HMM
[x] Let us be direct in the HMM section. We can directly say: In an HMM, an observation is generated from a hidden component which is modeled as a Markov chain.
[x] Let us be more direct when you introduce a general HMM. The observation at time $t$ (shown in shaded pink) is denoted by $x_t$ and the hidden state (unshaded) is denoted by $z_t$.
[x] It is worth noting --> that the hidden component is modeled as a Markov chain and not the observations.
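To back up the direct statement, the joint factorization could be shown right away; a sketch only, with $x_t$ observed and $z_t$ hidden as in the comments above:

```latex
% The hidden states z_t form the Markov chain; each observation x_t is
% emitted from its hidden state z_t.
P(x_{1:T}, z_{1:T}) = P(z_1) \prod_{t=2}^{T} P(z_t \mid z_{t-1})
                      \prod_{t=1}^{T} P(x_t \mid z_t)
```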
HMM parameters
[x] The first sentence has too much bold, which dilutes the value of the bold. The bold is not needed.
[x] The emission does not need to be conditioned on $A$ and $\pi$. And there is no need to write $t$ in $\phi$. (A sketch of the parameters follows this list.)
[x] No sentence break after "continuous".
[x] The text is repeated in HMM state diagrams. There is no need to repeat it.
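A compact way to state the parameters along these lines; a sketch only, and the article's symbols for the emission parameters may differ:

```latex
% HMM parameters
\theta = \{\pi, A, \phi\}, \qquad
\pi_k = P(z_1 = k), \qquad
A_{ij} = P(z_t = j \mid z_{t-1} = i)

% Emission, written without conditioning on A, \pi and without a t on \phi:
% discrete case (emission matrix) or continuous case (e.g., a Gaussian per state)
P(x_t = m \mid z_t = k, \phi) = \phi_{km}
\qquad \text{or} \qquad
P(x_t \mid z_t = k, \phi) = \mathcal{N}(x_t \mid \mu_k, \sigma_k^2)
```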
Sampling from HMM
[x] Can you make some visual tweaks to ensure it works well across screen sizes?
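While reworking the visuals, a minimal reference implementation of the sampling itself may be handy; the fair/biased coin numbers below are placeholders, not necessarily the values used in the article:

```python
import numpy as np

# Illustrative fair/biased coin HMM; the article's actual parameters may differ.
hidden_states = ["F", "B"]              # fair, biased
observations = ["H", "T"]
pi = np.array([0.5, 0.5])               # P(z_1)
A = np.array([[0.9, 0.1],               # P(z_t = j | z_{t-1} = i)
              [0.2, 0.8]])
B = np.array([[0.5, 0.5],               # P(x_t | z_t): rows = hidden state, cols = H, T
              [0.9, 0.1]])

def sample_hmm(pi, A, B, T, seed=0):
    """Ancestral sampling: draw z_t from the Markov chain, then x_t from B[z_t]."""
    rng = np.random.default_rng(seed)
    z, x = [], []
    for t in range(T):
        z.append(rng.choice(len(pi), p=pi if t == 0 else A[z[-1]]))
        x.append(rng.choice(B.shape[1], p=B[z[-1]]))
    return z, x

z, x = sample_hmm(pi, A, B, T=8)
print([hidden_states[i] for i in z], [observations[i] for i in x])
```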
Trellis
[x] It should not be a section. Rather, a small subsection in HMM.
[x] What are B and F? You should instead write: Consider the following trellis diagram showing the possible paths of hidden states over three time steps for the biased-fair coin example. Mention this clearly in the caption as well. Mention that the highlighted red path corresponds to the state sequence: ... (A quick path-enumeration sketch follows this list.)
[x] The trellis diagram looks insanely big on a big screen. Maybe trim the padding at the top.
[x] Since we have discussed the state sequence earlier, we do not need to discuss it again after the figure caption.
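To make the trellis concrete, the paths it depicts can be enumerated directly; a small sketch, assuming the state labels F/B as in the comments above:

```python
from itertools import product

# All hidden-state paths through the trellis for T = 3 time steps of the
# fair (F) / biased (B) coin example: K^T = 2^3 = 8 paths in total.
paths = ["".join(p) for p in product("FB", repeat=3)]
print(len(paths), paths)   # 8 ['FFF', 'FFB', 'FBF', ..., 'BBB']
```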
HMM questions
[x] Maybe rename this to HMM algorithms
[ ] The text in HMM evidence likelihood can be made much more terse. We want to estimate the likelihood $L(X \mid \theta)$ of a set of observations $X = \{x_1, \ldots, x_t\}$ (and NOT $N$) given the model parameters $\theta = \{\}$. Then say: let us understand this calculation by revisiting our coin-toss example. Let us first calculate the probability of observing the sequence $\{H, H, H\}$ given the state sequence $z_1 = B$, .... This path is highlighted in red in the trellis diagram below. We can calculate this probability by multiplying: COLOR1(Probability of starting in state B at time $t = 1$), COLOR2(Probability of observing an H from state B at time $t = 1$), COLOR3(Probability of transitioning from state B at $t = 1$ to state B at $t = 2$), ... Then, when you write out the math, color the corresponding probabilities accordingly using COLOR1, ...
[ ] Then continue the story. We have now calculated the probability of observing HHH given the state sequence BBB. But there are in total 8 ($= 2^3$) such paths. To calculate the likelihood of observing HHH, we need to sum the probabilities of it being generated along each of the eight paths, as shown in the GIF below. Then draw the GIF. Then generalise this to $K^T$. Just contextualise this number: Rabiner's paper has some example; even for a modest value of K and T, this number is more than the number of atoms, .. Then say: clearly, we want to do better computationally. Then say: our key trick is to reuse computations using dynamic programming, via an algorithm called the Forward algorithm. Let us now understand why DP makes sense. (A brute-force sketch follows this list.)
[x] Once you have drawn the tree, be more direct and say that we can avoid repeating several calculations, as shown in the tree below.
[x] In the tree below, you may want to show multiple outgoing connections from, say, $P(H \mid z_1 = F)$ to make it clear we are not computing it repeatedly.
[x] You need to build the link between the tree and $\alpha_t$. Essentially, you want to say that you just need to store the probability of the observations up to time step $t$ and ending in the given state.
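A brute-force sketch matching this story (multiply start, emission, and transition probabilities along one path, then sum over all $2^3 = 8$ paths); the parameters are illustrative placeholders, not the article's:

```python
from itertools import product
import numpy as np

# Illustrative fair/biased coin HMM parameters (the article's values may differ).
idx = {"F": 0, "B": 1}
pi = np.array([0.5, 0.5])                   # P(z_1)
A = np.array([[0.9, 0.1], [0.2, 0.8]])      # P(z_t | z_{t-1})
B = np.array([[0.5, 0.5], [0.9, 0.1]])      # P(x_t | z_t), columns = H, T
obs = [0, 0, 0]                             # the observed sequence H, H, H

likelihood = 0.0
for path in product("FB", repeat=len(obs)):            # all K^T = 8 hidden paths
    z = [idx[s] for s in path]
    p = pi[z[0]] * B[z[0], obs[0]]                      # start + first emission
    for t in range(1, len(obs)):
        p *= A[z[t - 1], z[t]] * B[z[t], obs[t]]        # transition + emission
    likelihood += p                                     # sum over paths -> P(HHH)
print(likelihood)
```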
Forward algorithm
[x] Based on my comments, see which text from the Forward algorithm section is already written above and need not be written again.
[ ] It is not clear from the writing how much time the Forward algorithm saves. Perhaps quantify it: the brute-force sum costs $O(T K^T)$, while the Forward algorithm costs $O(K^2 T)$.
[ ] The caption for the relation between $\alpha_t$ and $\alpha_{t+1}$ is not self-contained. Do mention the key step in the forward algorithm: the relation between $\alpha$...
[x] Use $K$ and not $k$ when drawing the relation between $\alpha_t$ and $\alpha_{t+1}$, both in the text and the diagram.
[x] Write the likelihood calculation more directly: Using the forward algorithm, we can compute the likelihood as $\sum\ldots$ (A reference sketch follows this list.)
[x] I suggest cutting most of the text starting from "Now, we move on to..". Come directly to the next problem: estimating the probability of a sequence (x...) conditioned on observing the hidden state at time $t$. Then give the example of D, D, W, W, W.. Then break down the factorisation into $\alpha$ and the other term, which you can then say is the probability of the future ($t+1{:}T$) sequence given the current state, and say that it is calculated using the backwards algorithm.
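For reference while tightening this section, a minimal forward-algorithm sketch using the same illustrative parameters as the brute-force sketch above; it implements the recursion $\alpha_t(j) = P(x_t \mid z_t = j)\sum_{i=1}^{K} \alpha_{t-1}(i) A_{ij}$ and the likelihood $\sum_{j=1}^{K} \alpha_T(j)$ in $O(K^2 T)$ time:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: alpha[t, j] = P(x_1, ..., x_t, z_t = j)."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, obs[0]]                        # initialization
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)    # recursion, O(K^2) per step
    return alpha

# Same illustrative fair/biased coin parameters as in the brute-force sketch.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.9, 0.1]])
obs = [0, 0, 0]                                         # H, H, H

alpha = forward(pi, A, B, obs)
print(alpha[-1].sum())    # likelihood P(HHH); matches the brute-force sum
```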
Backwards algorithm
[x] Similar comments as for the forward algorithm. Use $K$ instead of $k$ in the diagram and text.
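And the matching backward-algorithm sketch, where $\beta_t(i) = P(x_{t+1:T} \mid z_t = i)$ with the recursion $\beta_t(i) = \sum_{j=1}^{K} A_{ij}\, P(x_{t+1} \mid z_{t+1} = j)\, \beta_{t+1}(j)$; combined with $\alpha$ from the forward sketch, $\alpha_t(i)\,\beta_t(i) = P(x_{1:T}, z_t = i)$. The parameters are the same illustrative placeholders as above:

```python
import numpy as np

def backward(A, B, obs):
    """Backward algorithm: beta[t, i] = P(x_{t+1}, ..., x_T | z_t = i)."""
    T, K = len(obs), A.shape[0]
    beta = np.ones((T, K))                               # base case: beta_T = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])   # recursion, O(K^2) per step
    return beta

# Same illustrative fair/biased coin parameters as in the forward sketch.
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.9, 0.1]])
obs = [0, 0, 0]

print(backward(A, B, obs))
```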