gdalle opened 2 months ago
Making a second pass on the documentation, now that the errors have been fixed (the remarks above still apply).
[!IMPORTANT] I feel like I'm missing some documentation that is neither tutorial nor full API reference: a kind of basic map of the different components of the package, how they fit together, and where to find the tools I need. The Divio guide is a good resource to understand this distinction: I'm looking for something more explanatory and less demonstrative or exhaustive.
In nearly all tutorials, plots need more descriptions and axis labeling, especially when none of the axes represents time or iterations.
Casino HMM: Learning
As you can see, stochastic gradient descent converges much more quickly than full-batch gradient descent in this example. Intuitively, that's because SGD takes multiple steps per epoch (i.e. each complete sweep through the dataset), whereas full-batch gradient descent takes only one. The algorithms appear to have converged, but have they learned the correct parameters? Let's see… [...] OK, but not perfect!
The parameters learned by gradient descent are actually... pretty bad? The second row of the transition matrix is way off, and the loadedness of one of the dice goes basically undetected. Is it just a bad initialization?
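To make the step-count point concrete, here is a toy sketch with plain optax on a least-squares stand-in (my own illustration, not dynamax's actual `fit_sgd` / `fit_em` functions): full-batch gradient descent takes one optimizer step per epoch, while minibatch SGD takes one step per batch.

```python
import jax
import jax.numpy as jnp
import optax

# Toy least-squares problem standing in for the HMM negative log-likelihood.
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (100, 3))
y = X @ jnp.array([1.0, -2.0, 0.5])

def loss(w, xb, yb):
    return jnp.mean((xb @ w - yb) ** 2)

opt = optax.sgd(1e-2)

def run(batch_size, num_epochs=50):
    w = jnp.zeros(3)
    state = opt.init(w)
    n = X.shape[0]
    for _ in range(num_epochs):
        # Full batch: one step per epoch; minibatches: n // batch_size steps.
        for start in range(0, n, batch_size):
            g = jax.grad(loss)(w, X[start:start + batch_size], y[start:start + batch_size])
            updates, state = opt.update(g, state)
            w = optax.apply_updates(w, updates)
    return w

w_full = run(batch_size=100)  # 50 optimizer steps in total
w_mini = run(batch_size=10)   # 500 optimizer steps in total
```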
Tracking an object using the Kalman filter
Sample some data from the model
It would be good to indicate the flow of time in this plot, which is not the horizontal axis.
Perform online filtering / Perform offline smoothing
What are the black circles? These plots need more explanation, not just code.
Online linear regression using Kalman filtering
The final plot needs more explanations with words.
Parallel filtering and smoothing in an LG-SSM
I only just noticed that non-colorblind people can probably see that the serial and parallel curves are superposed, thanks to the different colors? But as a colorblind person I can't differentiate between red and green here. So maybe you could shift one of the curves slightly, or change the line style, to highlight that they follow each other?
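For instance, something along these lines (hypothetical data, just to show the styling trick):

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(100)
serial = np.sin(t / 10.0)  # stand-in for the serial filter output
parallel = serial          # the parallel output should coincide exactly

plt.plot(t, serial, color="C0", linewidth=3, label="serial")
# A thinner dashed overlay stays visible even when the two curves coincide,
# without relying on a red/green contrast.
plt.plot(t, parallel, color="C1", linestyle="--", linewidth=1.5, label="parallel")
plt.legend()
plt.show()
```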
MAP parameter estimation for an LG-SSM using EM and SGD
Data
What does the plot represent?
Bayesian parameter estimation for an LG-SSM using HMC
Call HMC
This example errors, perhaps due to an insufficiently tight version bound on blackjax?

`TypeError: kernel() got an unexpected keyword argument 'num_steps'`
Tracking a spiraling object using the extended / unscented Kalman filter
We now show how to do this using a simple nonlinear Gaussian SSM, combined with various extensions of the Kalman filter algorithm.
What is the difference between these variants (e.g. unscented vs extended Kalman filter)? How does the user choose between them?
$q_t \in R^2$ [...] $r_t \sim N(0, R)$
It would be clearer with different fonts for the set of real numbers $\mathbb{R}$ and the covariance matrix $R$.
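Concretely, something like this would remove the ambiguity:

```latex
% Blackboard bold for the reals, bold upright for the covariance matrix,
% and \sim rather than \in for the noise distribution:
q_t \in \mathbb{R}^2, \qquad r_t \sim \mathcal{N}(0, \mathbf{R})
```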
Extended Kalman filter
Making one of the two lines dotted would help with readability. As would making the larger circles red instead of black.
Unscented Kalman filter
What changed between the two plots? How come the curves are suddenly superposed? We're missing a few explanations here.
Tracking a 1d pendulum using the extended / unscented Kalman filter / smoother
Sample data and plot it
Why does it look like the measurements are thresholded instead of just randomly distributed around the true angle?
Online learning for an MLP using extended Kalman filtering
Plot results
Again, the plots need more explanations, especially when the code is so verbose. What are the axes? What are the transparent curves, and the solid blue one? What are the dots?
Skipped for now
Hi and congrats on the package!
I'm one of the reviewers for the JOSS paper you submitted, so here I'll list my questions and concerns about the documentation. This issue will be updated as my reading progresses so maybe don't start answering right away.
Home page
This sentence was unclear to me: I still have to pass some model parameters to the smoother, right?
HMMs
Casino HMM: Inference
Overall really good and clear!
Typo (initializeD).
Casino HMM: Learning
How would you handle data with several trajectories of varying lengths? Do you have to pad them into a 3D tensor, and then apply some kind of mask?
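For illustration, here is the kind of scheme I have in mind (my own sketch, not an existing dynamax feature):

```python
import jax.numpy as jnp

# Stack variable-length emission sequences into one (num_seqs, max_len) array
# and carry a boolean mask marking the real time steps.
seqs = [jnp.array([0, 1, 2]), jnp.array([3, 4]), jnp.array([5, 1, 2, 0])]
max_len = max(len(s) for s in seqs)

emissions = jnp.stack([jnp.pad(s, (0, max_len - len(s))) for s in seqs])
mask = jnp.stack([jnp.arange(max_len) < len(s) for s in seqs])

# A masked log-likelihood would then sum per-step terms only where mask is True.
```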
How do you perform gradient ascent on the stochastic matrix $A$? Some kind of projection step? It seems far from obvious.
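One standard trick would be to optimize unconstrained logits and map them to a row-stochastic matrix with a row-wise softmax, so plain gradient steps never leave the simplex; is that what happens here? A sketch of what I mean (not a claim about dynamax's actual implementation):

```python
import jax
import jax.numpy as jnp

def transition_matrix(logits):
    # Row-wise softmax: each row is positive and sums to 1 by construction.
    return jax.nn.softmax(logits, axis=-1)

logits = jnp.zeros((3, 3))      # unconstrained parameters
A = transition_matrix(logits)

def loss(logits):
    A = transition_matrix(logits)
    return -jnp.sum(jnp.log(A) * jnp.eye(3))  # placeholder objective

grads = jax.grad(loss)(logits)  # gradients flow through the softmax
```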
Also the example errors:
The notion of "epoch" doesn't seem standard for EM, do you take it to mean one E step + one M step?
It would be interesting to explain why the asymptotic GD estimate is worse. Theoretically, the finite size of the training data should also affect the EM algorithm, since both are doing ERM on the log-likelihood.
This example errors.
Gaussian HMM: Cross-validation and model selection
This example errors:
Also, in the "True HMM emission distribution" plot, it may not be obvious what the black lines stand for (transitions I assume).
It would be nice to add a comment explaining why the EM log prob can end up above that of the true model.
Is it spherical, diagonal or generic?
AutoRegressive HMM demo
Very nice plots but not enough explanations.
Can you clarify what these plots represent? Adding titles / color legends would also help.
This example errors and the plot below (while a very pretty starfish) should also be explained.
Maybe just specify that the stationary point is the limit of the recursion $y_t = A y_{t-1} + b$ (not necessarily trivial for every reader).
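Something like this, assuming the spectral radius of $A$ is below 1:

```latex
y^\star = A y^\star + b
\quad\Longrightarrow\quad
y^\star = (I - A)^{-1} b
```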
Please annotate the plots.
Linear Gaussian SSMs
Tracking an object using the Kalman filter
Why do we need an RNG to initialize the model even though all parameters are already fixed? This may also have been a relevant question for previous notebooks but I just noticed it now.
This example errors.
Online linear regression using Kalman filtering
Conceptually, do we pay a performance penalty by doing this with an SSM-inspired formulation?
It would be useful to remind the reader of the two equations defining LG-SSM in matrix form.
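For reference, I mean the standard form (my notation, which may differ from the package's):

```latex
z_t = F z_{t-1} + w_t, \qquad w_t \sim \mathcal{N}(0, Q) \\
y_t = H z_t + v_t,     \qquad v_t \sim \mathcal{N}(0, R)
```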
This example errors.
What is the difference between $w_0$ batch and $w_0$?
Parallel filtering and smoothing in an LG-SSM
What is the difference between `parallel_smoother` and just using `jax.vmap` on the normal smoother? This example errors.
Also, are we supposed to see a difference between serial and parallel filtering on the plot? Obviously we expect both curves to be superposed but it is still a bit weird to have both in the legend and only see one.
MAP parameter estimation for an LG-SSM using EM and SGD
This example errors.
I don't understand what is going on in this plot. In particular, how do you predict emissions from smoothing? When you predict, by definition you don't have access to observations beyond $t$. Shouldn't you predict from filtering instead?
Bayesian parameter estimation for an LG-SSM using HMC
This example errors.
Also, I still don't understand what is meant by "smoothed emissions" (same problem as with the previous notebook); to me the only thing you can smooth is a state.
An introduction to what HMC is and what you use to implement it (apparently blackjax) would be very useful here.
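Even a minimal sketch would help, e.g. (assuming a recent blackjax API; the notebook may pin an older version, which could explain the `num_steps` error mentioned elsewhere):

```python
import jax
import jax.numpy as jnp
import blackjax

# Standard normal stand-in for the SSM log posterior.
def logdensity(theta):
    return -0.5 * jnp.sum(theta ** 2)

hmc = blackjax.hmc(
    logdensity,
    step_size=0.1,
    inverse_mass_matrix=jnp.ones(2),
    num_integration_steps=10,
)

state = hmc.init(jnp.zeros(2))
key = jax.random.PRNGKey(0)
for _ in range(100):
    key, subkey = jax.random.split(key)
    state, info = hmc.step(subkey, state)
```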
X and Y labels are wrong in the plot. Also, are we supposed to observe that the log probability increases? If so, the blue curve is not very convincing.
Same remarks about the blue curve.
Related issues: #377