Closed. bob-carpenter closed this issue 2 years ago.
This is not a duplicate of #2522, which is about specific functional forms. This one is black-box.
The relevant RStan code we will follow is:
```r
if (draws > 0 && is.matrix(R <- try(chol(-H)))) {  # upper-triangular Cholesky factor of -H
  K <- ncol(R)
  R_inv <- backsolve(R, diag(K))                   # R^{-1} by back substitution against the identity
  Z <- matrix(rnorm(K * draws), K, draws)          # K x draws standard normal variates
  theta_tilde <- t(theta + R_inv %*% Z)            # draws x K matrix of approximate posterior draws
}
```
Converting this to mathematical notation, first suppose that the mode is $\widehat{\theta}$ and $H(\widehat{\theta})$ is the Hessian at the mode calculated with finite differences. Then comes the high constant-overhead part, where we (once) set
$R^{-1} = \textrm{chol}(-H(\widehat{\theta})) \backslash \textrm{I}$,
i.e., the inverse of the upper-triangular Cholesky factor of the negative Hessian, obtained by back substitution against the identity.
Then we can generate each draw $m$ on the unconstrained scale by sampling
$\theta^{\textrm{std}(m)} \sim \textrm{normal}(0, \textrm{I})$
and defining draw $m$ to be
$\theta^{(m)} = \widehat{\theta} + R^{-1} \cdot \theta^{\textrm{std}(m)}$
We then just need to inverse transform each $\theta^{(m)}$ back to the constrained scale, compute transformed parameters, and generate the generated quantities using the Stan model method write_array().
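To make the steps above concrete, here is a minimal R sketch of the draw-generation step, assuming the mode theta_hat and the finite-difference Hessian H at the mode are already available (the function and variable names are illustrative, not part of any Stan interface):

```r
# Sketch of Laplace-approximation draws on the unconstrained scale.
# Assumes theta_hat (mode) and H (Hessian at the mode) are given;
# names are illustrative, not the RStan or CmdStan API.
laplace_draws <- function(theta_hat, H, draws) {
  K <- length(theta_hat)
  R <- chol(-H)                            # one-time O(K^3): upper-triangular factor of -H
  R_inv <- backsolve(R, diag(K))           # R^{-1} via triangular solve against the identity
  Z <- matrix(rnorm(K * draws), K, draws)  # K x draws standard normal variates
  # each column becomes theta_hat + R^{-1} z, a draw from normal(theta_hat, (-H)^{-1})
  t(theta_hat + R_inv %*% Z)               # draws x K matrix on the unconstrained scale
}
```

Each row of the result still has to be inverse transformed to the constrained scale and pushed through write_array(), as described above.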
[GitHub hasn't sorted out LaTeX rendering properly, so it still looks like a ransom note!]
I missed this earlier.
> This will allow cmdstanr to reproduce the Laplace approximation functionality of RStan.
Currently, RStan is doing the wrong thing: it is not using the Jacobian adjustment. See the illustration of the RStan approach versus the correct Laplace approximation at https://avehtari.github.io/casestudies/Jacobian/jacobian.html
> The following signature should work. It requires an estimate theta of the mode (for example, as computed by optimization).
Optimization is not using the Jacobian, and thus the estimate computed by optimization is not useful here (see the notebook linked above). We want the mode of the Jacobian-adjusted target in the unconstrained space.
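To make the Jacobian point concrete, here is a small toy example (assumed for illustration, not taken from the case study): a single parameter $\sigma > 0$ with a lognormal(0, 1) density and unconstrained parameter $y = \log \sigma$. The Jacobian-adjusted target on the unconstrained scale adds the log Jacobian term $+y$, and the two targets have different modes:

```r
# Toy example (assumed for illustration): sigma > 0 ~ lognormal(0, 1),
# with unconstrained parameter y = log(sigma).
log_p_sigma <- function(sigma) dlnorm(sigma, 0, 1, log = TRUE)

lp_no_jac <- function(y) log_p_sigma(exp(y))      # what plain optimization maximizes
lp_jac    <- function(y) log_p_sigma(exp(y)) + y  # Jacobian-adjusted target

optimize(lp_no_jac, c(-10, 10), maximum = TRUE)$maximum  # about -1
optimize(lp_jac,    c(-10, 10), maximum = TRUE)$maximum  # about  0
```

The Laplace approximation should be centered at the mode of the adjusted target (here $y = 0$, the mode of the density of $y$), not at the unadjusted optimum.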
Furthermore, it would be useful to have the option of getting the computed Hessian instead of a sample.
Thanks, @avehtari. You're in luck. I just made a PR to add the Jacobian-adjusted optimizer, so we can get proper Laplace approximations over the Bayesian posterior. This should all be in the next release of CmdStan, and hopefully CmdStanPy and CmdStanR. We can then get the MLE plus a standard-error approximation without the Jacobian, and the MAP plus approximate posteriors in the Bayesian case.
Summary:

Provide a service function for sampling from a Laplace approximation. This will allow cmdstanr to reproduce the Laplace approximation functionality of RStan.

Description:

The following signature should work. It requires an estimate theta of the mode (for example, as computed by optimization). The computation will follow RStan. The Hessian will be computed using central finite differences of the model functor. It cannot use autodiff because not all of our functions have second-order derivatives.
- central finite differences Hessian functional: stan::math::finite_diff_hessian_auto in stan/math/rev/functor/finite_diff_hessian_auto.hpp
- model functor: stan::model::model_functional in stan/model/model_functional.hpp
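As a rough sketch of what the central finite-differences Hessian involves (an assumed simplification, not the stan::math implementation, which also handles step-size selection), the Hessian can be built from $2N$ gradient evaluations, one forward and one backward per dimension:

```r
# Simplified sketch (not the stan::math implementation): central
# finite-difference Hessian from a gradient function, using 2N
# gradient evaluations for an N-dimensional point theta.
finite_diff_hessian <- function(grad, theta, eps = 1e-4) {
  N <- length(theta)
  H <- matrix(0, N, N)
  for (n in 1:N) {
    e <- numeric(N)
    e[n] <- eps
    H[, n] <- (grad(theta + e) - grad(theta - e)) / (2 * eps)  # column n
  }
  (H + t(H)) / 2  # symmetrize to average out finite-differencing error
}
```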
The algorithm has a high constant overhead, requiring a one-time cost of $\mathcal{O}(N^3)$ in $N$ dimensions to compute the Cholesky factor of the negative inverse Hessian. It also requires the $2N$ gradient evaluations used for the central finite differences, which scales at best as $\mathcal{O}(N^2)$ and is worse for models whose gradient calculation is super-linear in dimension.
The algorithm also has a high per-draw overhead, requiring $N$ standard normal pseudorandom numbers and $\mathcal{O}(N^2)$ operations per draw (to multiply by the Cholesky factor).
For $M$ draws, the total cost is proportional to $\mathcal{O}(N^3 + M \cdot N^2)$. It generates output of size $\mathcal{O}(M \cdot N)$.
Current Version:
v2.30.0