stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org

Laplace approximation sampler #3146

Closed: bob-carpenter closed this issue 2 years ago

bob-carpenter commented 2 years ago

Summary:

Provide a service function for sampling from a Laplace approximation. This will allow cmdstanr to reproduce the Laplace approximation functionality of RStan.

Description:

The following signature should work. It requires an estimate theta_hat of the mode (for example, as computed by optimization).

template <class Model>
int laplace_sample(const Model& m,                    // instantiated model
                   bool include_jacobian,             // include Jacobian adjustment in the log density?
                   const Eigen::VectorXd& theta_hat,  // mode on the unconstrained scale
                   int num_draws,                     // number of approximate draws to generate
                   unsigned int random_seed,          // seed for the PRNG
                   int refresh,                       // progress-report interval
                   callbacks::interrupt& interrupt,   // interrupt callback
                   callbacks::logger& logger,         // logging callback
                   callbacks::writer& sample_writer); // callback receiving the draws
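
For orientation, a hypothetical caller sketch against the proposed signature; it uses Stan's existing callback classes (stan::callbacks::interrupt, stream_logger, stream_writer), while the model object, the mode, and all numeric arguments here are made up for illustration.

#include <stan/callbacks/interrupt.hpp>
#include <stan/callbacks/stream_logger.hpp>
#include <stan/callbacks/stream_writer.hpp>
#include <Eigen/Dense>
#include <iostream>

template <class Model>
int run_laplace(const Model& model, const Eigen::VectorXd& theta_hat) {
  stan::callbacks::interrupt interrupt;  // no-op interrupt handler
  stan::callbacks::stream_logger logger(std::cout, std::cout, std::cout,
                                        std::cerr, std::cerr);
  stan::callbacks::stream_writer sample_writer(std::cout);
  return laplace_sample(model,
                        true,       // include_jacobian: target the Bayesian posterior
                        theta_hat,  // mode on the unconstrained scale
                        1000,       // num_draws
                        1234u,      // random_seed
                        100,        // refresh
                        interrupt, logger, sample_writer);
}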

The computation will follow RStan. The Hessian will be computed using central finite differences of gradients of the model functor. It cannot be computed with autodiff because not all of our functions support second-order derivatives.
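
A hedged sketch of that finite-difference scheme (my illustration, not the Stan implementation): the Hessian is assembled column by column from central differences of the gradient, two gradient evaluations per dimension, i.e. the $2N$ gradient calculations mentioned below. Here grad is a stand-in for the model's log-density gradient functor and eps is an assumed step size.

#include <Eigen/Dense>
#include <functional>

// Sketch: approximate the Hessian at theta_hat by central finite
// differences of the gradient. Each column costs two gradient
// evaluations, 2N in total for N dimensions.
Eigen::MatrixXd fd_hessian(
    const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& grad,
    const Eigen::VectorXd& theta_hat, double eps = 1e-5) {
  const int N = theta_hat.size();
  Eigen::MatrixXd H(N, N);
  for (int j = 0; j < N; ++j) {
    Eigen::VectorXd theta_plus = theta_hat;
    Eigen::VectorXd theta_minus = theta_hat;
    theta_plus(j) += eps;
    theta_minus(j) -= eps;
    H.col(j) = (grad(theta_plus) - grad(theta_minus)) / (2 * eps);
  }
  return 0.5 * (H + H.transpose());  // symmetrize finite-difference noise
}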

The algorithm has a high constant overhead: a one-time cost of $\mathcal{O}(N^3)$ in $N$ dimensions to compute the Cholesky factor of the negative inverse Hessian. It also requires the $2N$ gradient evaluations for the finite-difference Hessian, which scale at best as $\mathcal{O}(N^2)$ in total and worse for models whose gradient calculation is super-linear in dimension.

The algorithm also has a high per-draw overhead, requiring $N$ standard normal pseudorandom numbers and $\mathcal{O}(N^2)$ operations per draw (to multiply by the Cholesky factor).

For $M$ draws, the total cost is thus $\mathcal{O}(N^3 + M \cdot N^2)$, with output of size $\mathcal{O}(M \cdot N)$.

Current Version:

v2.30.0

bob-carpenter commented 2 years ago

This is not a duplicate of #2522, which is about specific functional forms. This one is black-box.

bob-carpenter commented 2 years ago

The relevant RStan code we will follow is:

if (draws > 0 && is.matrix(R <- try(chol(-H)))) {
  K <- ncol(R)                              # number of unconstrained parameters
  R_inv <- backsolve(R, diag(K))            # R^{-1}: chol(-H) \ I
  Z <- matrix(rnorm(K * draws), K, draws)   # standard normals, one column per draw
  theta_tilde <- t(theta + R_inv %*% Z)     # draws x K matrix: theta + R^{-1} Z per column
}

Converting this to mathematical notation, first suppose that the mode is $\widehat{\theta}$ and $H(\widehat{\theta})$ is the Hessian at the mode calculated with finite differences. Then comes the high constant-overhead part, where we (once) set

$R^{-1} = \textrm{chol}(-H(\widehat{\theta})) \backslash \textrm{I}$.

Then we can generate each draw $m$ on the unconstrained scale by sampling

$\theta^{\textrm{std}(m)} \sim \textrm{normal}(0, \textrm{I})$

and defining draw $m$ to be

$\theta^{(m)} = \widehat{\theta} + R^{-1} \cdot \theta^{\textrm{std}(m)}$

We then just need to inverse transform each $\theta^{(m)}$ back to the constrained scale, compute transformed parameters, and generate the generated quantities using the Stan model method write_array().

[GitHub hasn't sorted rendering for LaTeX properly, so it still looks like a ransom note!]
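
Putting the pieces together, a minimal Eigen sketch of the unconstrained-scale sampler described above (my illustration of the math, not the service implementation; it assumes H is the finite-difference Hessian at the mode and omits the write_array() step):

#include <Eigen/Dense>
#include <random>

// Sketch: draw from the Laplace approximation on the unconstrained scale.
// One O(N^3) Cholesky factorization and triangular solve up front,
// then O(N^2) per draw for the matrix-vector product.
Eigen::MatrixXd laplace_draws(const Eigen::VectorXd& theta_hat,
                              const Eigen::MatrixXd& H,  // Hessian at the mode
                              int num_draws, unsigned int seed) {
  const int N = theta_hat.size();
  // Upper-triangular R with R^T R = -H, matching R's chol(-H).
  Eigen::LLT<Eigen::MatrixXd> llt(-H);
  Eigen::MatrixXd R = llt.matrixU();
  // R_inv = R \ I, so R_inv * R_inv^T = (-H)^{-1}.
  Eigen::MatrixXd R_inv = R.triangularView<Eigen::Upper>()
                              .solve(Eigen::MatrixXd::Identity(N, N));
  std::mt19937 rng(seed);
  std::normal_distribution<double> std_normal(0.0, 1.0);
  Eigen::MatrixXd draws(num_draws, N);
  for (int m = 0; m < num_draws; ++m) {
    Eigen::VectorXd z(N);  // theta_std: N standard normal variates
    for (int n = 0; n < N; ++n) z(n) = std_normal(rng);
    draws.row(m) = (theta_hat + R_inv * z).transpose();
  }
  return draws;
}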

avehtari commented 2 years ago

I missed this earlier.

This will allow cmdstanr to reproduce the Laplace approximation functionality of RStan.

Currently, RStan does the wrong thing: it does not apply the Jacobian adjustment. See the illustration of the RStan approach versus the correct Laplace approximation in https://avehtari.github.io/casestudies/Jacobian/jacobian.html

The following signature should work. It requires an estimate theta_hat of the mode (for example, as computed by optimization).

Optimization does not use the Jacobian, so the estimate it computes is not useful here (see the notebook linked above). We want the mode of the Jacobian-adjusted target in the unconstrained space.
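
To spell that target out (my notation, not from the thread): if $g$ is the constraining transform mapping unconstrained $\theta$ to constrained parameters, the mode we want is

$\widehat{\theta} = \textrm{argmax}_{\theta} \ \log p(y, g(\theta)) + \log \left| \det J_g(\theta) \right|,$

where $J_g$ is the Jacobian of $g$. Dropping the $\log \left| \det J_g \right|$ term yields the constrained-space mode that RStan currently uses, which is the wrong center for a Gaussian approximation on the unconstrained scale.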

Furthermore, it would be useful to have the option of returning the computed Hessian instead of draws.

bob-carpenter commented 2 years ago

Thanks, @avehtari. You're in luck: I just made a PR to add a Jacobian-adjusted optimizer, so we can get proper Laplace approximations over the Bayesian posterior. This should all be in the next release of CmdStan, and hopefully CmdStanPy and CmdStanR. We can then get the MLE plus a standard-error approximation without the Jacobian, and the MAP estimate plus an approximate posterior in the Bayesian case.