Integrated Laplace approximation

charlesm93 commented 1 month ago

Description

Provide basic support for an integrated Laplace approximation.

The motivation is to handle hierarchical models with a latent Gaussian structure of the form $\eta \sim p(\eta)$, $\theta \sim \text{normal}(0, K(\eta))$, $y \sim p(y \mid \theta, \eta)$. We construct a Laplace approximation of $p(\theta \mid \eta, y)$ by matching the mode and curvature, and then obtain an approximation of the log marginal likelihood $\log p(y \mid \eta)$.
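
For reference, here is a sketch of the quantity being approximated, assuming the standard Laplace approximation for a zero-mean Gaussian prior. The symbols $\theta^*$ and $W$ are notation introduced here for illustration; see the references for the precise algorithm.

$$\theta^* = \arg\max_\theta \left[ \log p(y \mid \theta, \eta) - \tfrac{1}{2} \theta^\top K(\eta)^{-1} \theta \right], \qquad W = -\nabla^2_\theta \log p(y \mid \theta, \eta) \Big|_{\theta = \theta^*}$$

$$\log p(y \mid \eta) \approx \log p(y \mid \theta^*, \eta) - \tfrac{1}{2} {\theta^*}^\top K(\eta)^{-1} \theta^* - \tfrac{1}{2} \log \det\left(I + K(\eta) W\right)$$

The first two terms evaluate the log joint density at the mode (up to the prior's normalizing constant), and the determinant term accounts for the curvature there.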

A call to the function may look as follows:

target += laplace_marginal(theta0, ll_fn, ..., K_fn, ...);

where

theta0 initializes the solver that finds the mode of $\log p(\theta \mid \eta, y)$.
ll_fn is a user-specified function that returns the log likelihood $\log p(y \mid \eta, \theta)$.
... is a first set of variadic arguments to be passed to ll_fn.
K_fn is a user-specified function that returns the prior covariance matrix.
... is a second set of variadic arguments to be passed to K_fn.

We can also give the user control over the underlying Newton solver.

target += laplace_marginal_tol(theta0, tolerance, max_num_steps, hessian_block_size, solver, max_step_linesearch, ll_fn, ..., K_fn, ...);

The additional arguments are:

tolerance: the solver is considered to have converged once the difference between two successive evaluations of $\log p(\theta \mid \eta, y)$ falls below this value.
max_num_steps: the maximum number of Newton steps, after which the solver gives up.
hessian_block_size: the size of the diagonal blocks of the Hessian. Typically the Hessian of $\log p(y \mid \theta, \eta)$ with respect to $\theta$ is block diagonal, and this can be exploited to speed up computation.
solver: for log-concave likelihoods, use 1. In other cases, variations of the Newton solver can be attempted using 2 and 3. See references for details.
max_step_linesearch: the maximum number of linesearch steps at each Newton iteration. If set to 0, no linesearch happens.
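
To illustrate how these controls interact, here is a minimal Python sketch of a Newton solver for a scalar toy model, $\theta \sim \text{normal}(0, \sqrt{k})$ and $y \sim \text{Poisson}(e^\theta)$. The function name `newton_mode`, the toy model, and the backtracking rule are assumptions made for illustration; this is not the proposed Stan implementation, and the `hessian_block_size` and `solver` options have no analogue in a scalar problem.

```python
import math


def newton_mode(y, k, theta0=0.0, tolerance=1e-6, max_num_steps=100,
                max_steps_linesearch=0):
    """Return (mode, steps used) for a scalar Poisson-log toy model.

    Hypothetical helper: theta has prior variance k and y ~ Poisson(exp(theta)).
    """
    def psi(t):
        # Log posterior of theta up to terms that do not depend on theta.
        return y * t - math.exp(t) - 0.5 * t * t / k

    theta = theta0
    obj = psi(theta)
    for step in range(1, max_num_steps + 1):
        grad = y - math.exp(theta) - theta / k
        neg_hess = math.exp(theta) + 1.0 / k  # W + 1/k, always positive here
        candidate = theta + grad / neg_hess
        new_obj = psi(candidate)
        # Optional backtracking: move the candidate halfway back towards the
        # current point while it is worse than the current point.
        for _ in range(max_steps_linesearch):
            if new_obj >= obj:
                break
            candidate = 0.5 * (theta + candidate)
            new_obj = psi(candidate)
        # Convergence is judged on the change in the objective, not in theta.
        converged = abs(new_obj - obj) < tolerance
        theta, obj = candidate, new_obj
        if converged:
            return theta, step
    raise RuntimeError("Newton solver did not converge within max_num_steps")


if __name__ == "__main__":
    mode, steps = newton_mode(y=5, k=1.0, max_steps_linesearch=10)
    print(mode, steps)
```

With `max_steps_linesearch=0` the inner loop never runs and every iteration takes a full Newton step, mirroring the description above.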

Going the other way, we can also provide a simplified interface where the likelihood is not user specified but set ahead of time.

target += laplace_marginal_poisson_log(theta0, *, K_fn, ...);

where * is a placeholder for arguments specific to the Poisson likelihood with a log link.
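
As a rough illustration of what such a wrapper computes, the following Python sketch evaluates the Laplace approximation of the log marginal likelihood for a single Poisson count with a log link and a scalar latent variable with prior variance `k`. The function name `toy_log_marginal_poisson_log` and the scalar setting are assumptions for illustration only; the proposed Stan function would operate on vectors and a full covariance matrix.

```python
import math


def toy_log_marginal_poisson_log(y, k, theta0=0.0, tolerance=1e-10,
                                 max_num_steps=100):
    """Laplace approximation of log p(y) for theta ~ normal(0, sqrt(k)),
    y ~ Poisson(exp(theta)). Hypothetical scalar helper."""
    def psi(t):
        # Log likelihood plus log prior, dropping theta-independent constants.
        return y * t - math.exp(t) - 0.5 * t * t / k

    # Plain Newton iterations to find the mode of psi.
    theta = theta0
    obj = psi(theta)
    for _ in range(max_num_steps):
        grad = y - math.exp(theta) - theta / k
        theta = theta + grad / (math.exp(theta) + 1.0 / k)
        new_obj = psi(theta)
        done = abs(new_obj - obj) < tolerance
        obj = new_obj
        if done:
            break

    w = math.exp(theta)  # negative second derivative of the log likelihood
    log_lik = y * theta - math.exp(theta) - math.lgamma(y + 1)
    # log p(y | theta*) - theta*^2 / (2k) - 0.5 * log(1 + k * W)
    return log_lik - 0.5 * theta * theta / k - 0.5 * math.log(1.0 + k * w)


if __name__ == "__main__":
    print(toy_log_marginal_poisson_log(y=5, k=1.0))
```

In one dimension the result can be checked against numerical quadrature of $\int p(y \mid \theta) \, p(\theta) \, d\theta$, which is a useful sanity test for any prototype.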

Next, we need to specify an rng function to recover draws from the Laplace approximation of $p(\theta \mid \eta, y)$. In practice, we may also want "out-of-sample" draws, in which case we would evaluate $\theta$ for a different set of values passed to $K$ (think of a Gaussian process evaluated at new inputs). The calls for these functions would be:

vector[nObs] theta = laplace_marginal_rng(theta0, ll_fn, ..., K_fn, ...);

vector[nObs_pred] theta_pred = laplace_marginal_pred_rng(theta0, ll_fn, ..., K_fn, ..., ...);

where for the second function, we pass in two sets of inputs for K_fn. As before, we can wrap these functions for a specific likelihood, and also give users control over the underlying Newton solver.
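
To make the targets of these two functions concrete, here is a sketch of the Gaussian distributions they would draw from, assuming the standard Laplace approximation with a zero-mean Gaussian prior. The symbols $\theta^*$ (the mode), $W$ (the negative Hessian of $\log p(y \mid \theta, \eta)$ with respect to $\theta$ at $\theta^*$), $K_*$ (the prior cross-covariance between $\theta$ and $\theta_\text{pred}$) and $K_{**}$ (the prior covariance of $\theta_\text{pred}$) are notation introduced here and are not part of the proposed interface.

$$\theta \mid \eta, y \;\overset{\text{approx.}}{\sim}\; \text{normal}\left(\theta^*, \; (K^{-1} + W)^{-1}\right)$$

$$\theta_\text{pred} \mid \eta, y \;\overset{\text{approx.}}{\sim}\; \text{normal}\left(K_*^\top \nabla_\theta \log p(y \mid \theta^*, \eta), \; K_{**} - K_*^\top (I + W K)^{-1} W K_*\right)$$

Both are distributions over the latent variable, not over $y$; drawing new observations would additionally require sampling from the likelihood.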

One thing to note is that, under the hood, the autodiff uses higher-order derivatives, so forward mode must be supported.

There may be other features worth supporting, but for now, we can start here (and even split this into multiple PRs).

This issue replaces #755.

References

A detailed description of the algorithms: https://arxiv.org/abs/2306.14976
A paper demonstrating applications: https://proceedings.neurips.cc/paper/2020/hash/673de96b04fa3adcae1aacda704217ef-Abstract.html
A notebook showcasing existing prototype code: https://htmlpreview.github.io/?https://github.com/charlesm93/StanCon2020/blob/master/notebook-2022/lgm_stan.html#inst

Current Version:

v4.8.1

charlesm93 commented 1 month ago

@SteveBronder @WardBrian @avehtari

avehtari commented 1 month ago

> Next, we need to specify an rng function to recover draws from $p(\theta \mid \eta, y)$. In practice, we may also want "out-of-sample" draws, in which case we would evaluate $\theta$ for a different set of values passed to $K$ (think GP process). The call for the functions would be:

It would be good to clarify that here the prediction is not for $y$

> There may be other features worth supporting, but for now, we can start here (and even split this into multiple PRs).

It would be useful to have a function returning the mean and sd, so that these do not need to be estimated from rng draws:

tuple(vector, vector) theta_mu_sigma = laplace_marginal_mean_sd(...)

and a corresponding one for out-of-sample

dpsimpson commented 1 month ago

So excited to see this happening again

stan-dev / math