This package contains tools that automatically generate local sensitivity measures with respect to hyperparameters in Stan, using the RStan interface.

In Bayesian analysis, priors and likelihoods typically have hyperparameters that are fixed by the modeler. Naturally, posterior expectations of parameters are functions of these hyperparameters. If a range of hyperparameters is plausible, one hopes that the posterior expectations don't depend too strongly on the particular values chosen. A model and dataset are called "robust" when the relevant posterior expectations don't vary meaningfully as the hyperparameters vary over their plausible values.
For example, in the `normal_censored` Stan example, the likelihood has a fixed value of `y_var=1`. If we aren't sure that `y_var` should be exactly `1`, we might hope that the expectation of the parameter `mu` doesn't depend too strongly on the precise value of `y_var`. Here is a graph of the actual dependence (in blue):
Evaluating the exact dependence of posterior expectations on the hyperparameters typically requires re-fitting the model many times, which can be expensive. That is what we did to form the above graph. However, it is possible to form a linear approximation to the dependence (shown in red above) using only samples at a fixed value of the hyperparameters [1, 2, 3]. Our package, `rstansensitivity`, estimates this linear approximation. It only needs draws from the posterior at one fixed value of the hyperparameters (alpha_0 in the figure, which corresponds to `y_var=1` in the example).
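In symbols (our notation for exposition, not the package's): write $\alpha$ for the hyperparameter (`y_var` in the example) and $\mathbb{E}[\mu \mid \alpha]$ for the posterior expectation of `mu`. The red line is the first-order Taylor expansion around the base value $\alpha_0$:

```math
\mathbb{E}[\mu \mid \alpha] \approx \mathbb{E}[\mu \mid \alpha_0] +
    \left. \frac{d \, \mathbb{E}[\mu \mid \alpha]}{d \alpha} \right|_{\alpha_0} (\alpha - \alpha_0).
```

The derivative (the slope of the red line) is what `rstansensitivity` estimates from a single set of posterior draws.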
There are some caveats to be aware of. If its assumptions fail -- in particular, if the dependence of the expectations on the hyperparameters is not approximately linear over the range you care about -- `rstansensitivity` will give you misleading results. This package is a guide, not a substitute for critical thinking about your models!

Suppose we have an inference problem with the following quantities: parameters $\theta$, observed data $x$, and real-valued hyperparameters $\alpha$. To perform Bayesian inference, we specify a prior $p(\theta \mid \alpha)$ and a likelihood $p(x \mid \theta, \alpha)$, which together determine the posterior distribution.
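In this notation, the posterior follows from Bayes' theorem:

```math
p(\theta \mid x, \alpha) = \frac{p(x \mid \theta, \alpha) \, p(\theta \mid \alpha)}{p(x \mid \alpha)}
    \propto p(x \mid \theta, \alpha) \, p(\theta \mid \alpha),
```

so posterior expectations are functions of the hyperparameters $\alpha$.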
In Stan, the parameters are declared in the `parameters` block, the data is declared in the `data` block, and hyperparameters are either hard-coded explicitly in the `model` block or passed in through the `data` block. In the `model` block, the user specifies the prior and likelihood.
Stan uses Markov Chain Monte Carlo to produce approximate draws from the
posterior distribution for a particular choice of the hyperparameters.
Suppose we are interested in the posterior mean of a particular quantity. Our package uses MCMC draws to estimate two quantities: the sensitivity of that posterior mean to the hyperparameters, and a normalized version of it. Because changes in posterior expectations are typically only meaningful if they are an appreciable proportion of the posterior standard deviation, the normalized sensitivity is often the more meaningful number.
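Concretely, for a quantity of interest $g(\theta)$, the two reported quantities can be sketched as follows (our notation; see [1, 2, 3] for derivations). The key identity is that the derivative of a posterior expectation with respect to a hyperparameter is a posterior covariance, which can be estimated from draws at $\alpha_0$ alone:

```math
S_\alpha = \left. \frac{d \, \mathbb{E}[g(\theta) \mid x, \alpha]}{d \alpha} \right|_{\alpha_0}
    = \mathrm{Cov}\left( g(\theta), \,
      \left. \frac{\partial \log p(\theta, x \mid \alpha)}{\partial \alpha} \right|_{\alpha_0} \right),
\qquad
S^{\mathrm{norm}}_\alpha = \frac{S_\alpha}{\mathrm{sd}\left( g(\theta) \mid x, \alpha_0 \right)},
```

where the covariance and standard deviation are taken under the posterior $p(\theta \mid x, \alpha_0)$ and estimated with the MCMC draws.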
The first step, of course, is to install the package:

```r
library(devtools)
install_github("rgiordan/StanSensitivity", subdir="rstansensitivity")
```
Once you've done that, there are five steps:

1. Split your model's `data` block into a `data` block and a `hyperparameters` block.
2. Run `GenerateSensitivityFromModel` on the model with a hyperparameters block. This will generate a few `.stan` files, including a model that is equivalent to the original model with the hyperparameters in the data block.
3. Run Stan as usual on the generated model.
4. Run the sensitivity functions on the MCMC output.
5. Re-run Stan at perturbed hyperparameter values to confirm any apparent sensitivities.

More details now follow.
Some examples can be found in `examples`, which are based on their namesakes in the Stan examples. In the detailed instructions below, we will use Stan's `negative_binomial` example -- see `examples/negative_binomial/negative_binomial_original.stan`.
A Stan model's `data` block typically consists of both data and hyperparameters. Starting with an existing Stan model and dataset containing hyperparameters, split your `data` block into a data block containing the quantities you want to keep fixed and a new `hyperparameters` block containing the quantities whose sensitivity you want to evaluate. For example, the original data block in `negative_binomial_original.stan` was:
```stan
data {
  int<lower=1> N;
  int<lower=0> y[N];
  real weights[N];
  real cauchy_loc_alpha;
  real cauchy_loc_beta;
  real cauchy_scale_alpha;
  real cauchy_scale_beta;
}
```
and in `negative_binomial.stan` this is split into:
```stan
data {
  int<lower=1> N;
  int<lower=0> y[N];
}
hyperparameters {
  real weights[N];
  real cauchy_loc_alpha;
  real cauchy_loc_beta;
  real cauchy_scale_alpha;
  real cauchy_scale_beta;
}
```
Note: hyperparameters must be real-valued and unconstrained. There are currently no checks for this -- the sensitivity analysis will simply crash or produce nonsense! (If there are constraints, it will silently report sensitivity to the unconstrained value, not the constrained value.)
You also need to put the base values of your hyperparameters in the data file. In the `negative_binomial` example, you'll want lines like `cauchy_loc_alpha <- 0` in the data file `examples/negative_binomial.data.R`.
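For concreteness, a hypothetical data file might look like the following (apart from `cauchy_loc_alpha <- 0`, the data and base values below are made up for illustration):

```r
# examples/negative_binomial.data.R (hypothetical contents)
N <- 20
y <- rep(1, N)          # placeholder; use your actual observations

# Base values of the hyperparameters (illustrative).
weights <- rep(1.0, N)
cauchy_loc_alpha <- 0
cauchy_loc_beta <- 0
cauchy_scale_alpha <- 2.5
cauchy_scale_beta <- 2.5
```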
Next, run `GenerateSensitivityFromModel` in R. Stan can't parse a `hyperparameters` block on its own, so this function parses the model from the previous step into three different specially named `.stan` files that Stan can actually interpret. To do this, run (in R):
```r
library(rstansensitivity)
model_name <- GenerateSensitivityFromModel("/path/to/your/model.stan")
```
Here, `/path/to/your/model.stan` is the path to the model from the previous step containing a `hyperparameters` block. For example, running `GenerateSensitivityFromModel("negative_binomial.stan")` produces three models: `negative_binomial_generated.stan`, `negative_binomial_sensitivity.stan`, and `negative_binomial_parameters.stan`.
The only one you need to use directly is the first, which is equivalent to the original model but with the hyperparameters in the data block. The variable `model_name` is just the hyperparameters model path with the `.stan` suffix stripped away. It will be used as an argument later so the sensitivity functions can find the generated files.
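For instance, in the running example (the printed values simply follow the naming scheme described above):

```r
model_name <- GenerateSensitivityFromModel("negative_binomial.stan")
print(model_name)
# [1] "negative_binomial"
print(GetSamplingModelFilename(model_name))
# [1] "negative_binomial_generated.stan"
```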
Run Stan as usual on the `*_generated.stan` model from the previous step. For example, with the `negative_binomial` example we run:
```r
library(rstan)

# Compile your model.
model <- stan_model(GetSamplingModelFilename(model_name))

# Load the data.
stan_data <- new.env()
source("negative_binomial.data.R", local=stan_data)
stan_data <- as.list(stan_data)

# Run the sampler.
sampling_result <- sampling(model, data=stan_data, chains=3, iter=3000)
```
Using the output up to this point, run a few more R commands:
```r
stan_sensitivity_model <- GetStanSensitivityModel(model_name, stan_data)
sens_result <- GetStanSensitivityFromModelFit(sampling_result, stan_sensitivity_model)
tidy_results <- GetTidyResult(sens_result)
PlotSensitivities(tidy_results)
```
`GetStanSensitivityModel` parses and compiles some of the generated models. Like any Stan model, this will take some time the first time it's run, though if you set `rstan_options(auto_write=TRUE)` the compiled model will be cached. `GetStanSensitivityFromModelFit` evaluates the linear approximation using the MCMC draws from the original Stan run. `GetTidyResult` packs the output into a tidy dataframe. `PlotSensitivities` then hopefully makes a graph like this:
*(Figure: Negative binomial example)*
Re-run Stan to confirm your conclusions! Don't skip this step -- `rstansensitivity` is hopefully a useful guide, but it's not magic. Remember that linear approximations are only approximations!
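As a minimal sketch of such a check for the `negative_binomial` example (the hyperparameter and perturbation size below are arbitrary illustrative choices):

```r
# Refit at a perturbed hyperparameter value; the *_generated.stan model takes
# hyperparameters through the data block, so no recompilation is needed.
stan_data_perturbed <- stan_data
stan_data_perturbed$cauchy_loc_alpha <- stan_data$cauchy_loc_alpha + 1
sampling_result_perturbed <- sampling(
    model, data=stan_data_perturbed, chains=3, iter=3000)

# Compare the refitted posterior means to the original ones (and to the
# change predicted by the linear approximation).
print(summary(sampling_result)$summary[, "mean"])
print(summary(sampling_result_perturbed)$summary[, "mean"])
```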
In order to interpret the output and determine whether you may have a robustness problem, you need to decide three things:

1. How large a change in a posterior expectation would be practically important.
2. Over what range the hyperparameters might plausibly vary.
3. Whether the dependence of the expectations on the hyperparameters is approximately linear over that range.

In general, every situation may admit different answers to these questions, and we do not attempt to answer them automatically. To help with (1), we do offer the option of reporting the sensitivity divided by the posterior standard deviation for each parameter; we call this the "normalized sensitivity" (the quantity $S^{\mathrm{norm}}_\alpha$ sketched above). If you've decided on answers to (1) and (2), and are willing to at least tentatively accept that the answer to (3) is "yes", then you can use the linear approximation to determine whether the plausible range of hyperparameters can cause the expectations to change by unacceptably large amounts.
In the figure "Negative binomial example" above, suppose we had decided that `cauchy_loc_alpha` might plausibly vary from -4 to 4, and that a change in any parameter's expectation of more than one posterior standard deviation would be a problem. This would occur if any parameter had a normalized sensitivity greater in absolute value than 1 / 4 = 0.25. However, the parameter most sensitive to `cauchy_loc_alpha` is `alpha`, and its normalized sensitivity is very likely less than 0.05 in magnitude. So we would conclude that sensitivity to `cauchy_loc_alpha` is not a problem -- as long as we believe that the dependence of the expectations on `cauchy_loc_alpha` is linear between -4 and 4.
For a more in-depth discussion of the relationship between sensitivity and robustness, see Appendix C of our paper [3].
We also provide some helper functions for the Bayesian infinitesimal jackknife (IJ). Formally, the IJ corresponds to assigning each data point a weight and forming a linear approximation to the dependence of a posterior expectation on these weights, just as above. Since the log likelihood is linear in the weights, the partial derivative of the log joint distribution with respect to a weight is simply the corresponding log likelihood term itself. It is typically easier to evaluate this log likelihood directly than to use our framework to compute these derivatives.
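In symbols (a sketch in the notation used above): give each data point $x_n$ a weight $w_n$, so that

```math
\log p(\theta, x \mid w) = \sum_n w_n \log p(x_n \mid \theta) + \log p(\theta),
\qquad
\left. \frac{\partial \, \mathbb{E}[g(\theta) \mid x, w]}{\partial w_n} \right|_{w = 1}
    = \mathrm{Cov}\left( g(\theta), \, \log p(x_n \mid \theta) \right),
```

where the second equation is the same covariance identity as before, with the per-datum log likelihood playing the role of the derivative of the log joint.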
An example use case is estimating the frequentist covariance of a posterior expectation to check for misspecification. We will demonstrate the idea with the following example, taken from the unit test `test_ij.R`. Let's generate some heteroskedastic regression data and fit it with an `rstanarm` model that assumes homoskedasticity.
```r
library(rstansensitivity)
library(rstanarm)
library(sandwich)

# Simulate heteroskedastic regression data: the residual variance
# depends on the regressors.
num_sims <- 2000
x1 <- rnorm(num_sims)
x2 <- rnorm(num_sims)
sigma2 <- 3 * (x1^2 + x2^2)
eps <- rnorm(num_sims, sd=sqrt(sigma2))
df <- data.frame(y=eps, x1=x1, x2=x2)

# Fit a homoskedastic (and so misspecified) Bayesian regression.
rstan_fit <- rstanarm::stan_glm(y ~ 1 + x1 + x2, df, family=gaussian())
param_draws <- as.matrix(rstan_fit, pars=c("x1", "x2"))
bayes_cov <- cov(param_draws, param_draws)
```
Since the model is misspecified, we expect the frequentist variance of the expectation of the regression coefficients to be different from the posterior variance of the same coefficients. Let's check that with the IJ covariance.
```r
loglik_draws <- log_lik(rstan_fit)
ij_cov <- ComputeIJCovariance(loglik_draws, param_draws)
print(ij_cov)
#            x1         x2
# x1 10.71141060 0.08694545
# x2  0.08694545 10.19563723

print(num_sims * bayes_cov)
#             x1          x2
# x1 5.655012898 0.007550377
# x2 0.007550377 5.625837549
```
The IJ covariance, which is a consistent estimator of the frequentist covariance of the posterior expectations, differs from the Bayesian posterior covariance due to the misspecification.

To compute Monte Carlo standard errors of the IJ covariance, we can use `GetBlockBootstrapStandardErrors`:
```r
ij_se_list <- GetBlockBootstrapStandardErrors(
    loglik_draws, param_draws, num_blocks=100, num_draws=50,
    cov_fun=ComputeIJCovariance,
    show_progress_bar=TRUE)
```
Then `ij_se_list$cov_se` contains element-wise estimates of the Monte Carlo standard errors of `ij_cov`.
At this time, we do not recommend formally using Monte Carlo standard errors to test whether the Bayesian covariance and IJ covariance are equal, since the two can easily differ by more than the Monte Carlo standard error even when the model is correctly specified and the amount of data is quite large. Rather, we recommend looking at the relative magnitude of the implied variances and covariances. If the IJ covariance is different from the Bayes covariance by an amount that is practically relevant, and not just statistically significant, then you might worry about misspecification.
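For example, one simple way to make that comparison (a sketch; as above, multiplying the Bayes covariance by `num_sims` puts the two matrices on the same scale):

```r
# Ratio of the implied standard deviations; values far from 1, as here
# (roughly sqrt(10.7 / 5.7) ~ 1.4), point to practically relevant misspecification.
print(sqrt(diag(ij_cov)) / sqrt(diag(num_sims * bayes_cov)))
```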
[1]: Local posterior robustness with parametric priors: Maximum and average sensitivity, Basu, Jammalamadaka, Liu (1996)

[2]: Local sensitivity of posterior expectations, Gustafson (1996)

[3]: Covariances, Robustness, and Variational Bayes, Giordano, Broderick, Jordan (2017)