bootstrap approximation

hyunjimoon commented 3 years ago

Bootstrapped synthetic likelihood could prevent the repeated N times of fitting; resampling from one set of recovered parameters with some estimated variance could be used to approximate SBC_vanila's computation. The main idea is to get a better mixing over y. Instead of drawing N (at least 1000) sets of \tilde{y} from each \tilde{\theta} as in SBC_vanila, draw one MCMC sample for \tilde{\theta}, and then bootstrap and IJ. Our target is to show that rank statistic result of N sets of \theta in SBC_vanila is similar to that of N_theta Ny sets. Few draws from the prior and lots of draw from $tilde{y}$. N_theta(low) N_y(low) may be equal or smaller than N but as (d) replaces heavy fitting process, considerable speedup advantage is expected.

SBC_approx_bootstrap:
for (i in 1:N_theta) {
  Draw \tilde{\theta} \sim p(\tilde{\theta})
  Draw \tilde{y} \sim p(\tilde{y} | \tilde{\theta})
  for (i in 1:N_y) {
    Draw \tilde{y}^* \sim Bootstrap(\tilde{y})
    Use IJ to approximate  \theta \sim p(\theta | \tilde{y}^*)
  }
}

The size of \tilde{y}^* should be the same as the size of \tilde{y}. The statistic that is being approximated is the posterior summary statistic that you are using to compare the final output to the original prior; rank statistics in our case.

This is cheap compared to producing M independent summaries from the model, when the simulator is computationally intensive. This discusses bootstrap in ABC context which I believe could be equally applied in our setting as requirements for ABC to be feasible, SBC to have high power, and this approximation to have a chance shares the vein.

references

The Jackknife and the Bootstrap, Jun Shao (more technical, recommended)
An Introduction to the Bootstrap, Efron and Tibshirani
The first chapter The Bootstrap and the Edgeworth Expansion, Peter Hall (interesting)
Berekey's Data8 course materials

hyunjimoon commented 3 years ago

These are some summary of tweaks.

BS1: BS mean(#: N_theta) & prior samples rank comparison BS2: BS mean(#: N_theta) & prior distribution comparison BS3: BS(#: N_thetaN_y) samples & prior distribution comparison IJ1: assume theta ~ N_data cov(loglik_draws, param_draws) [guess not]

IJ1.1: IJ approx in N_theta for loop
IJ1.2: IJ approx in N_theta and N_y for loop IJ2: assume known mean, estimate var with ij_cov

image.png

Check Needed.

Boostraped \tilde{y}^* could be regarded as \tilde{\theta} independent. (line 2, 5)
multiple \tilde{y}, not single like in original SBC, should be sampled for resampling. (line 3)
There are N_theta bootstrap expectation samples, E[\theta | \tilde{y}^] , not N_thetaN_y. i.e. 2000, not 2000500 \mathbb{E}\left[\beta \mid y^{}, x^{*}\right] expectation samples from your slide. I assumed 2000, but from IJ, there are
IJ influence score from data \tilde{y}^, not \tilde{y}. I am confused between IJ1 and IJ2. Also if IJ approximates \theta \sim p(\theta | \tilde{y}^) do we need N_y for loop?
infl_draws_mat are \theta \sim p(\theta | \tilde{y}^*)
If 5 is False, ij_cov gives only variation estimation and post_mean needs to be plugged in. (IJ2)

hyunjimoon commented 3 years ago

This nips paper might be relevant @Dashadower

hyunjimoon commented 3 years ago

- Do fewer draws from the prior and lots of draws from $\tilde{y}$ for each. sensitivity around a particular $\tilde{\theta}$, while keeping $\tilde{y}$ close to its base value.
- instead of drawing a new $\tilde{y}$ from each $\tilde{\theta}$, draw one, draw an MCMC sample for $\tilde{\theta}$, and then bootstrap or do the IJ.

How we could cover the region of the data space originally mapped from hundreds of prior samples is the core question. A new rank measure would be needed as one-to-one correspondence between one prior and M posterior does not hold anymore. This measure should be a set-to-set comparison. For MCMC convergence diagnostics, Rhat, a scale reduction factor, measures the factor by which variance of the posterior could be reduced if chains were continued to run infinitely long. In analog, iterative calibration based on prior and posterior set could work; for instance

[btw prior-posterior] comparing Var(theta) with that of Var(theta').
[btw posterior-posterior] comparing recovered posteriors from subsets of priors
another approach could be to compare IJ with empirical var(theta').

hyunjimoon / SBC

bootstrap approximation #1

references