Open bwiernik opened 3 years ago
Hi Brenton. Not sure if you have seen the simulate.rma()
method. Is this what you are trying to do via your function?
library(metafor)
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
res <- rma(yi, vi, data=dat)
res
sim <- simulate(res, nsim=1000)
sav <- lapply(sim, function(x) rma(x, vi, data=dat))
sapply(sav, function(x) x$beta)
t(sapply(sav, function(x) c(x$beta, x$tau2)))
As far as I can tell, the simulate.rma()
function doesn't incorporate any uncertainty in either the mean function or the tau2 estimate, similar to the stats::simulate.lm()
function?
Correct, it just simulates new data based on the estimated fixed effects and the marginal var-cov matrix of the estimates. Can you point me to a paper describing the theory as to what you are trying to accomplish?
To add to this: I took a look at getMethod("sim", "lm")
from arm
. I see what they are doing there. But doing something similar is going to be very difficult for meta-analytic models. In simple regression models, the distribution of sigma^2 is a scaled chi-square distribution but this doesn't apply to tau^2 in random-effects models. For iterative estimators, such as REML, the exact distribution is not even known.
This is the original paper on posterior predictive checks: http://www.stat.columbia.edu/~gelman/research/published/A6n41.pdf See also https://cran.r-project.org/web/packages/bayesplot/vignettes/graphical-ppcs.html and Andrew Gelman and Jennifer Hill. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
So, the function I linked to does two things:
arm::sim()
using the sampling distributions of beta and tau2. predict()
). Compared to simulate()
or predict()
, each row of the sample matrix contains its own mean computed from a draw of the simulated betas and its own SD computed from a draw of the simulated tau2s and the sample vi. The two main applications here are:
If you know the distribution of the parameter estimates, then you can indeed use this information to directly sample them. However, in random-effects models, this is not the case (except for some very special cases). Once you move to rma.mv()
models, it gets even more tricky.
However, if you simulate new data from the estimated parameters and then refit the same model to each simulated dataset, you should be accomplishing the same thing or something close to it. The former is of course faster, but the latter is completely general as it does not require that we figure out for every possible model what the distribution of the parameter estimates is.
I examined this with an lm()
model:
library(arm)
library(RcppEigen)
res <- lm(mpg ~ cyl + hp + wt, data = mtcars)
sim1 <- sim(res, n.sims = 1000000)
X <- model.matrix(res)
tmp <- simulate(res, nsim = 1000000)
sav <- lapply(tmp, function(y) fastLmPure(X, y)) # much faster than lm()
sim2 <- data.frame(t(sapply(sav, coef)))
sim2$sigma <- sapply(sav, function(x) x$s)
par(mfrow=c(5,1))
for (i in 1:4) {
plot(ecdf(sim1@coef[,i]), col="red")
plot(ecdf(sim2[,i]), add=TRUE, col="blue")
}
plot(ecdf(sim1@sigma), col="red")
plot(ecdf(sim2$sigma), add=TRUE, col="blue")
summary(sim1@sigma)
summary(sim2$sigma)
sd(sim1@sigma)
sd(sim2$sigma)
For the fixed effects, the distributions appear to be essentially the same. For sigma, they are not, but they are similar. I don't know the theory that would say under what conditions these two approaches converge to the same distributions (maybe at least asymptotically?). But based on this, I would be hesitant to embark on the attempt to do something like arm::sim()
for meta-analytic models, when the machinery to do something analogous is in essence already available (and again, right now we don't even have the theory to implement something like arm::sim()
for random-effects models anyway).
Classification:
Feature Request
Summary
I'd like to be able to simulate posterior draws and posterior predictive draws ala
arm::sim()
andbayesplot::pp_check()
. I wrote a quick and dirty function to draw coefficients/tau or yi samples assuming that the REML and knha is used. I can generalize it, but I'd like to verify you'd be interested and get your input on API and coding style.https://gist.github.com/bwiernik/be04c8817046ea0ffea9e9838928ea96