paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0

predict.brmsfit with imputed missing values #544

Open bryorsnef opened 5 years ago

bryorsnef commented 5 years ago

Currently, predict.brmsfit returns an error when predictions are requested for all responses and the new data contains NAs in variables that were imputed during fitting. A partial solution, as per https://discourse.mc-stan.org/t/predict-brms-in-multivariate-model-with-imputation/6334, is to update the predict function to check whether the "missing value graph" is acyclic and, if so, impute values for the NAs before predicting the other responses.

MWE:

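library(brms)

y <- rnorm(100)
# about 20% of x is set to NA; rnorm(100) gives every observed x its own value
x <- ifelse(sample(c(0, 1), size = 100, replace = TRUE, prob = c(.2, .8)) == 0, NA, rnorm(100))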
z <- rnorm(100)

dat <- data.frame(x, y, z)

form1 <- bf(x | mi() ~ z)
form2 <- bf(y ~ mi(x))

mod <- brm(form1 + form2, data = dat)

# new data with the same missingness pattern in x (about 20% NA)
newdat <- data.frame(x = ifelse(sample(c(0, 1), size = 100, replace = TRUE,
                     prob = c(.2, .8)) == 0, NA, rnorm(100)), z = rnorm(100))

predict(mod, newdata = newdat)
# There were 19 warnings (use warnings() to see them)

paul-buerkner commented 5 years ago

Thanks for opening this issue!

Please note that the "missing value graph" is my own ad-hoc wording so you probably won't find this term anywhere else.
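One way to read the "missing value graph" idea is as a dependency graph between responses: each response points to the responses it uses through mi() terms on its right-hand side. The following base R sketch is only an illustration of that reading, not brms code. The helper name `mi_order` and the hand-written `deps` list are hypothetical; brms does not expose them. It returns an order in which responses could be imputed and predicted, or NULL when no such order exists.

```r
# Hypothetical helper, not part of brms.
# 'deps' is a named list: each element names the responses that the given
# response uses through mi() terms on its right-hand side.
mi_order <- function(deps) {
  remaining <- names(deps)
  ordered <- character(0)
  while (length(remaining) > 0) {
    # a response is ready once all of its mi() predictors are already ordered
    ready <- remaining[vapply(
      remaining,
      function(r) all(deps[[r]] %in% ordered),
      logical(1)
    )]
    if (length(ready) == 0) {
      # nothing can be resolved: the graph has a cycle, or a dependency
      # refers to a response that is not listed in 'deps'
      return(NULL)
    }
    ordered <- c(ordered, ready)
    remaining <- setdiff(remaining, ready)
  }
  ordered
}

# MWE from this issue: y uses mi(x), x uses no mi() terms
order_mwe <- mi_order(list(x = character(0), y = "x"))

# a cyclic specification has no valid order
order_cyclic <- mi_order(list(a = "b", b = "a"))
```

For the MWE, the resulting order puts x before y, which matches the sequential approach of imputing x first and then predicting y.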

mts24 commented 5 years ago

I am hoping to run a similar analysis, but before I begin to sequentially predict my multilevel hierarchical models, I wondered if there were any additional updates on this.

Thanks

brianhuey commented 4 years ago

This feature would be really helpful. If we know that our imputation process is acyclic, is there a way to manually run prediction with imputed values?

paul-buerkner commented 4 years ago

A workaround is described in https://discourse.mc-stan.org/t/predict-brms-in-multivariate-model-with-imputation/6334

brianhuey commented 4 years ago

Thanks, Paul. In the example above, would the workaround be to use the point estimate of x from form1 to fill in any cases where x is NA in newdat? Once the feature is fully built out, would it sample x from the posterior and pass those values to mi(x) in form2?

paul-buerkner commented 4 years ago

Yes. That is the plan. But I don't have an idea yet how to implement this internally.
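The difference between filling in a point estimate and filling in per-draw values can be sketched in base R. This is only an illustration under stated assumptions, not how brms does or will implement it: the helper `fill_per_draw` is hypothetical, and `x_draws` stands in for a draws-by-observations matrix of posterior draws for x obtained elsewhere. Observed values are kept as they are, and each missing value receives a different value in each draw.

```r
# Hypothetical helper, not part of brms.
# x_obs:   numeric vector of length n_obs, NA where x is missing
# x_draws: n_draws x n_obs matrix of posterior draws for x
fill_per_draw <- function(x_obs, x_draws) {
  stopifnot(length(x_obs) == ncol(x_draws))
  # start with one copy of the observed vector per draw (one row per draw)
  filled <- matrix(x_obs, nrow = nrow(x_draws), ncol = length(x_obs),
                   byrow = TRUE)
  # overwrite only the missing columns with the draw-specific values
  miss <- is.na(x_obs)
  filled[, miss] <- x_draws[, miss]
  filled
}

# toy input: 3 observations, the second one missing, 2 draws
x_obs <- c(1.5, NA, 0.2)
x_draws <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2)
x_filled <- fill_per_draw(x_obs, x_draws)
```

Each row of `x_filled` could then serve as the x column of the new data when predicting y with the matching posterior draw, so that uncertainty in the imputed x carries through to y.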

wds15 commented 2 years ago

As I just stumbled on this while needing it for my project, here is a piece of code demonstrating how it could be done. It is not nice and not really fast, but it works. It would be really cool to have such a feature. Maybe one could add an option to the brms predict functions (or a flag to the formulas) that says "my stuff is acyclic... I promise". Is the order of evaluation last to first equation, as specified in the bf1 + bf2 + ... syntax?

For reference only, here is a code chunk that does what I want:

library(brms)
library(dplyr)
library(tidyr)
library(future)

plan(multisession, workers=4)

data("nhanes", package = "mice")
N <- nrow(nhanes)

# simulate some measurement noise
nhanes$se <- rexp(N, 2)

# measurement noise can be handled within 'mi' terms
# with or without the presence of missing values
bform2 <- bf(bmi | mi() ~ age * mi(chl)) +
  bf(chl | mi(se) ~ age) +
  set_rescor(FALSE)

fit2 <- brm(bform2, data = nhanes, chains=1)

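# number of posterior draws to use
num_sim <- 10
# linear predictors for all responses; the "chl" slice supplies values for missing chl
pcovs <- posterior_linpred(fit2, newdata=nhanes, draw_ids=1:num_sim)

# for each draw: fill in missing chl, then evaluate the bmi predictor with the same draw
ppf <- lapply(seq_len(num_sim), function(id) {
    future({
        dd <- mutate(nhanes, chl=if_else(is.na(chl), pcovs[id,,"chl"], chl))
        posterior_linpred(fit2, newdata=dd, draw_ids=id, resp="bmi")
    }, seed=TRUE)
})

# collect the futures into a draws x observations matrix
pp <- t(sapply(ppf, value))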