stan-dev / bayesplot

bayesplot R package for plotting Bayesian models
https://mc-stan.org/bayesplot

Useful plots for MRP #208

Open lauken13 opened 5 years ago

lauken13 commented 5 years ago

@jgabry and I were talking that it might be handy to add functionality to bayesplot for MRP-style plots.

Currently you can use bayesplot with posterior_predict objects provided the observed data and newdata are the same size. This is useful for comparing observed versus predicted values (when using posterior_predict) for model diagnostics.
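
For reference, the same-size case that works today looks something like this (a minimal sketch; `fit` is assumed to be an existing rstanarm fit and `y` its observed outcome vector):

```r
library(rstanarm)
library(bayesplot)

yrep <- posterior_predict(fit)     # draws x observations matrix, same size as the data
ppc_dens_overlay(y, yrep[1:50, ])  # observed vs. replicated densities
ppc_stat(y, yrep, stat = "mean")   # posterior predictive check on the mean
```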

For MRP it would be useful to extend this so that newdata can be a different size from the fitted data, and also to add the option of overlaying a weighted estimate (if weights are available), a raw sample estimate, and the full posterior predictive estimate.
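
Just to make the request concrete, a purely hypothetical interface (nothing like this exists in bayesplot yet; the function name and arguments are placeholders for discussion):

```r
# Hypothetical sketch only -- not a real bayesplot function.
# Compare sample and poststratified population estimates, with raw and
# weighted sample means shown as reference lines.
ppc_mrp(
  y           = samp_dat$y,   # observed outcomes
  yrep_sample = samp_ppc,     # posterior draws for the fitted data
  yrep_popn   = popn_ppc,     # posterior draws for newdata (poststratification cells)
  cell_counts = popn_ps$N,    # cell sizes used for poststratification
  weights     = samp_wts      # optional survey weights
)
```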

I know @bbbales2 has some great plots for MRP, so he might be able to weigh in. I'll follow up with an example to show what I envision it looking like. @mitzimorris and @andrewgelman might be interested too.

andrewgelman commented 5 years ago

+1


lauken13 commented 4 years ago

[figure: dodged histograms of the sample and population posterior estimates, with the raw and weighted sample means marked as lines]

Great - so this is a rough idea of what I'm thinking: essentially the posterior linear predictor averaged over the sample and over the population, with the raw sample mean and an (optional) weighted sample mean as reference lines. I probably wouldn't do dodged histograms, but you get the idea.

I think we could do this pretty easily by calling existing functions in bayesplot and layering them to get the population vs. sample effect. As soon as I have demo code for that I'll move this to a branch.
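
To make the layering idea concrete, one rough way it could look with existing bayesplot pieces (a sketch only; `samp_p_est`, `popn_p_est`, `samp_mean`, and `wtd_samp_mean` are the quantities computed in the reproducible code below):

```r
# Sketch only: bayesplot functions return ggplot objects, so reference lines
# can be layered on top. mcmc_hist() facets by column here rather than
# dodging, but it shows the same sample vs. population comparison.
mcmc_hist(cbind(sample = samp_p_est, popn = popn_p_est)) +
  vline_at(samp_mean) +                         # raw sample mean
  vline_at(wtd_samp_mean, linetype = "dashed")  # weighted sample mean
```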

Reproducible code for the figure is below (most of this code is to simulate the data, and I just made fake weights).

```r
library(DeclareDesign)
library(bayesplot)
library(dplyr)
library(arm)
library(rstanarm)
library(ggplot2)

# create population
N = 5000
design <- declare_population(
  N = 5000,
  X1 = sample.int(3, size = N, replace = TRUE, prob = sample.int(10, 3)),
  X2 = sample.int(3, size = N, replace = TRUE, prob = sample.int(10, 3)),
  X3 = sample.int(3, size = N, replace = TRUE, prob = sample.int(10, 3)),
  X4 = sample.int(3, size = N, replace = TRUE, prob = sample.int(10, 3)),
  Y = invlogit(rnorm(3, 0, .3)[X1] + rnorm(3, 0, .3)[X2] +
               rnorm(3, 0, .3)[X3] + rnorm(3, 0, .3)[X4]),
  y = rbinom(N, 1, Y)
)
popn <- draw_data(design + declare_sampling(n = 5000))

# poststratification table: cell counts and outcome counts per cell
popn_ps <- popn %>%
  group_by(X1, X2, X3, X4) %>%
  summarize(N = n(), y_count = sum(y)) %>%
  ungroup()
popn_ps <- data.frame(popn_ps)

# biased sampling: inclusion probability depends on X1-X4
sampling_function <- function(data, n) {
  p = invlogit(rnorm(3, 0, .3)[data$X1] + rnorm(3, 0, .3)[data$X2] +
               rnorm(3, 0, .3)[data$X3] + rnorm(3, 0, .3)[data$X4])
  data[sample.int(nrow(data), n, replace = TRUE, prob = p), , drop = FALSE]
}

samp_dat <- draw_data(design + declare_sampling(handler = sampling_function, n = 500))
samp_wts <- sample.int(4, nrow(samp_dat), replace = TRUE)  # fake weights

# multilevel model fit to the sample
m1 <- stan_glmer(y ~ (1 | X1) + (1 | X2) + (1 | X3) + (1 | X4),
                 family = binomial(link = "logit"), data = samp_dat)

# posterior probabilities for the sample and for the poststratification cells
samp_ppc <- posterior_linpred(m1, transform = TRUE)
popn_ppc <- posterior_linpred(m1, newdata = popn_ps, transform = TRUE)

# per-draw estimates: unweighted mean over the sample, N-weighted mean over the cells
samp_p_est <- apply(samp_ppc, 1, function(x) sum(x) / length(x))
popn_p_est <- apply(popn_ppc, 1, function(x) sum(x * popn_ps$N) / sum(popn_ps$N))

df <- data.frame(posterior = c(samp_p_est, popn_p_est),
                 est = rep(c('sample', 'popn'), each = 4000))  # 4000 = default rstanarm draws
samp_mean <- mean(samp_dat$y)
wtd_samp_mean <- sum(samp_wts * samp_dat$y) / sum(samp_wts)

ggplot(df, aes(x = posterior, group = est, fill = est, y = ..density..)) +
  geom_histogram(position = "dodge") +
  geom_vline(xintercept = samp_mean) +
  geom_vline(xintercept = wtd_samp_mean, linetype = "dashed")
```
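
For reference, the `popn_p_est` line above is just the poststratification estimator applied draw by draw: for each posterior draw, the cell-level probabilities are averaged with the cell counts as weights, i.e. `sum(theta_cells * popn_ps$N) / sum(popn_ps$N)`, which is what the population histogram shows.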