stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0

add more info for rstanarm::loo display #252

Open avehtari opened 6 years ago

avehtari commented 6 years ago

Summary:

Currently the loo object from rstanarm::loo shows elpd_loo, p_loo, looic, and their SEs. Add more model-specific and easier-to-interpret information.

Description:

Example code using the earnings data (not really a good example for loo since n >> p, but it has both a Bernoulli and a normal example):

library("here")
library("arm")
library("rstanarm")
options(mc.cores = parallel::detectCores())
library("ggplot2")
library("bayesplot")
theme_set(bayesplot::theme_default(base_family = "sans"))

#' **Load data**
earnings_all <- read.csv(here("Earnings/data","earnings.csv"))
earnings_all$positive <- earnings_all$earn > 0
# only non-zero earnings
earnings <- earnings_all[earnings_all$positive, ]

#' **Bayesian logistic regression on non-zero earnings**
fit_1a <- stan_glm(positive ~ height + male,
                   family = binomial(link = "logit"),
                   data = earnings_all)
sims_1a <- as.matrix(fit_1a)

loo1a <- loo(fit_1a)
m <- sum(fit_1a$y)
n <- length(fit_1a$y)
# baseline: LOO prediction using only the class proportions (Beta(1,1) prior)
elpd0_pointwise <- numeric(n)
elpd0_pointwise[fit_1a$y == 1] <- log((m - 1 + 2/2) / (n - 1 + 2))
elpd0_pointwise[fit_1a$y == 0] <- log((n - m - 1 + 2/2) / (n - 1 + 2))
elpd0 <- sum(elpd0_pointwise)
sdelpd0 <- sd(elpd0_pointwise) * sqrt(n)
elpd0diff <- sum(loo1a$pointwise[, "elpd_loo"] - elpd0_pointwise)
sdelpd0diff <- sd(loo1a$pointwise[, "elpd_loo"] - elpd0_pointwise) * sqrt(n)
sprintf("The baseline elpd_loo from guessing the larger class is %.1f (sd %.1f)", elpd0, sdelpd0)
sprintf("The difference to the baseline, elpd_diff, is %.1f (sd %.1f)", elpd0diff, sdelpd0diff)

sprintf("Baseline classification accuracy %.2f", m / n)
sprintf("LOO classification accuracy %.2f", mean(loo1a$pointwise[, "elpd_loo"] > log(1/2)))

#' **Bayesian model on log scale**
fit_1b <- stan_glm(log_earn ~ height + male, data = earnings)
sims_1b <- as.matrix(fit_1b)
print(fit_1a, digits=2)
print(fit_1b, digits=2)

loo1b <- loo(fit_1b)
n <- length(fit_1b$y)
# baseline: intercept-only normal model with p(mu, sigma) ∝ 1/sigma, whose posterior
# predictive for a new observation is Student-t with n - 1 degrees of freedom
elpd0_pointwise <- dt((fit_1b$y - mean(fit_1b$y)) / (sqrt(1 + 1/n) * sd(fit_1b$y)),
                      df = n - 1, log = TRUE) - log(sqrt(1 + 1/n) * sd(fit_1b$y))
# alternative plug-in baseline (pointwise normal density with sample mean and sd):
# elpd0_pointwise <- dnorm(fit_1b$y, mean = mean(fit_1b$y), sd = sd(fit_1b$y), log = TRUE)
elpd0 <- sum(elpd0_pointwise)
sdelpd0 <- sd(elpd0_pointwise) * sqrt(n)
elpd0diff <- sum(loo1b$pointwise[, "elpd_loo"] - elpd0_pointwise)
sdelpd0diff <- sd(loo1b$pointwise[, "elpd_loo"] - elpd0_pointwise) * sqrt(n)
sprintf("The baseline elpd_loo from the intercept-only normal model is %.1f (sd %.1f)", elpd0, sdelpd0)
sprintf("The difference to the baseline, elpd_diff, is %.1f (sd %.1f)", elpd0diff, sdelpd0diff)

# PSIS-LOO weights (psislw() from the loo 1.x API used at the time)
ll_1b <- log_lik(fit_1b)
psis1b <- psislw(-ll_1b)
# posterior draws of the linear predictor
preds <- posterior_linpred(fit_1b)
# LOO predictive mean for each observation
predloo <- E_loo(preds, psis1b$lw_smooth)
# LOO residual sd and LOO Bayesian R2
R2loo <- var(predloo) / (var(predloo) + var(fit_1b$y - predloo))
sprintf("LOO residual sd = %.2f, LOO-R-squared = %.2f", sd(fit_1b$y - predloo), R2loo)
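
For reference, a minimal sketch of the same LOO-R2 computation with the loo >= 2.0 API, where psis() plus E_loo() on a psis object replace psislw(); object names such as psis_1b and mu_loo are just illustrative.

library("loo")
ll_1b <- log_lik(fit_1b)
# Pareto smoothed importance sampling for the LOO log-ratios
# (optionally pass r_eff = relative_eff(...) to account for autocorrelation)
psis_1b <- psis(-ll_1b)
# posterior draws of the linear predictor and their LOO predictive means
mu_draws <- posterior_linpred(fit_1b)
mu_loo <- E_loo(mu_draws, psis_1b, type = "mean", log_ratios = -ll_1b)$value
res_loo <- fit_1b$y - mu_loo
R2_loo <- var(mu_loo) / (var(mu_loo) + var(res_loo))
sprintf("LOO residual sd = %.2f, LOO-R-squared = %.2f", sd(res_loo), R2_loo)
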
bgoodri commented 6 years ago

We can do this, but we need to do it today / tomorrow to make it into the next rstanarm release. Some questions:

  1. If the user takes the intercept out of the model, does it still make sense to do a baseline model with only an intercept?
  2. Similarly, what should we do about stan_clogit for which the intercept is not identified?
  3. I guess we should also have classification accuracy for binomial models?
  4. For the "intercept only" models, what do we do about nuisance parameters in the negative binomial, Gamma, inverse Gaussian, etc. cases?

avehtari commented 6 years ago

We can do this, but we need to do it today / tomorrow to make it into the next rstanarm release

That might be too hasty. It's enough if it gets released before the Regression book is out.

Good questions.

If the user takes the intercept out of the model, does it still make sense to do a baseline model with only an intercept?

I'm fine with zero intercept in that case.

Similarly, what should we do about stan_clogit for which the intercept is not identified?

If there is no logical baseline, then we can decide not to report this. My thinking is that we just want to show the user whether the covariates have any predictive power, and to give some scale for elpd.
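
For illustration only (not the proposed rstanarm display): a sketch of that kind of baseline comparison, done by explicitly refitting the y ~ 1 model and using loo_compare() from more recent versions of the loo package, rather than the analytic baseline used in the example code above.

# explicit intercept-only baseline for the logistic example (refit, not analytic)
fit_0 <- stan_glm(positive ~ 1, family = binomial(link = "logit"), data = earnings_all)
loo0 <- loo(fit_0)
# elpd_diff and se_diff relative to the better model
loo_compare(loo0, loo1a)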

I guess we should also have classification accuracy for binomial models?

Maybe, but if N is large, I'm not sure that's easily interpretable. Have you seen it used?

For the "intercept only" models, what do we do about nuisance parameters in the negative binomial, Gamma, inverse Gaussian, etc. cases?

For the Gaussian case in the example, I used p(mu, sigma) ∝ 1/sigma, although it would be better to use a conjugate prior close to the default rstanarm prior. Conjugate so that we can get a quick default. If there is no conjugate prior, I would use a MAP estimate. We could even use lm estimates. These should be quite stable for "intercept only" or "zero intercept" models unless n is really small. We can have a threshold and show NA if n is too small.
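
For concreteness, a minimal sketch of that Gaussian baseline (the helper name is hypothetical). Under p(mu, sigma) ∝ 1/sigma, the posterior predictive for a new observation is Student-t with n - 1 degrees of freedom, which is the formula used in the example code above; evaluating it at the observed y reuses the data and so slightly overstates the exact LOO baseline.

# pointwise log predictive density of the intercept-only Gaussian baseline
# under the reference prior p(mu, sigma) ∝ 1/sigma
gaussian_baseline_lpd <- function(y) {
  n <- length(y)
  s <- sqrt(1 + 1/n) * sd(y)
  dt((y - mean(y)) / s, df = n - 1, log = TRUE) - log(s)
}
# elpd of the baseline and its SE, as in the example:
# sum(gaussian_baseline_lpd(fit_1b$y))
# sd(gaussian_baseline_lpd(fit_1b$y)) * sqrt(length(fit_1b$y))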

jgabry commented 6 years ago

I'm fine with zero intercept in that case

How can the baseline have zero intercept if the only thing in the model is the intercept? I think if the user fits a model without an intercept, the baseline that typically makes sense is still y~1.

avehtari commented 6 years ago

How can the baseline have zero intercept if the only thing in the model is the intercept? I think if the user fits a model without an intercept, the baseline that typically makes sense is still y~1.

ok

jgabry commented 6 years ago

Actually now that I think more about it I'm not so sure! We should probably discuss this in a meeting at some point.

andytimm commented 5 years ago

Are you still looking for a hand on this?

I'd be happy to work on it, but would want a little guidance on the things that are unresolved in the discussion above. Here's an attempt at summarizing things since it seems like this issue hasn't been looked at in a bit:

Things that Aki proposed that no one seemed to object to:

  1. For Bernoulli models, add classification accuracy (see the sketch after this list).
  2. For normal models, add LOO residual sd and LOO Bayesian R2 (similar to what the arm package displays for lm models).
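
A minimal sketch of item 1 (the helper name loo_accuracy is hypothetical), matching the classification-accuracy calculation in the example code above:

# fraction of observations whose LOO predictive probability for the observed
# class exceeds 1/2, i.e. pointwise elpd_loo > log(1/2)
loo_accuracy <- function(loo_object) {
  mean(loo_object$pointwise[, "elpd_loo"] > log(1/2))
}
# e.g. loo_accuracy(loo1a)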

Things that seem to be under discussion:

  3. Do you want classification accuracy for binomial models? Ben asked; Aki wasn't sure.
  4. If the user takes an intercept out of the model, what's the baseline? Jonah said y~1 initially, then said he wasn't sure; Aki had suggested intercept 0, which is confusing to me.
  5. For the intercept-only models, how do we deal with nuisance parameters? Aki's suggested solutions were conjugate priors (for speed) close to the rstanarm default priors, falling back to MAP and/or lm estimates (if I understood him correctly).

I can get started on 1 and 2 without more input. 3 and 4 just seem like decisions that need to be made before they can be implemented. 5 seems the most complicated, and potentially a bit fiddly depending on the final solution chosen, so that's where I'd need the most clarification.

avehtari commented 5 years ago

Are you still looking for a hand on this?

Sure!

Do you want classification accuracy for binomial models? Ben asked; Aki wasn't sure.

I'm fine with adding it, and I can interpret it, but do people actually use it?

If the user takes an intercept out of the model, what's the baseline? Jonah said y~1 initially, then said he wasn't sure; Aki had suggested intercept 0, which is confusing to me.

I now think y~1 makes sense.

For the intercept-only models, how do we deal with nuisance parameters? Aki's suggested solutions were conjugate priors (for speed) close to the rstanarm default priors, falling back to MAP and/or lm estimates (if I understood him correctly).

Yes.

jgabry commented 5 years ago

@andytimm Thanks for the offer, that'd be great! I think y~1, as Aki said, is fine for the baseline. Let us know if you have other questions that would help you get started, or questions as you make progress.