stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0
387 stars 132 forks source link

pp_check, posterior_predict and stan_betareg with loglog link #486

Closed LeoSokolovic closed 3 years ago

LeoSokolovic commented 3 years ago

Summary:

posterior_predict, pp_check of a stan_betareg model with "loglog" link do not produce predictions on the measurement scale.

Description:

Can it be that the exp(-exp(predictions)) is not calculated automatically when using the pp_check and posterior_predict functions?

RStanARM Version:

The version of the rstanarm: 2.21.1

R Version:

The version of R: 4.0.3

Operating System:

macOS Catalina 10.15.7

jgabry commented 3 years ago

Hi @LeoSokolovic thanks for reporting this. Can you share an example that we can use to reproduce the problem? Thanks!

LeoSokolovic commented 3 years ago

Hi!

Sure, I didn't have time to create one before now, sorry. As my question above implies, I'm not really sure, if I found a "bug", but this code tries to show the discrepancy between the method with posterior predicitons changed by the inverse loglog function and those without.

Best, Leo

library(rstanarm)
library(bayesplot)
data("GasolineYield", package = "betareg")

# Define the inverse loglog function
invLL <- function(x) exp(-exp(x))

#fit a model from the betareg paper using the loglog link
model <- stan_betareg(yield ~ batch + temp, data = GasolineYield,link = "loglog",cores=7)

# plot the posterior predictions
pp_check(model)

#generate posterior predictions (not) using the invLL function
posteriorPredictionsNoLL <- posterior_predict(model)
posteriorPredictionsWithLL <- posterior_predict(model,fun = invLL)

#set up plotting
opar <- par(no.readonly = T)
par(mfrow=c(1,3))

#plot some histograms
hist(apply(posteriorPredictionsNoLL,2,mean),xlim=c(0,1),
main="Predictions without the inverse \nloglog transformation")
hist(model$y,xlim=c(0,1),
main = "Density of the dependent variable in the model")
hist(apply(posteriorPredictionsWithLL,2,mean),xlim=c(0,1),
main="Predictions with the inverse \nloglog transformation")
jgabry commented 3 years ago

Thanks a lot for the example @LeoSokolovic. What's strange is that I think internally the code doesn't care which link function is used it just applies the inverse link, so I don't know how it could not work with loglog but work fine with other links, like the default logit (and it does seem to work correctly with logit).

One thing I tried is usually bayesplot functions directly to make the same plot that pp_check makes using both your posteriorPredictionsNoLL and posteriorPredictionsWithLL. In both cases it looks like a bad fit to the data:

bayesplot::ppc_dens_overlay(model$y, posteriorPredictionsNoLL[1:100,])

bayesplot::ppc_dens_overlay(model$y, posteriorPredictionsWithLL[1:100,])

@LeoSokolovic If you make those plots do you see what I mean? Is it possible that it's applying the inverse link correction but using "loglog" is just a bad model for this data? I think it might already be doing invLL internally and when you are adding the invLL it could be doing it a second time. But I could definitely be wrong. Let me know what you think and if necessary we will definitely look into this further.

LeoSokolovic commented 3 years ago

Hi @jgabry ,

first of all, I think that the invLL should be exp(-exp(-x)). After also trying the probit link function it does really seem to be so that the loglog link is a poor choice - both logit and probit lead to more plausible predictions. However, I used the toy example from the betareg paper because the authors also used the loglog link to show that the it is able to accomodate the variable dispersion in the data. So I assumed that the loglog link should be better than the logit.

In any case, applying the invLL does seem to lead to a double inverse transformation.

Thank you so much for your help!

Best regards, Leo