vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com
Other
410 stars 44 forks source link

Is zero_inflated_beta() in brms supported? #1155

Closed zhengchencai closed 3 weeks ago

zhengchencai commented 3 weeks ago

Hi there,

I wonder if zero_inflated_beta() in brms is supported? I tried avg_predictions() and predictions(), none of them provided 0 value predications. predicted_draws() function from tidybayes gave 0 value predictions. The distribution of the prediction from the former look like Gaussion, whereas the one from predicted_draws() look like beta distributions. Maybe there is a trick to use margnaleffects on zero_inflated_beta model? Thanks a lot.

vincentarelbundock commented 3 weeks ago

Do the brms functions like posterior_epred() work?

It might be useful to see a minimal working example using public data

zhengchencai commented 3 weeks ago

Thanks @vincentarelbundock . I guess the problem is whether the prediction returns the mean of beta or the distribution of beta itself. posterior_epred() and predictions() gives the expectation of beta, whereas posterior_predict() and predicted_draws() return the beta distribution (before taking the expectation). This is a nice explaination https://www.andrewheiss.com/blog/2022/09/26/guide-visualizing-types-posteriors/images/beta.pdf.

Here is an example adapted from https://www.andrewheiss.com/blog/2021/11/08/beta-regression-guide/#zero-inflated-beta-regression-bayesian-style

Is there a way to tell predictions() not taking expectation? Like for logistic regression it will return 0 and 1 values instead of the proportation of 1.

library(tidyverse)        # ggplot, dplyr, %>%, and friends
library(brms)             # Bayesian modeling through Stan
library(tidybayes)        # Manipulate Stan objects in a tidy way
library(broom)            # Convert model objects to data frames
library(vdemdata)         # Use data from the Varieties of Democracy (V-Dem) project
library(ggdist)           # Special geoms for posterior distributions
library(gghalves)         # Special half geoms
library(ggbeeswarm)       # Special distribution-shaped point jittering
library(ggrepel)          # Automatically position labels
library(patchwork)        # Combine ggplot objects
library(scales)           # Format numbers in nice ways
library(marginaleffects)  # Calculate marginal effects for regression models
library(modelsummary)     # Create side-by-side regression tables

set.seed(1234)  # Make everything reproducible

# Define the goodness-of-fit stats to include in modelsummary()
gof_stuff <- tribble(
  ~raw, ~clean, ~fmt,
  "nobs", "N", 0,
  "r.squared", "R²", 3
)

# Custom ggplot theme to make pretty plots
# Get the font at https://fonts.google.com/specimen/Barlow+Semi+Condensed
theme_clean <- function() {
  theme_minimal(base_family = "Barlow Semi Condensed") +
    theme(panel.grid.minor = element_blank(),
          plot.title = element_text(family = "BarlowSemiCondensed-Bold"),
          axis.title = element_text(family = "BarlowSemiCondensed-Medium"),
          strip.text = element_text(family = "BarlowSemiCondensed-Bold",
                                    size = rel(1), hjust = 0),
          strip.background = element_rect(fill = "grey80", color = NA))
}

# Make labels use Barlow by default
update_geom_defaults("label_repel", list(family = "Barlow Semi Condensed"))

# Format things as percentage points
label_pp <- label_number(accuracy = 1, scale = 100, 
                         suffix = " pp.", style_negative = "minus")
label_pp_tiny <- label_number(accuracy = 0.01, scale = 100, 
                              suffix = " pp.", style_negative = "minus")

vdem_clean <- vdem %>% 
  select(country_name, country_text_id, year, region = e_regionpol_6C,
         polyarchy = v2x_polyarchy, corruption = v2x_corr, 
         civil_liberties = v2x_civlib, prop_fem = v2lgfemleg, v2lgqugen) %>% 
  filter(year >= 2010, year < 2020) %>% 
  drop_na(v2lgqugen, prop_fem) %>% 
  mutate(quota = v2lgqugen > 0,
         prop_fem = prop_fem / 100,
         polyarchy = polyarchy * 100)

model_beta_zi <- brm(
  bf(prop_fem ~ quota,
     phi ~ quota,
     zi ~ quota),
  data = vdem_clean,
  family = zero_inflated_beta(),
  chains = 4, iter = 2000, warmup = 1000,
  cores = 4, seed = 1234
)

# predicted_draws from tidybayes
beta_zi_pred <- model_beta_zi %>% 
  predicted_draws(newdata = tibble(quota = c(FALSE, TRUE)))
sum(beta_zi_pred$.prediction[beta_zi_pred$quota == FALSE] == 0)

hist(beta_zi_pred$.prediction[beta_zi_pred$quota == FALSE], breaks = 100)

# posterior_predict
beta_zi_pred <- model_beta_zi %>% 
  posterior_predict(newdata = tibble(quota = c(FALSE, TRUE))) 

beta_zi_pred <- beta_zi_pred[,1] 
sum(beta_zi_pred == 0)

hist(beta_zi_pred, breaks = 100)

# posterior_epred
beta_zi_pred <- model_beta_zi %>% 
  posterior_epred(newdata = tibble(quota = c(FALSE, TRUE))) 

beta_zi_pred <- beta_zi_pred[,1] 
sum(beta_zi_pred == 0)

hist(beta_zi_pred, breaks = 100)

# predictions
beta_zi_pred <- model_beta_zi %>% 
  predictions(newdata = tibble(quota = c(FALSE, TRUE))) %>% 
  posterior_draws()
sum(beta_zi_pred$draw == 0)
hist(beta_zi_pred$draw[beta_zi_pred$quota == FALSE], breaks = 100)
vincentarelbundock commented 3 weeks ago

Have you tried changing the type argument? This is documented at the very top of the Bayesian vignette:

https://marginaleffects.com/vignettes/brms.html

zhengchencai commented 3 weeks ago

Oh yes type = "prediction" works. Thank you very much.