paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0

feature suggestion: posterior probability of being in the zero-inflation state #1551

Open jsocolar opened 11 months ago

jsocolar commented 11 months ago

I wrote the following function to predict the posterior probability of being in the zero-inflation state for the discrete zero-inflated families in brms. If of interest, I'll submit a PR with doc + tests.

posterior_zi_prob <- function(brmsfit, ...) {
  disc_zi_families <- c(
    "zero_inflated_poisson", "zero_inflated_negbinomial",
    "zero_inflated_binomial", "zero_inflated_beta_binomial"
  )

  fam <- brmsfit$family$family

  if (!(fam %in% disc_zi_families)) {
    stop2(paste0(
      "zi_probs available only for discrete zero-inflated families (",
      paste(disc_zi_families, collapse = ", "),
      "). Supplied brmsfit has family: ",
      fam
    ))
  }

  # Marginal likelihood of each observation, i.e. L0 + L1
  lik <- exp(log_lik(brmsfit, ...))

  # zi on the probability scale; this is L1 for observations that are zero
  zi_lik_given_obs_zero <- posterior_linpred(
    brmsfit,
    dpar = "zi",
    transform = TRUE,
    ...
  )

  # Indicator of whether each observed response is zero;
  # L1 vanishes for nonzero observations
  resp_name <- brmsfit$formula$resp
  resp_data <- brmsfit$data[[resp_name]]
  resp_is_zero <- as.integer(resp_data == 0)

  # L1: zero out the columns (observations) with nonzero responses
  zi_lik <- sweep(zi_lik_given_obs_zero, MARGIN = 2,
                  STATS = resp_is_zero, FUN = `*`)

  # Posterior probability of the zero-inflation state: L1 / (L0 + L1)
  zi_lik / lik
}
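The per-cell arithmetic the function performs can be checked by hand for a single posterior draw of a zero-inflated Poisson model. A minimal sketch (in Python purely for illustration; the parameter values are made up):

```python
import math

def zi_state_prob(zi, lam, y):
    """Posterior probability that observation y arose from the
    zero-inflation state, for one draw of a ZI-Poisson model.
    L1 = zi * 1[y == 0]; L0 = (1 - zi) * Poisson(y | lam)."""
    l1 = zi if y == 0 else 0.0
    l0 = (1 - zi) * math.exp(-lam) * lam**y / math.factorial(y)
    # log_lik in brms returns log(l0 + l1), the marginalized likelihood
    return l1 / (l0 + l1)

# A nonzero observation cannot come from the zero-inflation state:
print(zi_state_prob(zi=0.3, lam=2.0, y=3))  # 0.0

# A zero observation: the posterior probability exceeds zi = 0.3
p = zi_state_prob(zi=0.3, lam=2.0, y=0)
print(round(p, 3))  # 0.76
```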
paul-buerkner commented 11 months ago

I am a bit confused. What is the difference between your suggestion and just predicting zi directly via posterior_epred(..., dpar = "zi")? After all, zi is by definition the probability of being in the zero-inflation arm.

jsocolar commented 11 months ago

This refers to a situation where we care about the posterior probability that a particular observation arises from the latent zero-inflation state.

This probability of being in the zero-inflation state is zero given an observed nonzero outcome, and higher than zi given an observed zero.

paul-buerkner commented 11 months ago

I don't fully understand when this is needed. posterior_epred(dpar = "zi") gives the posterior predicted probability for any (new) observation given predictors. You make it sound as if we now additionally condition the predictions on (new?) observed data. Or am I misunderstanding something?

jsocolar commented 11 months ago

You're absolutely correct that we're conditioning on the observed data (not necessarily new data; the more common use case will be the data used in model fitting). I think what you're missing is why this can be useful :)

Sometimes with these zero-inflated models, we actually want inference, row-wise, on which observations came from the zero-inflation process. In other words, we want to "un-marginalize" over the latent discrete state $Z$, where $Z = 1$ says that the observation arises from zero inflation and $Z = 0$ says it does not. (Throughout this comment, I suppress indexing on the observation id (i.e. the row); $Z$ is a vector over the data rows). If we were sampling the latent discrete parameter we'd write Z ~ bernoulli(zi).

A good concrete example is an occupancy model fitted as a zero-inflated binomial or zero-inflated beta-binomial (though I don't think the use cases are restricted to occupancy modeling). A common goal in inference from an occupancy model is to get a posterior distribution for occupancy across the actual study sites (usually not new data, but the sites used in fitting). That is, we want a posterior distribution, per site, of the probability that that site is occupied, conditioning on the observed data.

Let $L_0$ be the likelihood of the observed outcome via $Z = 0$, and $L_1$ the likelihood via $Z = 1$. The overall likelihood is $L_0 + L_1$, which is how we marginalize out $Z$. The posterior probability is $p(Z = 1) = \frac{L_1}{L_0 + L_1}$. The easiest intuitive foothold is that when we observe a nonzero outcome, $L_1$ is zero, so as expected we are a posteriori certain that $Z = 0$. Note also that when we observe a zero outcome, the posterior $Z$ state is not simply 1 with probability zi: in that case $L_1$ is exactly zi, while $L_0$ contains a factor of $1 - zi$, so $L_0 + L_1$ is less than one and the posterior probability of $Z = 1$ is higher than zi.
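A quick worked example with made-up numbers, for a zero-inflated Poisson with $zi = 0.3$ and $\lambda = 2$, observing $y = 0$:

$$L_1 = zi = 0.3, \qquad L_0 = (1 - zi)\,e^{-\lambda} = 0.7 \times e^{-2} \approx 0.095,$$

$$p(Z = 1) = \frac{L_1}{L_0 + L_1} \approx \frac{0.3}{0.395} \approx 0.76 \;>\; zi = 0.3.$$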

Is that clarifying?

In an occupancy modeling context, this is useful in case we want to report (posterior distributions for) descriptive statistics of the set of occupied sites, or in general compute any derived quantity based on $Z$. This turns out to be quite common. Likewise, I'm pretty sure that there are applications of zero-inflated Poisson and zero-inflated negative binomial models where inference on $Z$ itself is desired.

Edit: a final note is that when we desire inference about the posterior distribution for $Z$, it's still better to report out probabilities rather than bernoulli samples from those probabilities (which is what the unmarginalized model would do), because there's some loss of information when sampling, and it's always possible to sample later.
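The equivalence with the un-marginalized model can be checked by brute-force simulation: sampling $Z$ and the outcome, then conditioning on an observed zero, recovers $L_1 / (L_0 + L_1)$. A sketch in Python with made-up parameter values (only the zero/nonzero status of $y$ matters here, so the Poisson draw reduces to a coin flip with probability $e^{-\lambda}$):

```python
import math
import random

random.seed(1)
zi, lam = 0.3, 2.0   # made-up zero-inflation probability and Poisson mean
n = 200_000

z_given_zero = []    # latent Z among simulated observations with y == 0
for _ in range(n):
    z = random.random() < zi  # Z ~ Bernoulli(zi)
    # y == 0 always under zero inflation, else with probability exp(-lam)
    y_is_zero = z or (random.random() < math.exp(-lam))
    if y_is_zero:
        z_given_zero.append(z)

empirical = sum(z_given_zero) / len(z_given_zero)
exact = zi / (zi + (1 - zi) * math.exp(-lam))  # L1 / (L0 + L1)
print(round(empirical, 2), round(exact, 2))   # both should be near 0.76
```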

jsocolar commented 11 months ago

Just a note on naming: posterior_zi_prob() is confusing and not well differentiated from posterior_linpred(..., dpar = "zi", transform = TRUE) and posterior_epred(..., dpar = "zi"). Maybe something like latent_zi_state_prob() would be better.

paul-buerkner commented 11 months ago

Thank you for explaining. So this is basically getting the conditional mixture weights per observation? If so, perhaps we can generalize pp_mixture to work not only with formal mixture models (created via mixture()) but also with de facto mixture models such as zero-inflated and hurdle models? This would, I think, lead to a more consistent interface.

jsocolar commented 11 months ago

Sounds good. Note that in hurdle models the mixture weights are always either 0 or 1 and depend only on data, with no dependency on parameters. Do you think it's worthwhile to enable pp_mixture to compute the trivial mixture weights for hurdle models? For zero-inflated models, only a subset of the weights are trivial.

paul-buerkner commented 11 months ago

Yeah, that makes sense. For consistency, it could still be nice to include the hurdle models even though the answer is trivial.

jsocolar commented 11 months ago

Edge case: what is the desired return of pp_mixture if a zero-inflated or hurdle (or zero-one-inflated) model is included as a mixture component in an explicit mixture model, e.g.

set.seed(1)
dat <- data.frame(
  y = runif(100)
)
mix2 <- mixture(gaussian, zero_one_inflated_beta)
fit2 <- brm(bf(y ~ 1), dat, family = mix2)

I tried checking if the behavior is already defined for nested mixtures in brms, but it appears not to be, as this errors:

mix3 <- mixture(mixture(gaussian, zero_one_inflated_beta), lognormal)
fit3 <- brm(bf(y ~ 1), dat, family = mix3)

My instinct is not to return the zero-inflation mixture weights at all and just return the weights of the explicitly defined (via mixture()) mixture components, and I note that any other choice would substantially change the existing behavior of pp_mixture(fit3), rather than merely introduce new functionality.

paul-buerkner commented 10 months ago

I think we should have an argument that needs to be turned on explicitly for zi/hu models to be used directly within pp_mixture. This way we ensure that the function's behavior does not change too much based solely on the input, without explicit user intervention.