Open rpsychologist opened 5 years ago
I like the idea a lot and it wouldn't actually be too complicated to change the internals.
However, we can't just change the existing families because this would be a huge backwards compatibility break.
One option would be to introduce a new set of families, named for instance hurdle_gamma2
etc. that implements the mean parameterization. What would you think of that option?
Adding new families might make it clear that they are separate but related models with different estimands, as compared to doing hurdle_gamma(marginal = TRUE)
.
A potential problem with Smith et al specification posted above is that Y | Y > 0 can sometimes be predicted to be unreasonably large for participants with a very low probability of Y > 0, e.g., someone who only report zeros. Maybe there's a smart way to improve the parameterization to avoid this.
What's the reason behind these unrealistic predictions? Does the standard parameterization not tend to make such predictions (not a rethorical question)?
It happens when Pr(Y = 0) is estimated to be really high. So if the overall mean is small say $2, we can get extreme expected values for the gamma dist since we solve for the conditional positive value, e.g. 2/plogis(-20) = 970 330 393. Even if we estimate this value as highly unlikely, and it might have little effect on the estimates for the overall effect, this behavior still feels undesirable. The standard model is less affected by problems with very large hurdle probs, as the gamma params are not directly solved from the hurdle probs and the prior is on the conditional part.
I see, so it is good in any case to have both paramaterizations of zero-inflated or hurdle models.
Absolutely, both are highly useful.
I think adding "marginal" versions of the current hurdle models supported in brms would be extremely useful. Where "marginal" means that we model the effect of covariates on the overall outcome E(Y | b) instead of the conditionally positive part E(Y | Y > 0, b). This can be done simply by solving for the conditional mean as a function of the overall mean (cf., Smith et al https://doi.org/10.1177/0962280215592908). Here's a hackish example where the
hurdle_gamma
lpdf is changed to represent the marginal version using the custom_family function.As you can see the only real difference is solving for the conditionally positive mean using
exp(-(mu - log1m_inv_logit(hu)));
Although the models are highly similar, I'm not sure if this is an easy change with regards to brms internal code.