Closed MarcRieraDominguez closed 1 year ago
It's easy indeed to get confused by all of this. But let's look at where we can get comparable results. One way is by looking only at the count component:
> emmeans(pscl.hurdle, ~ mar, mode = "count", lin.pred = TRUE, type = "response")
mar count SE df lower.CL upper.CL
Single 2.07 0.1122 909 1.86 2.31
Married 2.09 0.0807 909 1.94 2.26
Results are averaged over the levels of: fem
Confidence level used: 0.95
Intervals are back-transformed from the log scale
> emmeans(glmmtmb.hurdle, ~ mar, comp = "cond", type = "response")
mar rate SE df asymp.LCL asymp.UCL
Single 2.07 0.1122 Inf 1.86 2.30
Married 2.09 0.0807 Inf 1.94 2.26
Results are averaged over the levels of: fem
Confidence level used: 0.95
Intervals are back-transformed from the log scale
... or at just the zero component:
> emmeans(pscl.hurdle, ~ mar, mode = "prob0")
mar emmean SE df lower.CL upper.CL
Single 0.313 0.0265 909 0.261 0.365
Married 0.297 0.0191 909 0.259 0.334
Results are averaged over the levels of: fem
Confidence level used: 0.95
> emmeans(glmmtmb.hurdle, ~ mar, comp = "zi", type = "response")
mar response SE df asymp.LCL asymp.UCL
Single 0.313 0.0267 Inf 0.263 0.367
Married 0.296 0.0190 Inf 0.261 0.335
Results are averaged over the levels of: fem
Confidence level used: 0.95
Intervals are back-transformed from the logit scale
... or by combining the two components into the overall response mean:
> emmeans(pscl.hurdle, ~ mar, mode = "response")
mar emmean SE df lower.CL upper.CL
Single 1.64 0.0900 909 1.47 1.82
Married 1.69 0.0633 909 1.57 1.82
Results are averaged over the levels of: fem
Confidence level used: 0.95
> emmeans(glmmtmb.hurdle, ~ mar, comp = "response")
mar emmean SE df asymp.LCL asymp.UCL
Single 1.64 0.0900 Inf 1.47 1.82
Married 1.69 0.0633 Inf 1.57 1.82
Results are averaged over the levels of: fem
Confidence level used: 0.95
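As a sanity check on how the two components combine: the overall response mean in a hurdle model is the probability of a nonzero observation times the truncated count mean. Using the prob0 and cmean-type values quoted elsewhere in this thread (copied by hand, and only approximate, since the averaging over fem happens on different scales), base R recovers roughly the response means shown above:

```r
# Hurdle response mean: E[Y] = P(Y > 0) * E[Y | Y > 0]
prob0 <- c(0.313, 0.296)        # P(Y = 0) for Single, Married (from the prob0 results)
cmean <- c(2.386269, 2.401267)  # truncated count means (from the cmean calculations)
(1 - prob0) * cmean
#> [1] 1.639367 1.690492
```

These agree with the response-mode estimates of 1.64 and 1.69 to within rounding.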
So I hope this will give you more faith that there is some measure of consistency.
The main thing to know is that results comparable to comp = "cmean" in the glmmTMB support are simply not available for models fitted via pscl::hurdle(). You can compute the estimates (but not their SEs) manually as the count mean divided by (1 - P(0)), where P(0) comes from that same count distribution (not the zi part):
> untrunc = predict(emmeans(glmmtmb.hurdle, ~ mar, comp = "cond", type = "response"))
> untrunc
[1] 2.072832 2.091283
> p0 = exp(-untrunc)
> untrunc / (1 - p0)
[1] 2.371197 2.386025
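To spell out that manual adjustment: for a Poisson count component, the zero-truncated mean is mu / (1 - exp(-mu)), where mu is the untruncated mean and exp(-mu) is the Poisson probability of zero. A minimal base-R sketch, using the untruncated marginal means printed above (hand-copied, so accurate only to the displayed precision):

```r
# Zero-truncated Poisson mean: E[Y | Y > 0] = mu / (1 - P(Y = 0)),
# where P(Y = 0) = exp(-mu) for a Poisson count component.
trunc_mean <- function(mu) mu / (1 - exp(-mu))

untrunc <- c(2.072832, 2.091283)  # untruncated means from the predict() call above
trunc_mean(untrunc)
#> [1] 2.371197 2.386025
```

This reproduces the manual division shown above; for a different count family (e.g. negative binomial), P(0) would have to be computed from that family instead.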
Finally, your result when you re-gridded is slightly different from the one without re-gridding because, with the regridding, the results are converted to the response scale before averaging over fem.
Actually, the above manual calculations differ slightly from what you got with comp = "cmean" for exactly the opposite reason. In the glmmTMB support, the cmean results were computed for all combinations of mar and fem, then averaged over fem:
> untrunc = predict(emmeans(glmmtmb.hurdle, ~ mar*fem, comp = "cond", type = "response"))
> (trunc = matrix(untrunc / (1 - exp(-untrunc)), nrow = 2))
[,1] [,2]
[1,] 2.584455 2.188083
[2,] 2.601720 2.200814
> apply(.Last.value, 1, mean)
[1] 2.386269 2.401267
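The order of operations matters because the truncation adjustment is nonlinear: averaging the untruncated means over fem and then truncating is not the same as truncating each cell and then averaging. A small base-R illustration with made-up cell means (the values here are hypothetical, chosen only to show the inequality):

```r
# Zero-truncated Poisson mean, as before
trunc_mean <- function(mu) mu / (1 - exp(-mu))

mu_cells <- c(2.4, 1.8)     # hypothetical untruncated means for the two fem levels

trunc_mean(mean(mu_cells))  # average over fem first, then truncate
mean(trunc_mean(mu_cells))  # truncate each cell, then average over fem
```

The two results differ (here by a few thousandths), which is the same small discrepancy seen between the manual marginal calculation and the cmean results.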
Thank you for the response, my faith in consistency has been restored :)
Hi! Congratulations on the great package!
My goal is to interpret the two components of a hurdle model, the binomial hurdle (probability of an observation > 0) and the truncated count, on the untransformed scale (probabilities rather than logits, etc.).
Given the complexity of predicting from such models, I compared results from pscl::hurdle() to glmmTMB() to make sure I was getting the desired result. Unfortunately, the model coefficients differed slightly between packages (5th decimal place), which makes direct comparison a bit trickier. However, the estimated means were quite different, by more than the coefficient differences would allow (I think); see below for a reproducible example.
To get truncated counts, I thought the appropriate code would be emmeans(pscl::hurdle(), mode = "count"), based on the emmeans documentation, while the glmmTMB documentation indicated that an appropriate call would be emmeans(glmmTMB::glmmTMB(), mode = "cmean").
However, emmeans(pscl::hurdle(), mode = "count") was not similar to emmeans(glmmTMB::glmmTMB(), mode = "cmean"). Instead, it was similar to emmeans(glmmTMB::glmmTMB(), component = "cond", type = "response"), which the glmmTMB documentation indicates yields untruncated counts (page 7). Using emmeans(glmmTMB::glmmTMB(), component = "cond", regrid = "response") gave the same results as emmeans(pscl::hurdle(), mode = "count").
Regarding how to obtain the binomial probabilities, I did not find guidance in the package documentation. Apologies if I have misread the documentation! (This question is in fact an expansion of a previous question I asked on Stack Overflow.)
After my exploration, my question is: is the following code appropriate to obtain estimated means on the scale of the response when fitting hurdle models through the pscl package?
emmeans(pscl.hurdle, mode = "count")
emmeans::emmeans(pscl.hurdle, mode = "prob0", regrid = "response")
As a side note, while the binomial and truncated-count models could be fitted separately, I prefer to fit a hurdle model since it provides a pseudo-R2 for the two processes combined (it produces residuals, etc.) and is more compact code-wise than fitting two models.
See below for a reproducible example of the behaviour of the different functions; let me know if you need further information. Many thanks for all the work you put into the package!
Created on 2023-09-13 with reprex v2.0.2