Closed MarcRieraDominguez closed 1 year ago
The problem seems to be that the formula gets distorted and its class is call
instead of formula
:
> formula(avg.pscl)
art ~ fem + kid5 + mar | fem + kid5 + mar
> class(.Last.value)
[1] "call"
> formula(avg.tmb)
art ~ 0 + fem + kid5 + mar
> class(.Last.value)
[1] "formula"
In addition, we have
> formula(mod.pscl)
art ~ fem + mar + kid5
which is of class formula
. So where the heck does that divided model formula come from? If I try to hack around it, I get a different error:
> tmp = avg.pscl
> tmp$formula = formula(mod.pscl)
> emmeans(tmp, specs = pairwise ~ mar, adjust = "Tukey", mode = "prob0", data = bioChemists)
Error in count_(Intercept) : could not find function "count_"
> coef(tmp)
count_(Intercept) count_femWomen count_kid5 zero_(Intercept) zero_femWomen
0.88857411 -0.26994856 -0.10071323 1.04466537 -0.33782187
zero_kid5 count_marMarried zero_marMarried
-0.22905995 0.07760844 0.26079442
where it appears that it's confused thinking that count_(Intercept)
is a function. But OTOH, it's not confused with the non-averaged model
> emmeans(mod.pscl, specs = pairwise ~ mar, adjust = "Tukey", mode = "prob0", data = bioChemists)
$emmeans
mar emmean SE df lower.CL upper.CL
Single 0.340 0.0297 907 0.282 0.399
Married 0.284 0.0196 907 0.245 0.322
Results are averaged over the levels of: fem
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Single - Married 0.0563 0.0379 907 1.486 0.1376
Results are averaged over the levels of: fem
> coef(mod.pscl)
count_(Intercept) count_femWomen count_marMarried count_kid5 zero_(Intercept)
0.8630115 -0.2664203 0.0789141 -0.1123130 0.9631991
zero_femWomen zero_marMarried zero_kid5
-0.3258336 0.2642771 -0.2696642
At this time, I don't really see a way to anticipate or work around this. It seems like a glitch in model.avg
Thank you for the detailed response! I'll get in touch with the developper of MuMIn, let's see.
The problem was so obvious I couldn't see it. We have two models here, one for the counts and one for the zeros. So we have to have a way of specifying which. So I devised a subset
option (and also a formula
option):
> emmeans(avg.pscl, "mar", formula = art ~ fem + kid5 + mar, subset = "count_", tran = "log", type = "response", data = bioChemists)
mar response SE df lower.CL upper.CL
Single 2.03 0.1054 907 1.83 2.25
Married 2.09 0.0768 907 1.95 2.25
Results are averaged over the levels of: fem
Confidence level used: 0.95
Intervals are back-transformed from the log scale
> emmeans(avg.pscl, "mar", formula = art ~ fem + kid5 + mar, subset = "zero_", tran = "logit", type = "response", data = bioChemists)
mar response SE df lower.CL upper.CL
Single 0.683 0.0285 907 0.625 0.736
Married 0.706 0.0196 907 0.666 0.743
Results are averaged over the levels of: fem
Confidence level used: 0.95
Intervals are back-transformed from the logit scale
Having subset = "cond_"
or subset = "prefix:cond_"
says to use the coefficients prefixed by "cond_"
And the same for avg.tmb
. Here, the coeffiicient names are wrapped in cond()
or zi()
> emmeans(avg.tmb, "mar", subset = "wrap:cond", tran = "log", type = "response", data = bioChemists)
mar rate SE df lower.CL upper.CL
Single 2.04 0.1031 907 1.84 2.25
Married 2.09 0.0756 907 1.95 2.24
Results are averaged over the levels of: fem
Confidence level used: 0.95
Intervals are back-transformed from the log scale
> emmeans(avg.tmb, "mar", subset = "wrap:zi", tran = "logit", type = "response", data = bioChemists)
mar rate SE df lower.CL upper.CL
Single 0.319 0.01867 907 0.283 0.357
Married 0.292 0.00977 907 0.274 0.312
Results are averaged over the levels of: fem
Confidence level used: 0.95
Intervals are back-transformed from the logit scale
Interesting discrepancy with the zero part. One is more or less 1 minus the other. I need to investigate that.
We can also specify subset
to be a named list of indexes of the coefficients to use. I'll eventually push this up.
Please note in all of this, we do not have available any of the options for hurdle
objects, because we are working directly with coefficients and covariances of the coefficients in the averaging
object, not paying attention to what class those models are.
Another approach would be to create a fake hurdle
object with the coefficients and covariances from the averaging
model.
We have to take pains to align the coefficients correctly.
fakeavg.pscl = mod.pscl
bhat = coef(avg.pscl, full = TRUE)
V = vcov(avg.pscl, full = TRUE)
fakeavg.pscl$coefficients = list(
count = bhat[c(1,2,7,3)], zero = bhat[c(4,5,8,6)])
fakeavg.pscl$vcov = V[c(1,2,7,3,4,5,8,6), c(1,2,7,3,4,5,8,6)]
With this we have available all the options for regular hurdle models:
> emmeans(fakeavg.pscl, "mar", mode = "prob0")
mar emmean SE df lower.CL upper.CL
Single 0.318 0.0284 907 0.262 0.373
Married 0.295 0.0197 907 0.257 0.334
Results are averaged over the levels of: fem
Confidence level used: 0.95
> emmeans(fakeavg.pscl, "mar", mode = "count")
mar emmean SE df lower.CL upper.CL
Single 2.05 0.1059 907 1.84 2.25
Married 2.11 0.0753 907 1.96 2.26
Results are averaged over the levels of: fem
Confidence level used: 0.95
> emmeans(fakeavg.pscl, "mar", mode = "response")
mar emmean SE df lower.CL upper.CL
Single 1.61 0.1021 907 1.41 1.81
Married 1.70 0.0682 907 1.57 1.84
Results are averaged over the levels of: fem
Confidence level used: 0.95
It seems to me that this is more fruitful and hardly more tedious. The estimates are slightly off from what I showed in the previous answer because they are averaged after re-gridding. We can get identical results if we insist on averaging on the link scale then summarizing with type = "response"
> emmeans(fakeavg.pscl, "mar", mode = "count", lin.pred = TRUE, type = "response")
mar count SE df lower.CL upper.CL
Single 2.03 0.1054 907 1.83 2.25
Married 2.09 0.0768 907 1.95 2.25
Results are averaged over the levels of: fem
Confidence level used: 0.95
Intervals are back-transformed from the log scale
Thank you very much Russell! Looking forward to these new functionalities in the next release! About the discrepancy between pscl and glmmTMB in the zero part: while pscl estimates the probability of a non-zero, glmmTMB estimates the probability of a 0 (https://github.com/glmmTMB/glmmTMB/issues/470).
I think this is resolved, so am closing
Describe the bug
emmeans won't work with model-averaged hurdle models, fitted through
pscl::hurdle()
. The error isError in update.default(object$formula, . ~ . + 1): need an object with call component
Expected behavior
I would expect to get the estimated marginal means.
emmeans()
will not complain with used on a hurdle model fitted throughglmmTMB
, alhtough the probabilities are not in the 0-1 interval (see reproducible example below).Additional context
Model-averaging was done with
MuMIn::model.avg()
. I get error messages if I don't specifyfit = TRUE
insideMuMIn::model.avg()
, or useemmeans()
without an explicitdata
argument, which is already anticipated by the emmeans documentation. In my research, I would store models in alist
, and uselapply()
to calculate model averaging and emmeans. However, since the error could be reproduced with model outside a list, I've opted to keep it simple, and provide a reproducible example without storing models in lists. Note that while glmmTMB can fit hurdle models, pscl is substantially faster. (besides, glmmTMB binomial component calculates the probability of observing a zero, which I don't find very inutitive).Thanks in advance!
To reproduce
Created on 2023-09-14 with reprex v2.0.2