Closed Generalized closed 11 months ago
It's a complicated situation. First, emmeans()
and relatives require being able to recover the dataset, but that doesn't seem possible without help, at least for this model:
> recover_data(imp_m_gls)
Error in recover_data.call(fcall, trms, object$na.action, data = data, :
argument "data" is missing, with no default
> recover_data(imp_m_gls, data = nhanes)
hyp chl
2 1 187
3 1 187
5 1 113
... (output abbreviated) ...
Correspondingly, we can try:
> ref_grid(imp_m_gls)
Error in ref_grid(imp_m_gls) : We are unable to reconstruct the data.
The variables needed are:
hyp chl
Are any of these actually constants? (specify via 'params = ')
Try re-running with 'data = "<name of dataset>"'
### Then follow up with what is suggested (which you could have tried...)
> ref_grid(imp_m_gls, data = nhanes)
Error in eval(predvars, data, env) : object 'hyp' not found
We still get an error because for gls
models, we also use the data in the emm_basis.gls()
method. There is a mechanism for that in recover_data.gls()
, but for imp_m_gls
, we are using recover_data.mira()
instead.
I think I found a workaround; at least it works with this example:
> emmeans(imp_m_gls, specs = ~hyp+chl, data = nhanes)
hyp chl emmean SE df lower.CL upper.CL
1 192 26.5 1.17 22 24.1 28.9
2 192 26.6 2.40 22 21.7 31.6
Degrees-of-freedom method: df.error
Confidence level used: 0.95
Note you do have to specify the data
argument here.
Thank you very much.
Alas, I cannot reproduce this result:
> imp_m_gls
call :
with.mids(data = nhanes_miss, expr = gls(bmi ~ hyp + chl))
call1 :
mice(data = nhanes, m = 5, method = "pmm", seed = 123)
nmis :
age bmi hyp chl id tim
0 9 8 10 0 0
analyses :
[[1]]
Generalized least squares fit by REML
Model: bmi ~ hyp + chl
Data: NULL
Log-restricted-likelihood: -67.83865
Coefficients:
(Intercept) hyp2 chl
18.27884063 0.90312733 0.03547781
Degrees of freedom: 25 total; 22 residual
Residual standard error: 3.796034
[[2]]
Generalized least squares fit by REML
Model: bmi ~ hyp + chl
Data: NULL
Log-restricted-likelihood: -68.21973
Coefficients:
(Intercept) hyp2 chl
18.49466702 0.80839826 0.03368811
Degrees of freedom: 25 total; 22 residual
Residual standard error: 3.86554
[[3]]
Generalized least squares fit by REML
Model: bmi ~ hyp + chl
Data: NULL
Log-restricted-likelihood: -68.83631
Coefficients:
(Intercept) hyp2 chl
23.3494844 0.3539216 0.0140681
Degrees of freedom: 25 total; 22 residual
Residual standard error: 3.922864
[[4]]
Generalized least squares fit by REML
Model: bmi ~ hyp + chl
Data: NULL
Log-restricted-likelihood: -69.15577
Coefficients:
(Intercept) hyp2 chl
18.32612582 -0.87559308 0.03520585
Degrees of freedom: 25 total; 22 residual
Residual standard error: 4.004991
[[5]]
Generalized least squares fit by REML
Model: bmi ~ hyp + chl
Data: NULL
Log-restricted-likelihood: -71.61168
Coefficients:
(Intercept) hyp2 chl
20.73375882 -2.96109405 0.04188712
Degrees of freedom: 25 total; 22 residual
Residual standard error: 4.506599
and
> pool(imp_m_gls)
Class: mipo m = 5
term m estimate ubar b t dfcom df riv lambda fmi
1 (Intercept) 5 19.8365753 2.081502e+01 4.9135453308 2.671128e+01 22 13.230280 0.2832692 0.2207403 0.3167657
2 hyp2 5 -0.3542480 5.744374e+00 2.6254225741 8.894882e+00 22 9.270615 0.5484509 0.3541933 0.4594540
3 chl 5 0.0320654 6.100493e-04 0.0001111273 7.434021e-04 22 14.651663 0.2185934 0.1793817 0.2723609
> summary(pool(imp_m_gls))
term estimate std.error statistic df p.value
1 (Intercept) 19.8365753 5.1682954 3.8381272 13.230280 0.001992168
2 hyp2 -0.3542480 2.9824288 -0.1187784 9.270615 0.907985021
3 chl 0.0320654 0.0272654 1.1760471 14.651663 0.258331829
then
> emmeans(imp_m_gls, specs = ~hyp+chl, data = nhanes)
Error in eval(predvars, data, env) : object 'hyp' not found
This works:
recover_data(imp_m_gls, data = nhanes)
> recover_data(imp_m_gls, data = nhanes)
hyp chl
2 1 187
3 1 187
5 1 113
7 1 118
8 1 187
9 1 238
13 1 206
14 2 204
17 2 284
18 2 199
19 1 218
22 1 229
23 1 131
25 1 186
I also checked the geeglm and it worked:
> imp_m_gee <- with(nhanes_miss, geepack::geeglm(bmi ~ hyp + chl, corstr = "exchangeable", id = nhanes$id))
> (gee_pooled <- pool(imp_m_gee))
Class: mipo m = 5
term m estimate ubar b t dfcom df riv lambda fmi
1 (Intercept) 5 19.8365753 1.750901e+01 4.9135453308 2.340526e+01 22 12.20835 0.3367555 0.2519200 0.3502976
2 hyp2 5 -0.3542480 3.706792e+00 2.6254225741 6.857299e+00 22 6.93623 0.8499281 0.4594385 0.5682446
3 chl 5 0.0320654 6.292612e-04 0.0001111273 7.626139e-04 22 14.81005 0.2119196 0.1748627 0.2675225
> summary(pool(imp_m_gee))
term estimate std.error statistic df p.value
1 (Intercept) 19.8365753 4.83789860 4.1002462 12.20835 0.00142124
2 hyp2 -0.3542480 2.61864457 -0.1352791 6.93623 0.89623320
3 chl 0.0320654 0.02761547 1.1611391 14.81005 0.26395911
Based on the mice datasets:
> emmeans(imp_m_gee, specs = ~hyp+chl)
hyp chl emmean SE df asymp.LCL asymp.UCL
1 187 25.8 2.03 Inf 21.8 29.8
2 187 25.5 1.56 Inf 22.4 28.5
Covariance estimate used: vbeta
Confidence level used: 0.95
but when the "data" parameter is added, it takes it from the original data:
> emmeans(imp_m_gee, specs = ~hyp+chl, data=nhanes)
hyp chl emmean SE df asymp.LCL asymp.UCL
1 192 26.0 2.08 Inf 21.9 30.1
2 192 25.6 1.47 Inf 22.8 28.5
Covariance estimate used: vbeta
Confidence level used: 0.95
so I'm worrying if the trick with gls() wouldn't result in the same issue?
I see that would be the case. But I don't see how this can be prevented when it is necessary to specifydata
to make it work. You can specify a different dataset with the same variables, but no matter what, it will use the same one dataset. FWIW, if it were able to reconstruct the data, what it would do is create an "average" dataset.
But I would also comment that this can be worked around. Specifying data
affects two things: The levels of factors, and the default average values of covariates. The former is not a problem, The latter can be addressed by specifying at = list(chl = 187)
in the call.
Thinking about this fractionally longer, I suggest that the right thing to do would be to use rbind()
to stack all the imputed datasets into one, and specify that in the data
argument.
I found a way to work with "mmrm", so this might temporarily replace nlme::gls()
As I mentioned elsewhere, mmrm doesn't work with "with()".
> imp_m_mmrm <- with(nhanes_miss, mmrm(bmi ~ hyp + chl + us(tim|id)))
Error in attr(data, "dataname") <- toString(match.call()$data) :
cannot set attribute on a symbol
So the workaround was to make a list of the datasets and iterate through lapply. This, however, gives a list rather than "mira". But this can be casted as "mira".
> dat_mice <- complete(nhanes_miss, "all")
> fit_mmrm <- function(dat) {
mod <- mmrm(bmi ~ hyp + chl + us(tim|id),data=dat)
return(mod)
}
> imp_m_mmrm <- lapply(dat_mice, fit_mmrm)
> summary(pool(imp_m_mmrm))
term estimate std.error statistic df p.value
1 (Intercept) 19.8365753 5.1682953 3.8381273 13.230279 0.001992168
2 hyp2 -0.3542480 2.9824287 -0.1187784 9.270615 0.907985019
3 chl 0.0320654 0.0272654 1.1760471 14.651663 0.258331811
> emmeans(as.mira(imp_m_mmrm), specs = ~hyp+chl)
hyp chl emmean SE df lower.CL upper.CL
1 187 25.8 1.98 22 21.7 29.9
2 187 25.5 2.35 22 20.6 30.3
Confidence level used: 0.95
and now I can also test the contrasts:
# For the original mmrm using only *complete cases* (13 - 3 parameters -> df =10)
> emmeans::contrast(emmeans(m_mmrm, specs = ~hyp+chl), list(some_contrast = c(1, -1)))
contrast estimate SE df t.ratio p.value
some_contrast 0.685 3.4 10 0.202 0.8443
# For the imputed datasets (25 - 3 -> df=22)
> emmeans::contrast(emmeans(as.mira(imp_m_mmrm), specs = ~hyp+chl), list(some_contrast = c(1, -1)))
contrast estimate SE df t.ratio p.value
some_contrast 0.354 2.98 22 0.119 0.9065
I think I will close this issue, as it's not clear that more can be done.
Dear Professor,
In one of the GitHub closed issues I found that emmeans supports the "mice" imputation framework. I also know it supports the gls and mmrm packages.
I just tried to confirm it a simple example. (Please don't mind the nonsensical use of the gls without any correlation or weights. Just wanted to obtain class "gls", rather than meaningful results)
OK, before we start, let's just make sure the emmeans still talks with nlme::gls()
Good! Now, let's run the gls over the imputed datasets and pool the results:
OK, let's use the emmeans:
Hmm, let's chek the same with ordinary lm():
So it works for the lm() but not gls().
Is there any hope to make it working together?