Closed Generalized closed 1 day ago
It seems that qrdg() struggles wih ordinal factors: In the example it did not take the trends, but rather levels:
In the model:
> mmrm_gee_imp_trend_grid@grid
Visit Arm Age_centered .wgt.
1 Month 6 HD -2.433632e-15 56
2 Month 12 HD -2.433632e-15 56
3 Month 20 HD -2.433632e-15 56
4 Month 6 SoC -2.433632e-15 55
5 Month 12 SoC -2.433632e-15 55
6 Month 20 SoC -2.433632e-15 55
> mmrm_gee_imp_trend_grid@bhat
(Intercept) VisitMonth 12 VisitMonth 20 ArmSoC
NA NA NA -0.40627726
Age_centered VisitMonth 12:ArmSoC VisitMonth 20:ArmSoC VisitMonth 12:Age_centered
0.03240021 NA NA NA
VisitMonth 20:Age_centered
NA
So I guess I should tell somehow qdrg() that Visit is ordinal. But how?
I have no clue, as I don't know this package nor what code you used. You have a bunch of issues open and I'm getting them mixed up as I am traveling. This one does need a reproducible example.
I understand. Yes, doing it via email will be difficult, I will wait and we can continue when you return and have the access to github. I provided a fully reproducible example with lots of details and attached data in the post (ZIP files). The 2 other topics don't need any examples, they are not about any code.
Please provide a reproducible example that does not require us to download external data. I am sorry this will take time, but so does package maintenance and answering issues. Here are some guidelines.
The raw data:
d <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L,
9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L,
13L, 13L, 14L, 14L, 14L), levels = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14"), class = "factor"),
Arm = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), levels = c("A", "B"), class = "factor"), Visit = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1",
"V2", "V3"), class = c("ordered", "factor")), Visit_non_ord = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1",
"V2", "V3"), class = "factor"), Painscore = c(1L, 2L, 4L,
5L, 6L, NA, 4L, 2L, 3L, 4L, 3L, 4L, 5L, NA, 5L, 6L, 7L, 6L,
4L, 3L, 5L, 1L, NA, 4L, 5L, 4L, 5L, 4L, 6L, NA, 0L, 4L, 3L,
4L, NA, 4L, 3L, 4L, NA, 6L, 7L, 8L)), row.names = c(NA, -42L
), class = "data.frame")
Imputation:
library(mice)
library(glmtoolbox)
imp <- mice(d, m=5)
imp$predictorMatrix[1:4, 1:5] <- 0
imp$predictorMatrix[5, 1] <- 0
imp <- mice(d, m=5, predictorMatrix = imp$predictorMatrix)
imp_list <- complete(imp, "all")
Analysis.
Fitting GEE
(trend_fit <- lapply(imp_list,
function(dat) {
glmgee(Painscore ~ Visit * Arm,
family = gaussian(link = "identity"),
id = ID,
data=dat,
corstr = "unstructured")}))
$`1`
Variance function: gaussian
Link: identity
Correlation structure: Unstructured
$`2`
Variance function: gaussian
Link: identity
Correlation structure: Unstructured
$`3`
Variance function: gaussian
Link: identity
Correlation structure: Unstructured
$`4`
Variance function: gaussian
Link: identity
Correlation structure: Unstructured
$`5`
Variance function: gaussian
Link: identity
Correlation structure: Unstructured
Warning messages:
1: Estimate of correlation matrix is not positive definite
2: Estimate of correlation matrix is not positive definite
3: Estimate of correlation matrix is not positive definite
4: Estimate of correlation matrix is not positive definite
5: Estimate of correlation matrix is not positive definite
Pooling:
(pooled_coef <- pool_estimates_for_qdrg(k=5,
coefficients = lapply(trend_fit,
function(analysis) analysis$coefficients),
varcov = lapply(trend_fit,
function(analysis) vcov(analysis, type = "bias-corrected"))))
$pooled_coef
(Intercept) Visit.L Visit.Q ArmB Visit.L:ArmB Visit.Q:ArmB
4.15238095 0.24243661 0.39658405 0.07619048 0.82832509 -0.85148929
$pooled_VC
(Intercept) Visit.L Visit.Q ArmB Visit.L:ArmB Visit.Q:ArmB
(Intercept) 0.31654321 -0.0763653945 -0.14233455 -0.292597632 0.121677951 0.1621637503
Visit.L -0.07636539 0.1855328798 0.03003269 0.102917976 -0.131043084 -0.0006938677
Visit.Q -0.14233455 0.0300326875 0.17368103 0.129670519 -0.046646236 -0.1714361300
ArmB -0.29259763 0.1029179757 0.12967052 0.826087176 0.002528054 -0.0012003672
Visit.L:ArmB 0.12167795 -0.1310430839 -0.04664624 0.002528054 0.626938776 0.0707090432
Visit.Q:ArmB 0.16216375 -0.0006938677 -0.17143613 -0.001200367 0.070709043 0.4825547997
Making the grid:
(emm_grid <- with(pooled_coef,
qdrg(formula = Painscore ~ Visit * Arm,
data=imp_list[[1]],
coef = pooled_coef,
vcov = pooled_VC,
df = Inf)))
'emmGrid' object with variables:
Visit = V1, V2, V3
Arm = A, B
emmeans()
(emm_trends <- emmeans(emm_grid,
specs = ~ Visit | Arm,
adjust="none",
df=Inf,
coef = pooled_coef$pooled_coef,
vcov = pooled_coef$pooled_VC))
Arm = A:
Visit emmean SE df asymp.LCL asymp.UCL
V1 4.15 0.563 Inf 3.05 5.26
V2 4.15 0.563 Inf 3.05 5.26
V3 4.15 0.563 Inf 3.05 5.26
Arm = B:
Visit emmean SE df asymp.LCL asymp.UCL
V1 4.23 0.747 Inf 2.77 5.69
V2 4.23 0.747 Inf 2.77 5.69
V3 4.23 0.747 Inf 2.77 5.69
Degrees-of-freedom method: user-specified
Confidence level used: 0.95
Contrast:
(contrast(emm_trends, "poly", max.degree=2, by = "Arm"))
Arm = A:
contrast estimate SE df z.ratio p.value
linear 0 0 Inf NaN NaN
quadratic 0 0 Inf NaN NaN
Arm = B:
contrast estimate SE df z.ratio p.value
linear 0 0 Inf NaN NaN
quadratic 0 0 Inf NaN NaN
Degrees-of-freedom method: user-specified
It failed.
Without qdrg() it works. But since the package for GEE ordinal logistic regression isn't supported by emmeans, I need to go through qdrg().
qdrg() is also necessary if I want to specify special features for GEE, e.g. using specific covariance estimator. For example, it doesn't support fully the glmtoolbox::glmgee() - I cannot specify the var-cov estimator I need.
Fitting GEE
(trend_fit <- with(imp,
glmgee(Painscore ~ Visit * Arm,
family = gaussian(link = "identity"),
id = ID,
corstr = "unstructured"))) # OK, but I cannot specify the bias-corrected covariance this way :/
call :
with.mids(data = imp, expr = glmgee(Painscore ~ Visit * Arm,
family = gaussian(link = "identity"), id = ID, corstr = "unstructured"))
call1 :
mice(data = d, m = 5, predictorMatrix = imp$predictorMatrix)
nmis :
ID Arm Visit Visit_non_ord Painscore
0 0 0 0 6
analyses :
[[1]]
Variance function: gaussian
Link: identity
Correlation structure: Unstructured
[[2]]
Variance function: gaussian
Link: identity
Correlation structure: Unstructured
[[3]]
Variance function: gaussian
Link: identity
Correlation structure: Unstructured
[[4]]
Variance function: gaussian
Link: identity
Correlation structure: Unstructured
[[5]]
Variance function: gaussian
Link: identity
Correlation structure: Unstructured
Warning messages:
1: Estimate of correlation matrix is not positive definite
2: Estimate of correlation matrix is not positive definite
3: Estimate of correlation matrix is not positive definite
4: Estimate of correlation matrix is not positive definite
5: Estimate of correlation matrix is not positive definite
emmeans()
(emm_trends <- emmeans(trend_fit,
specs = ~ Visit | Arm,
adjust="none",
df=Inf))
Arm = A:
Visit emmean SE df asymp.LCL asymp.UCL
V1 4.14 0.551 Inf 3.06 5.22
V2 3.83 0.702 Inf 2.45 5.20
V3 4.49 0.426 Inf 3.65 5.32
Arm = B:
Visit emmean SE df asymp.LCL asymp.UCL
V1 3.29 0.748 Inf 1.82 4.75
V2 4.60 0.647 Inf 3.33 5.87
V3 4.80 1.034 Inf 2.77 6.83
Covariance estimate used: robust
Degrees-of-freedom method: user-specified
Confidence level used: 0.95
Contrast:
(contrast(emm_trends, "poly", max.degree=2, by = "Arm"))
Arm = A:
contrast estimate SE df z.ratio p.value
linear 0.343 0.533 Inf 0.643 0.5202
quadratic 0.971 0.879 Inf 1.105 0.2693
Arm = B:
contrast estimate SE df z.ratio p.value
linear 1.514 0.980 Inf 1.545 0.1222
quadratic -1.114 1.207 Inf -0.923 0.3560
Degrees-of-freedom method: user-specified
OK, it works this way.
( Though the covariance estimator is not the one I want, so this example doesn't solve my problem; qdrg() is necessary. )
This is a whole lot to wade through, and at the end ("OK, it works this way") it sounds like it doesn't reproduce the error. If there's more to it than that, maybe I can look at it again after I get back from a trip in late July.
@jpiaskowski Thanks for pointing us to that repro example advice!
No, it does not work: I wrote "It failed." where it failed. Kindly please look at the whole post when you return. I just showed two scenarios help you figure out what is going on. First scenario, using qdrg() which is flexible and allows me to do what I need to do - fails. Second scenarios, which does not use qdrg() and not flexible - works.
From the help for qdrg:
If ordinal is provided, the intercept terms are modified appropriate to predicting an ordinal response, as described in vignette("models"), Section O, using ordinal$mode as the mode argument (if not provided, "latent" is assumed). (All modes are supported except 'scale')
Moreover,
For this to work, we expect the first ordinal$dim - 1 elements of coef to be the estimated threshold parameters, followed by the coefficients for the linear predictor.
This means that somehow you must rearrange the coefficients and the formula so that the ordinal thresholds come first, followed by the coefficients for the linear predictor implied by formula
. The name of the ordinal response should not be included in the formula.
[I wrote this on July 5, but realized I didn't click the "comment" button]
Kindly please look at the whole post when you return.
Just the one comment with the example is 277 lines long, still over 250 lines without the data. I'd say you are asking for a lot. It's not like anybody is paying me to do this. It's not like you paid anything for this software. I am 5 years younger than the guy that a lot of people are asking to step down for cognitive decline -- and unlike him, I am not looking for a big job.
The qdrg()
function (for "quick and dirty reference grid") is offered as a possible solution for use with models that are not supported by emmeans. It works well in cases where you can give the coefficients and their covariance matrix, as well as a formula that can be used to reproduce the model matrix. There is a provision also for ordinal models and a clear statement is offered in the help as to how it expects things to be arranged for such models.
But qdrg()
is not guaranteed to work for all models, and it could well be that this is one of them. Obviously, you have tried a lot of manipulations to try to make it work. If qdrg()
does not work here, that is unfortunate, but this does not mean that there is anything wrong with the function. Really, this is your model, not mine, and I'm just not willing to try to navigate this long narrative of successes and failures. The fact that you seem to be wanting to estimate trends adds additional complications.
For a package that produces complex model objects, it is the developer of that package who would know what is needed to make predictions from that model. If that person is interested in providing emmeans support for those models, then that is what would help you. And documentation if provided that package developers can use to help set that up.
One thought. Consider making a list of reference grids for the imputations, rather than trying to make one. Presumably each of those RGs will have the same linfct
slot, but different bhat
s and V
s, that you can combine together using Rubin's rules.
Dear @rvlenth
I'm trying to git GEE-fit ordinal regression from the multgee package and follow it with emmeans under multiple imputation. Neither multgee is supported, nor itself it "talks" with mice... So a number of workarounds are necessary, but still I cannot make it work.
I use the data provided here. It's R RDS file, as the imputed dataset is too big to paste it in a textual form and it takes too much time to program it from scratch.
imp_data_mids.zip or here - with an easier structure converted to just list, will be required for qdrg(): imp_data_list.zip
Description: It's a longitudinal study with 2 arms (HD, SoC), 6 visits (week3...month 20), some ordinal data ODIPain and a few covariates. Analyses: 1) between-arm comparisons at each timepoint - this works 2) within-arm trend over time - this fails
Between-arm comparisons at each timepoint
/ This one works, but I wanted to make some context /
I started with the simplest approach using function with(), but the ordLORgee doesn't accept this way: (I asked the author to provide integration with MICE and emmeans: https://github.com/AnestisTouloumis/multgee/issues/7 and https://github.com/AnestisTouloumis/multgee/issues/8 )
So I went this way:
OK. So this part of analysis works. Now I wanted to perform the within-arm analysis of trends:
Within-arm trends
I refit the model now with Visit being an ordered factor:
So far so good. Please note that the "L", "Q" and "C" components are present in the coefficients.
So it failed.
But when I did this analysis using GEE for numerical data (treating ordinal response as numeric) there was no problem and the ordered Visit factor was properly recognized:
Let's try with GEE for numerical data via qdrg()
It fails.
PS: the pooling function - adopted from your code