Thanks @hugesingleton , I think the best would be to compare this with SAS results. We have seen before that the approximate Satterthwaite from emmeans can be quite off. Alternatively you could share the (toy) data and code here as reprex so that we can debug on our end. Thanks :-)
Dear @danielinteractive I provided the data and the code in my post, documenting it exactly step by step. One can just copy it and paste directly to R, along with all R commands.
The question in my post is exactly this: why doesn't MMRM need the approximate algorithm if emmeans needs it? Did you find a workaround for problematic covariance matrices? Is the algorithm in emmeans somehow "incomplete" (the sources are available, so you can compare the algorithms directly), or does it rely on some special assumptions? Let me also tag @rvlenth regarding this issue.
thx @hugesingleton , now on my laptop I saw it, thanks :-)
As far as I understand, emmeans uses a different way to calculate Satterthwaite. We still need to document exactly how we do this (see #181), but basically we use the same "exact" approach as lmerTest does for lme4.
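For readers who want the formula behind this, here is a sketch of the Satterthwaite degrees of freedom for a single contrast as it is usually written for lmerTest. It is an assumption that this is what "exact" refers to here; the details for mmrm are still to be documented in #181. The symbols are introduced for illustration only: $c$ is the contrast vector, $V(\theta)$ the covariance matrix of the fixed-effect estimates as a function of the covariance parameters $\theta$, and $A$ the (asymptotic) covariance matrix of $\hat\theta$.

```latex
\hat{\nu}
  = \frac{2 \left\{ c^\top V(\hat\theta)\, c \right\}^2}{g^\top A \, g},
\qquad
g = \left. \nabla_\theta \left\{ c^\top V(\theta)\, c \right\} \right|_{\theta = \hat\theta},
\qquad
A \approx \operatorname{Cov}(\hat\theta).
```

In this form the gradient $g$ is computed from the fitted model rather than estimated by perturbing the response, which would be one plausible source of the differing degrees of freedom discussed in this thread.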
Ah, I understand, this makes sense now. Thank you! OK, let's check then. Apart from those minor issues, let me thank again, wholeheartedly, the whole Team working on MMRM and the entire OpenPharma ecosystem!
Thanks again @hugesingleton for the question!
So I quickly compared mmrm vs. SAS for this example, and I am happy to report that mmrm exactly matches the SAS results (well, up to numerical accuracy, as usual).
To compare, you can use (if you have a SAS server) the nice new (still a bit rough but usable) https://github.com/insightsengineering/sasr. Here is the full code:
# remotes::install_github(repo = 'insightsengineering/sasr')
library(sasr)
# install_saspy()
# then need to set up sascfg_personal.py
# So now we can look at the example.
data <- structure(list(id = structure(c(
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 6L, 7L, 8L,
9L, 10L, 6L, 7L, 8L, 9L, 10L
), .Label = c(
"1", "2", "3", "4",
"5", "101", "102", "103", "104", "105"
), class = "factor"), tim = structure(c(
1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L
), .Label = c(
"Bas",
"W1", "W2"
), class = "factor"), trt = structure(c(
1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c(
"A",
"B"
), class = "factor"), val = c(
-0.445778264836677, -1.2058565689643,
0.0411263138456899, 0.639388407571143, -0.786554355912735, 0.93451070190448,
0.844132115922294, 2.03975069147444, 1.30149437711243, -0.0531177589385456,
2.51757217231966, 2.94551129841814, 3.62138118865766, 3.37912768406252,
2.16395895144539, 0.170057481208407, 0.155078715940733, 0.0249318673672384,
-2.04658541402115, 0.213154105608615, 6.05007166431368, 2.15298399368789,
4.21424733194768, 3.91257174705049, 2.73317503681323, 6.10316126083229,
3.71615586428783, 5.83494216711747, 6.06097572188509, 6.72093565459939
)), row.names = c(NA, 30L), class = "data.frame")
library(mmrm)
m <- mmrm(formula = val ~ tim+trt:tim + cs(tim | id), data = data)
library(emmeans)
emm <- emmeans(m, ~trt*tim, nesting=NULL)
emm
cont <- contrast(emm, list(trt1 = c(0, 0, -1, 1, 0, 0),
trt2 = c(0, 0, 0, 0, -1, 1)), adjust = "none")
cont
# Now with SAS:
df2sd(data, "dat")
result <- run_sas(
"PROC MIXED DATA = dat cl method=reml;
CLASS id tim trt;
MODEL val = tim trt*tim / ddfm=satterthwaite solution chisq;
REPEATED tim / subject=id type=CS rcorr;
LSMEANS tim*trt / cl alpha=0.05;
LSMESTIMATE tim*trt 'trt1diff' 0 0 -1 1 0 0,
'trt2diff' 0 0 0 0 -1 1;
RUN;
"
)
cat(result$LOG)
cat(result$LST)
# We fitted the same model:
all.equal(deviance(m), 76.37658057)
summary(m)
cov2cor(VarCorr(m))
# We can see that the least square means d.f. are all 21.2 in SAS and in R:
emm
cont
For reference, the SAS results I get here are:
The SAS System Friday, November 25, 2022 03:15:00 PM 7
The Mixed Procedure
Model Information
Data Set WORK.DAT
Dependent Variable val
Covariance Structure Compound Symmetry
Subject Effect id
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Model-Based
Degrees of Freedom Method Satterthwaite
Class Level Information
Class Levels Values
id 10 1 101 102 103 104 105 2 3 4 5
tim 3 Bas W1 W2
trt 2 A B
Dimensions
Covariance Parameters 2
Columns in X 10
Columns in Z 0
Subjects 10
Max Obs per Subject 3
Number of Observations
Number of Observations Read 30
Number of Observations Used 30
Number of Observations Not Used 0
Iteration History
Iteration Evaluations -2 Res Log Like Criterion
0 1 77.80797515
1 1 76.37658057 0.00000000
Convergence criteria met.
Estimated R Correlation Matrix for id 1
Row Col1 Col2 Col3
1 1.0000 0.2567 0.2567
2 0.2567 1.0000 0.2567
3 0.2567 0.2567 1.0000
Covariance Parameter Estimates
Cov Parm Subject Estimate Alpha Lower Upper
CS id 0.2571 0.05 -0.2671 0.7814
Residual 0.7446 0.05 0.4130 1.7248
Fit Statistics
-2 Res Log Likelihood 76.4
AIC (Smaller is Better) 80.4
AICC (Smaller is Better) 80.9
BIC (Smaller is Better) 81.0
Null Model Likelihood Ratio Test
DF Chi-Square Pr > ChiSq
1 1.43 0.2315
Solution for Fixed Effects
                                Standard
Effect     tim  trt   Estimate     Error     DF  t Value  Pr > |t|
Intercept               5.6872    0.4476   21.2    12.71    <.0001
tim        Bas         -5.9839    0.5458     16   -10.96    <.0001
tim        W1          -1.8746    0.5458     16    -3.43    0.0034
tim        W2                0         .      .        .         .
tim*trt    Bas  A     -0.05486    0.6330   21.2    -0.09    0.9317
tim*trt    Bas  B            0         .      .        .         .
tim*trt    W1   A      -2.7993    0.6330   21.2    -4.42    0.0002
tim*trt    W1   B            0         .      .        .         .
tim*trt    W2   A      -2.7617    0.6330   21.2    -4.36    0.0003
tim*trt    W2   B            0         .      .        .         .
Type 3 Tests of Fixed Effects
          Num   Den
Effect     DF    DF  Chi-Square  F Value  Pr > ChiSq  Pr > F
tim         2    16      145.57    72.78      <.0001  <.0001
tim*trt     3  11.7       33.97    11.32      <.0001  0.0009
Least Squares Means Estimates
                              Standard
Effect   Label      Estimate     Error     DF  t Value  Pr > |t|
tim*trt  trt1diff     2.7993    0.6330  21.21     4.42    0.0002
tim*trt  trt2diff     2.7617    0.6330  21.21     4.36    0.0003
Least Squares Means
                              Standard
Effect   tim  trt   Estimate     Error     DF  t Value  Pr > |t|  Alpha    Lower   Upper
tim*trt  Bas  A      -0.3515    0.4476   21.2    -0.79    0.4409   0.05  -1.2818  0.5788
tim*trt  Bas  B      -0.2967    0.4476   21.2    -0.66    0.5146   0.05  -1.2270  0.6336
tim*trt  W1   A       1.0134    0.4476   21.2     2.26    0.0342   0.05  0.08305  1.9437
tim*trt  W1   B       3.8126    0.4476   21.2     8.52    <.0001   0.05   2.8823  4.7429
tim*trt  W2   A       2.9255    0.4476   21.2     6.54    <.0001   0.05   1.9952  3.8558
tim*trt  W2   B       5.6872    0.4476   21.2    12.71    <.0001   0.05   4.7569  6.6175
Thank you very much for checking and confirming the validity! That's great news. You all did excellent work!
Currently I'm going to mix the two:
I will switch to MMRM when it supports the CR2/CR3 robust SE, and leave gls() only for special occasions.
Actually, the Authors of nlme + emmeans tandem truly deserve a statue and big respect, for enabling R people to enter the clinical trials industry, but now a serious game changer enters the scene! :)
/ PS: don't tell me you also have plans for another Xmas candy: K-R? I saw something among the TODO plans! Wow! /
Yeah KR is coming soon, before robust sandwich estimator!
Note that you can already now fit heterogeneous compound symmetry covariance structure and we also support observation weights.
Just a note for future use regarding #53: the clubSandwich package offers both CRx estimators along with the small-sample DoF adjustment, using its own approximate implementation of Satterthwaite (+ saddlepoint, but that's not that popular). Since you have your own exact implementation of Satterthwaite, maybe you'll find it informative to see how they combined the two.
https://www.jepusto.com/files/clubSandwich-Oslo-RUG-2022-02-03.pdf https://arxiv.org/pdf/1601.01981.pdf + https://www.jepusto.com/files/Pustejovsky-Tipton-201601.pdf
(BTW, this is also interesting: https://www.r-bloggers.com/2022/01/a-%F0%9F%90%B4-race-the-wild-cluster-bootstrap-vs-satterthwaite-corrected-sandwich-estimators-when-the-number-of-clusters-is-small/ )
BTW: for the tests, I repeated the analyses with CR2 via gls + emmeans + clubSandwich. Unfortunately, I had to change the approach. The previous one consumed 1 DoF too many; that's not really wrong, but it is a bit messy. Also, the need to "play" with parts of emmeans to produce a single baseline output didn't align well with the CR adjustment. With the current approach it's easier, so let's stick with it.
The results will surely differ from SAS (mostly for the "appx-satterthwaite"), but the overall pattern should be at least comparable once you implement the CRxx estimators in MMRM and compare the results against SAS.
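# Packages used by the code below
library(nlme)     # gls(), corSymm(), corCompSymm(), varIdent()
library(emmeans)
library(dplyr)    # %>%, bind_rows(), mutate(), arrange()
library(tidyr)    # separate(), pivot_longer(), pivot_wider()
data$Trt <- factor(rep(c("Bas","A_W1", "A_W2","Bas","B_W1", "B_W2"), each = 5))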
m_US <- gls(val ~ Trt, data=data, correlation=corSymm (form = ~ 1 | id), weights = varIdent(form=~1|tim))
m_CS <- gls(val ~ Trt, data=data, correlation=corCompSymm (form = ~ 1 | id))
vcCR_US <- as.matrix(clubSandwich::vcovCR(m_US, type = "CR2"))
vcCR_CS <- as.matrix(clubSandwich::vcovCR(m_CS, type = "CR2"))
set.seed(1000)
emm_common_bas_US <- emmeans(m_US, ~Trt, mode="satterthwaite", nesting=NULL, adjust="none", infer=c(TRUE, TRUE))
emm_common_bas_CS <- emmeans(m_CS, ~Trt, mode="satterthwaite", nesting=NULL, adjust="none", infer=c(TRUE, TRUE))
emm_common_bas_CR_US <- emmeans(m_US, ~Trt, mode="satterthwaite", nesting=NULL, adjust="none", infer=c(TRUE, TRUE), vcov. = vcCR_US)
emm_common_bas_CR_CS <- emmeans(m_CS, ~Trt, mode="satterthwaite", nesting=NULL, adjust="none", infer=c(TRUE, TRUE), vcov. = vcCR_CS)
data.frame(mode = "US", emm_common_bas_US) %>%
bind_rows(data.frame(mode = "US CR2", emm_common_bas_CR_US)) %>%
bind_rows(data.frame(mode = "CS", emm_common_bas_CS)) %>%
bind_rows(data.frame(mode = "CS CR2", emm_common_bas_CR_CS)) %>%
separate(col = Trt, into = c("Treatment", "Time"), sep="_", fill = "right") %>%
mutate(Treatment = ifelse(is.na(Time), "Both", Treatment),
Time = ifelse(is.na(Time), "Bas", Time)) %>%
dplyr::select(-lower.CL, -upper.CL, t.ratio) %>%
tidyr::pivot_longer(emmean:p.value, names_to="stat", values_to="value") %>%
tidyr::pivot_wider(id_cols = Treatment:stat, names_from = c(mode), values_from = value) %>%
mutate(across(where(is.numeric), ~round(., 3))) %>%
arrange(Treatment!="Both") %>%
data.frame
Output
Treatment Time stat US US.CR2 CS CS.CR2
1 Both Bas emmean -0.324 -0.324 -0.324 -0.324
2 Both Bas SE 0.256 0.256 0.309 0.257
3 Both Bas df 9.195 8.966 10.134 6.998
4 Both Bas t.ratio -1.264 -1.264 -1.047 -1.261
5 Both Bas p.value 0.237 0.238 0.319 0.248
6 A W1 emmean 1.018 1.018 1.020 1.020
7 A W1 SE 0.532 0.326 0.431 0.321
8 A W1 df 7.904 1.144 16.460 12.191
9 A W1 t.ratio 1.914 3.120 2.367 3.174
10 A W1 p.value 0.092 0.172 0.030 0.008
11 A W2 emmean 2.927 2.927 2.932 2.932
12 A W2 SE 0.410 0.263 0.431 0.250
13 A W2 df 7.940 1.379 9.883 2.496
14 A W2 t.ratio 7.143 11.109 6.805 11.721
15 A W2 p.value 0.000 0.026 0.000 0.003
16 B W1 emmean 3.808 3.808 3.806 3.806
17 B W1 SE 0.532 0.678 0.431 0.680
18 B W1 df 7.904 21.265 16.460 20.136
19 B W1 t.ratio 7.161 5.620 8.832 5.600
20 B W1 p.value 0.000 0.000 0.000 0.000
21 B W2 emmean 5.686 5.686 5.680 5.680
22 B W2 SE 0.410 0.516 0.431 0.527
23 B W2 df 7.940 20.309 9.883 13.277
24 B W2 t.ratio 13.876 11.016 13.182 10.783
25 B W2 p.value 0.000 0.000 0.000 0.000
I also ran MMRM with the new approach, just to see whether it changes much (CR-unadjusted only). Looks fine.
data$Trt <- factor(rep(c("Bas","A_W1", "A_W2","Bas","B_W1", "B_W2"), each = 5))
m_US <- mmrm(val ~ Trt + us(tim | id), data=data)
m_CS <- mmrm(val ~ Trt + cs(tim | id), data=data)
emm_common_bas_US <- emmeans(m_US, ~Trt, mode="satterthwaite", nesting=NULL, adjust="none", infer=c(TRUE, TRUE))
emm_common_bas_CS <- emmeans(m_CS, ~Trt, mode="satterthwaite", nesting=NULL, adjust="none", infer=c(TRUE, TRUE))

data.frame(mode = "US", emm_common_bas_US) %>%
bind_rows(data.frame(mode = "CS", emm_common_bas_CS)) %>%
separate(col = Trt, into = c("Treatment", "Time"), sep="_", fill = "right") %>%
mutate(Treatment = ifelse(is.na(Time), "Both", Treatment),
Time = ifelse(is.na(Time), "Bas", Time)) %>%
dplyr::select(-lower.CL, -upper.CL, t.ratio) %>%
tidyr::pivot_longer(emmean:p.value, names_to="stat", values_to="value") %>%
tidyr::pivot_wider(id_cols = Treatment:stat, names_from = c(mode), values_from = value) %>%
mutate(across(where(is.numeric), ~round(., 3))) %>%
arrange(Treatment!="Both") %>%
data.frame
Treatment Time stat US CS
1 Both Bas emmean -0.324 -0.324
2 Both Bas SE 0.256 0.309
3 Both Bas df 9.000 22.612
4 Both Bas t.ratio -1.264 -1.047
5 Both Bas p.value 0.238 0.306
6 A W1 emmean 1.018 1.020
7 A W1 SE 0.532 0.431
8 A W1 df 8.060 24.593
9 A W1 t.ratio 1.914 2.367
10 A W1 p.value 0.092 0.026
11 A W2 emmean 2.927 2.932
12 A W2 SE 0.410 0.431
13 A W2 df 8.009 24.593
14 A W2 t.ratio 7.143 6.805
15 A W2 p.value 0.000 0.000
16 B W1 emmean 3.808 3.806
17 B W1 SE 0.532 0.431
18 B W1 df 8.060 24.593
19 B W1 t.ratio 7.161 8.832
20 B W1 p.value 0.000 0.000
21 B W2 emmean 5.686 5.680
22 B W2 SE 0.410 0.431
23 B W2 df 8.009 24.593
24 B W2 t.ratio 13.876 13.182
25 B W2 p.value 0.000 0.000
plus the contrasts. GLS + emmeans:
set.seed(1000)
data.frame(mode = "US", emmeans::contrast(emm_common_bas_US,
list(Trt_at_visit_1 = c(-1, 0, 1, 0, 0),
Trt_at_visit_2 = c(0, -1, 0, 1, 0)), adjust = "none")) %>%
bind_rows(data.frame(mode = "US CR",
emmeans::contrast(emm_common_bas_CR_US,
list(Trt_at_visit_1 = c(-1, 0, 1, 0, 0),
Trt_at_visit_2 = c(0, -1, 0, 1, 0)),
adjust = "none"))) %>%
bind_rows(data.frame(mode = "CS",
emmeans::contrast(emm_common_bas_CS,
list(Trt_at_visit_1 = c(-1, 0, 1, 0, 0),
Trt_at_visit_2 = c(0, -1, 0, 1, 0)),
adjust = "none"))) %>%
bind_rows(data.frame(mode = "CS CR",
emmeans::contrast(emm_common_bas_CR_CS,
list(Trt_at_visit_1 = c(-1, 0, 1, 0, 0),
Trt_at_visit_2 = c(0, -1, 0, 1, 0)),
adjust = "none"))) %>%
dplyr::select(-SE, -t.ratio)
mode contrast estimate df p.value
1 US Trt_at_visit_1 2.79 7.83 0.006083
2 US Trt_at_visit_2 2.76 7.94 0.001453
3 US CR Trt_at_visit_1 2.79 8.01 0.005838
4 US CR Trt_at_visit_2 2.76 8.05 0.001395
5 CS Trt_at_visit_1 2.79 17.20 0.000226
6 CS Trt_at_visit_2 2.75 9.62 0.001114
7 CS CR Trt_at_visit_1 2.79 19.16 0.001462
8 CS CR Trt_at_visit_2 2.75 9.28 0.001072
and MMRM:
mode contrast estimate df p.value
1 US Trt_at_visit_1 2.789949 8.000624 0.00585015123
2 US Trt_at_visit_2 2.758945 8.000429 0.00142051273
3 CS Trt_at_visit_1 2.785773 24.839722 0.00009509042
4 CS Trt_at_visit_2 2.748241 24.839722 0.00011178654
Dear Authors of the MMRM. First of all - please accept my big Thank You for doing awesome work!
I did just a few "early" comparisons between MMRM vs. nlme::gls in the context of cLDA and wanted to ask you one question. TL;DR - the question is at the end of my post :) It will be rather lengthy, as I wanted to show everything step by step.
First of all, the versions used. My current numerically validated environment is based on R 3.6.3, and I used this version for the gls part. mmrm requires R 4+, so I ran it on a "wild" 4.0.3.
The analysis: cLDA with common baseline. Compared engines: nlme::gls() (3.1.152) + emmeans (1.7.5) vs mmrm (0.1.5) + emmeans (1.7.5)
The data: 30 observations, at 3 timepoints: {baseline, W1, W2}. No missing observations.
Packages:
Fitting the cLDA with unstructured covariance
Note: it's incomplete. It gives different baseline estimates because the (Intercept) term is present in the model matrix. We will deal with it later. Yes, we could remove it manually here, but then we'd have to deal with something like val ~ NewModelMatrix, which gives weird coefficient names. I don't want that.
Now, for simplicity only, let's check emmeans with the residual DoF.
gls: 30 observations - 6 parameters (2 baselines, because the Intercept is included) = 24 DoF.
emmeans also subtracts the number of estimated covariance parameters minus 1. For the US that is t(t+1)/2 = 6 covariance parameters, so we finally obtain 24 - 5 = 19 DoF (far from the 5 observations per cluster).
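The arithmetic above can be sketched in a few lines of R. This only encodes the rule as described in this post (residual DoF minus the number of covariance parameters less one); it has not been checked against the emmeans sources, and the helper name residual_df is hypothetical.

```r
# Hypothetical helper: residual d.f. rule as described in the post,
# i.e. observations - fixed-effect parameters - (covariance parameters - 1).
residual_df <- function(n_obs, n_fixed, n_cov) {
  n_obs - n_fixed - (n_cov - 1)
}

n_timepoints <- 3
n_cov_us <- n_timepoints * (n_timepoints + 1) / 2  # unstructured: t(t+1)/2 = 6
n_cov_cs <- 2                                      # homogeneous compound symmetry

df_us <- residual_df(n_obs = 30, n_fixed = 6, n_cov = n_cov_us)
df_cs <- residual_df(n_obs = 30, n_fixed = 6, n_cov = n_cov_cs)
c(US = df_us, CS = df_cs)
```

With the numbers from this post, the US case gives 19 DoF and the same rule with 2 covariance parameters (the CS case) gives 23.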
Now with Satterthwaite DoF it gives about 8 DoF, which is close to the cluster size and here emmeans agrees with mmrm.
Note: for MMRM the emmeans("mode") doesn't matter - it's Satterthwaite by design.
PS: Notice the approximate Satt. This will trigger my question at the end.
Now we estimate the common baseline:
Let's test the treatment at timepoint contrasts:
So far - so good! Conclusions: for the US, we get very similar (practically equal) outcomes. We will see differences later, for the CS.
Now, let's fit the cLDA with homogeneous compound symmetry (CS)
Now let's check emmeans with the residual DoF.
gls: 30 observations - 6 parameters (2 baselines, because the Intercept is included) = 24 DoF.
For the CS, emmeans subtracts 2 covariance parameters - 1, so we finally obtain 24 - 1 = 23 DoF.
Now with Satterthwaite DoF, the DoFs are adjusted a little (for homogen. CS all DoF are pooled)
But the emmeans Satterthwaite pulls the DoF visibly more towards the cluster size (9.76 to 15.20) than MMRM does (21.2). How would you explain this? Which one is closer to SAS?
Now we will estimate the common baseline:
And finally let's test the contrasts of interest:
OK, it's close, but the DoF vary quite noticeably.
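To see how much the choice of DoF matters for the same test statistic, here is a small base-R sketch. It takes the estimate and standard error that SAS prints in this thread for the W1 treatment difference (trt1diff: 2.7993 and 0.6330) and evaluates the two-sided p-value over a grid of DoF. The grid is illustrative only; 21.21 is the SAS value, the other entries are assumptions chosen to span the range seen in this thread.

```r
# Estimate and standard error of the W1 treatment difference, as printed by
# SAS in this thread (LSMESTIMATE 'trt1diff').
estimate <- 2.7993
se <- 0.6330
t_ratio <- estimate / se

# Two-sided p-value of a t statistic for a given number of degrees of freedom.
p_two_sided <- function(t, df) 2 * pt(-abs(t), df)

# Illustrative grid of degrees of freedom; only 21.21 is the SAS value.
dfs <- c(10, 15, 21.21, 25)
p_values <- p_two_sided(t_ratio, dfs)

data.frame(df = dfs, p.value = signif(p_values, 3))
```

For a fixed t ratio the p-value shrinks as the DoF grow, so a DoF near the cluster size gives a noticeably more conservative test than a DoF above 20.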
Question: 💡 When the emmeans experiences a problem with the var-cov matrix, it says: "Analytical Satterthwaite method not available; using appx-satterthwaite". Quote: "It estimates a needed gradient in the covariance matrix experimentally by randomly perturbing the response values. Thus, the degrees of freedom will vary slightly (or possibly even a lot) if the reference grid is re-calculated."
This means to me, that the closed-form formula cannot be applied. In this case, the author uses a perturbation method, which works pretty well (and definitely better than df.error). I attach 2 screenshots from the sources ( https://rdrr.io/cran/emmeans/src/R/helpers.R ):
Fig. 1)
Fig 2)
But MMRM doesn't show such an issue. The SEs are very close to each other (gls vs. mmrm), so I guess the var-cov was estimated in pretty much the same way. This means that your algorithm did not run into the issue that emmeans did. Or it does, but you handle it somehow.
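To make the gradient-based idea concrete, here is a toy base-R sketch on a much simpler problem than MMRM: the difference of two group means with unequal variances. It is not the emmeans or mmrm implementation, and the simulated data are hypothetical. It only shows that a Satterthwaite DoF computed from a gradient and the covariance matrix of the variance estimates reproduces the familiar Welch DoF from t.test().

```r
# Toy illustration (hypothetical data, not the data from this thread).
set.seed(1)
x <- rnorm(8, mean = 0, sd = 1)
y <- rnorm(12, mean = 1, sd = 3)
n <- c(length(x), length(y))
theta_hat <- c(var(x), var(y))  # estimated variance parameters

# Variance of the contrast (difference of means) as a function of theta.
contrast_var <- function(theta) sum(theta / n)

# Gradient of contrast_var with respect to theta (analytic in this toy case).
g <- 1 / n

# Approximate covariance matrix of theta_hat: Var(s^2) = 2 * sigma^4 / (n - 1),
# with sigma^2 replaced by its estimate.
A <- diag(2 * theta_hat^2 / (n - 1))

# Satterthwaite d.f.: 2 * (contrast variance)^2 / (g' A g).
df_satt <- 2 * contrast_var(theta_hat)^2 / drop(t(g) %*% A %*% g)

# Welch d.f. as computed by base R.
df_welch <- unname(t.test(x, y, var.equal = FALSE)$parameter)

all.equal(df_satt, df_welch)
```

When the gradient is available in closed form, as here, no random perturbation is needed and the result is reproducible; a perturbation-based approximation estimates the same gradient numerically, which is why its DoF can vary between runs.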
For the CS, even with the exact Satterthwaite, the DoF differ even more:
There are 4 possibilities:
1) There are different approaches to Satterthwaite and both are valid. Thus, emmeans and mmrm are both correct (it's like the plethora of robust covariance estimators HC0...HC5 or CR0...CR3: all different, all valid).
2) emmeans has an issue somewhere.
3) mmrm has an issue somewhere.
4) Both have issues :)
PS: This may be important also if you think about adding the robust SEs (so much needed if we switch from US to simpler structures), for example using the clubSandwich package (CRx estimators).