[Not a bug report - a question rather] cLDA with MMRM vs. nlme::gls + emmeans + Satterthwaite

Dear Authors of the MMRM. First of all - please accept my big Thank You for doing awesome work!

I did just a few "early" comparisons between MMRM vs. nlme::gls in the context of cLDA and wanted to ask you one question. TL;DR - the question is at the end of my post :) It will be rather lengthy, as I wanted to show everything step by step.

First of all, used versions. My current numerically validated environment is based on R core 3.6.3 and I used this version for the gls part. The mmrm requires 4+ and I run it on a "wild" 4.0.3

The analysis: cLDA with common baseline. Compared engines: nlme::gls() (3.1.152) + emmeans (1.7.5) vs mmrm (0.1.5) + emmeans (1.7.5)

The data: 30 observations, at 3 timepoints: {baseline, W1, W2}. No missing observations.

data <- structure(list(id = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 
4L, 5L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 6L, 7L, 8L, 
9L, 10L, 6L, 7L, 8L, 9L, 10L), .Label = c("1", "2", "3", "4", 
"5", "101", "102", "103", "104", "105"), class = "factor"), tim = structure(c(1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("Bas", 
"W1", "W2"), class = "factor"), trt = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), val = c(-0.445778264836677, -1.2058565689643, 
0.0411263138456899, 0.639388407571143, -0.786554355912735, 0.93451070190448, 
0.844132115922294, 2.03975069147444, 1.30149437711243, -0.0531177589385456, 
2.51757217231966, 2.94551129841814, 3.62138118865766, 3.37912768406252, 
2.16395895144539, 0.170057481208407, 0.155078715940733, 0.0249318673672384, 
-2.04658541402115, 0.213154105608615, 6.05007166431368, 2.15298399368789, 
4.21424733194768, 3.91257174705049, 2.73317503681323, 6.10316126083229, 
3.71615586428783, 5.83494216711747, 6.06097572188509, 6.72093565459939
)), row.names = c(NA, 30L), class = "data.frame")

> head(data)
  id tim trt     val
1  1 Bas   A -0.4458
2  2 Bas   A -1.2059
3  3 Bas   A  0.0411
4  4 Bas   A  0.6394
5  5 Bas   A -0.7866
6  1  W1   A  0.9345

> tail(data)
    id tim trt  val
25 105  W1   B 2.73
26 101  W2   B 6.10
27 102  W2   B 3.72
28 103  W2   B 5.83
29 104  W2   B 6.06
30 105  W2   B 6.72

Packages:

library(dplyr)
library(nlme)
library(emmeans)

Fitting the cLDA with unstructured covariance

_Note: it's incomplete - gives different baseline estimates as the (Intercept) term is present in the model matrix. We will deal with it later. Yes, we could remove it manually here, but then we'd have to deal with something like val ~ NewModelMatrix which gives weird coef. names. I don't want that.)

>  m <- gls(val ~ tim + trt:tim, 
         data=data, 
         correlation=corSymm (form = ~ 1 | id),
         weights = varIdent(form=~1|tim))

> coef(summary(m))
              Value Std.Error t-value  p-value
(Intercept) -0.3515     0.384  -0.914 3.70e-01
timW1        1.3649     0.619   2.206 3.72e-02
timW2        3.2770     0.549   5.974 3.63e-06
timBas:trtB  0.0549     0.544   0.101 9.20e-01
timW1:trtB   2.7993     0.755   3.707 1.10e-03
timW2:trtB   2.7617     0.580   4.763 7.59e-05

# For comparison: MMRM
# > coef(summary( m <- mmrm(formula = val ~ tim+trt:tim + us(tim | id), data = data)))
#                Estimate Std. Error       df    t value     Pr(>|t|)
# (Intercept) -0.35153489  0.3844341 7.999975 -0.9144216 0.3872256401
# timW1        1.36488892  0.6186984 7.999967  2.2060650 0.0584386929
# timW2        3.27704515  0.5485759 7.999970  5.9737320 0.0003330192
# timBas:trtB  0.05486224  0.5436720 7.999975  0.1009106 0.9221046728
# timW1:trtB   2.79925593  0.7551763 7.999974  3.7067580 0.0059842020
# timW2:trtB   2.76172387  0.5798702 7.999963  4.7626589 0.0014219589

Now, for simplicity (only), let's check the emmeans with residual DF. gls: 30 observations - 6 parameters (2 baselines due to the Intercept included) = 24 DoF emmeans takes also the DoF for the estimated covariances - 1. For the US it's: t(t+1)/2 covariances = 6 parameters. So finally we obtain 24-5=19 DoF. ( far from 5 observations per cluster)

> (emm  <- emmeans(m, ~trt*tim, mode="df.error", nesting=NULL))

trt tim emmean    SE df lower.CL upper.CL
 A   Bas  -0.35 0.384 19    -1.16     0.45
 B   Bas  -0.30 0.384 19    -1.10     0.51
 A   W1    1.01 0.534 19    -0.10     2.13
 B   W1    3.81 0.534 19     2.69     4.93
 A   W2    2.93 0.410 19     2.07     3.78
 B   W2    5.69 0.410 19     4.83     6.55

Degrees-of-freedom method: df.error 
Confidence level used: 0.95

Now with Satterthwaite DoF it gives about 8 DoF, which is close to the cluster size and here emmeans agrees with mmrm.

Note: for MMRM the emmeans("mode") doesn't matter - it's Satterthwaite by design PS: Notice the approximate Satt. This will trigger my question at the end.

# Set seed for the approximate Satterthwaite "perturbation" method in emmeans if it happens:
> set.seed(1000)
> (emm  <- emmeans(m, ~trt*tim, mode="satterthwaite", nesting=NULL))

Analytical Satterthwaite method not available; using appx-satterthwaite
 trt tim emmean    SE   df lower.CL upper.CL
 A   Bas  -0.35 0.384 8.17    -1.23     0.53
 B   Bas  -0.30 0.384 8.17    -1.18     0.59
 A   W1    1.01 0.534 7.86    -0.22     2.25
 B   W1    3.81 0.534 7.86     2.58     5.05
 A   W2    2.93 0.410 7.90     1.98     3.87
 B   W2    5.69 0.410 7.90     4.74     6.63

Degrees-of-freedom method: appx-satterthwaite 
Confidence level used: 0.95

# For comparison: MMRM
#  trt tim emmean    SE df lower.CL upper.CL
#  A   Bas -0.352 0.384  8   -1.238    0.535
#  B   Bas -0.297 0.384  8   -1.183    0.590
#  A   W1   1.013 0.534  8   -0.218    2.245
#  B   W1   3.813 0.534  8    2.581    5.044
#  A   W2   2.926 0.410  8    1.980    3.871
#  B   W2   5.687 0.410  8    4.742    6.633
# 
# Confidence level used: 0.95

Now we estimate the common baseline:

# Set seed for the approximate Satterthwaite "perturbation" method in emmeans:
> set.seed(1000)
> (emm1 <- update(emmeans(emm[1:2], ~ 1) + emm[3:6], adjust="none", infer = c(TRUE, TRUE)))

 1       trt tim emmean    SE   df lower.CL upper.CL t.ratio p.value
 overall .   .    -0.32 0.272 8.17    -0.95     0.30  -1.190  0.2666
 .       A   W1    1.01 0.534 7.86    -0.22     2.25   1.900  0.0950
 .       B   W1    3.81 0.534 7.86     2.58     5.05   7.140  0.0001
 .       A   W2    2.93 0.410 7.90     1.98     3.87   7.130  0.0001
 .       B   W2    5.69 0.410 7.90     4.74     6.63  13.870  <.0001

Results are averaged over some or all of the levels of: trt 
Degrees-of-freedom method: appx-satterthwaite 
Confidence level used: 0.95 

# For comparison MMRM
#  1       trt tim emmean    SE df lower.CL upper.CL t.ratio p.value
#  overall .   .   -0.324 0.272  8   -0.951    0.303  -1.192  0.2673
#  .       A   W1   1.013 0.534  8   -0.218    2.245   1.898  0.0943
#  .       B   W1   3.813 0.534  8    2.581    5.044   7.140  0.0001
#  .       A   W2   2.926 0.410  8    1.980    3.871   7.135  0.0001
#  .       B   W2   5.687 0.410  8    4.742    6.633  13.870  <.0001
# 
# Results are averaged over some or all of the levels of: trt 
# Confidence level used: 0.95

Let's test the treatment at timepoint contrasts:

# Set seed for the approximate Satterthwaite "perturbation" method in emmeans:
> set.seed(1000)
> emmeans::contrast(emm1,
                  list(Trt_at_visit_1 = c(0, -1, 1, 0, 0),
                       Trt_at_visit_2 = c(0, 0, 0,  -1, 1)),                  
                  adjust = "none")

 contrast       estimate    SE   df t.ratio p.value
 Trt_at_visit_1     2.80 0.755 7.86   3.710  0.0062
 Trt_at_visit_2     2.76 0.580 7.90   4.760  0.0015

Results are averaged over some or all of the levels of: trt 
Degrees-of-freedom method: appx-satterthwaite 

# For comparison MMRM
#  contrast       estimate    SE df t.ratio p.value
#  Trt_at_visit_1     2.80 0.755  8   3.707  0.0060
#  Trt_at_visit_2     2.76 0.580  8   4.763  0.0014
# 
# Results are averaged over some or all of the levels of: trt

So far - so good! Conclusions: for the US, we get very similar (practically equal) outcomes. We will see differences later, for the CS.

Now, let's fit the cLDA with homogeneous compound symmetry (CS)

> m <- gls(val ~ tim + trt:tim, 
         data=data, 
         correlation=corCompSymm (form = ~ 1 | id))

> coef(summary(m))
              Value Std.Error t-value  p-value
(Intercept) -0.3515     0.448 -0.7854 4.40e-01
timW1        1.3649     0.546  2.5009 1.96e-02
timW2        3.2770     0.546  6.0046 3.37e-06
timBas:trtB  0.0549     0.633  0.0867 9.32e-01
timW1:trtB   2.7993     0.633  4.4221 1.81e-04
timW2:trtB   2.7617     0.633  4.3628 2.10e-04

# For comparison: MMRM
# > coef(summary( m <- mmrm(formula = val ~ tim+trt:tim + cs(tim | id), data = data)))
#                Estimate Std. Error       df     t value     Pr(>|t|)
# (Intercept) -0.35153489  0.4476079 21.20574 -0.78536353 4.409275e-01
# timW1        1.36488892  0.5457588 16.00000  2.50090137 2.363133e-02
# timW2        3.27704515  0.5457588 16.00000  6.00456683 1.836298e-05
# timBas:trtB  0.05486224  0.6330131 21.20574  0.08666842 9.317482e-01
# timW1:trtB   2.79925593  0.6330131 21.20574  4.42211367 2.320715e-04
# timW2:trtB   2.76172387  0.6330131 21.20574  4.36282255 2.675784e-04

Now let's check the emmeans with residual DF. gls: 30 observations - 6 parameters (2 baselines due to the Intercept included) = 24 DoF emmeans, for the CS takes: 2 covariances -1. So finally we obtain 24-1=23

> (emm  <- emmeans(m, ~trt*tim, mode="df.error", nesting=NULL))

trt tim emmean    SE df lower.CL upper.CL
 A   Bas  -0.35 0.448 23    -1.28     0.57
 B   Bas  -0.30 0.448 23    -1.22     0.63
 A   W1    1.01 0.448 23     0.09     1.94
 B   W1    3.81 0.448 23     2.89     4.74
 A   W2    2.93 0.448 23     2.00     3.85
 B   W2    5.69 0.448 23     4.76     6.61

Degrees-of-freedom method: df.error 
Confidence level used: 0.95

Now with Satterthwaite DoF, the DoFs are adjusted a little (for homogen. CS all DoF are pooled)

But the emmeans::satterthwaite makes it visibly more towards the cluster size (9.76 to 15.20 ) than MMRM (21.2). How would you explain this situation? Which one is closer to SAS?

# Set seed for the approximate Satterthwaite "perturbation" method in emmeans:
> set.seed(1000)
> (emm  <- emmeans(m, ~trt*tim, mode="satterthwaite", nesting=NULL))

 trt tim emmean    SE    df lower.CL upper.CL
 A   Bas  -0.35 0.448  9.76    -1.35     0.65
 B   Bas  -0.30 0.448  9.76    -1.30     0.70
 A   W1    1.01 0.448 15.20     0.06     1.97
 B   W1    3.81 0.448 15.20     2.86     4.77
 A   W2    2.93 0.448  9.76     1.92     3.93
 B   W2    5.69 0.448  9.76     4.69     6.69

Degrees-of-freedom method: satterthwaite 
Confidence level used: 0.95 

# For comparison: MMRM
#  trt tim emmean    SE   df lower.CL upper.CL
#  A   Bas -0.352 0.448 21.2  -1.2818    0.579
#  B   Bas -0.297 0.448 21.2  -1.2270    0.634
#  A   W1   1.013 0.448 21.2   0.0831    1.944
#  B   W1   3.813 0.448 21.2   2.8823    4.743
#  A   W2   2.926 0.448 21.2   1.9952    3.856
#  B   W2   5.687 0.448 21.2   4.7569    6.618
# 
# Confidence level used: 0.95

Now we will estimate the common baseline:

# Set seed for the approximate Satterthwaite "perturbation" method in emmeans:
> set.seed(1000)
> (emm1 <- update(emmeans(emm[1:2], ~ 1) + emm[3:6], adjust="none", infer = c(TRUE, TRUE)))

 1       trt tim emmean    SE    df lower.CL upper.CL t.ratio p.value
 overall .   .    -0.32 0.317  9.76    -1.03     0.38  -1.020  0.3310
 .       A   W1    1.01 0.448 15.20     0.06     1.97   2.260  0.0390
 .       B   W1    3.81 0.448 15.20     2.86     4.77   8.520  <.0001
 .       A   W2    2.93 0.448  9.76     1.92     3.93   6.540  <.0001
 .       B   W2    5.69 0.448  9.76     4.69     6.69  12.710  <.0001

Results are averaged over some or all of the levels of: trt 
Degrees-of-freedom method: satterthwaite 
Confidence level used: 0.95 

# For comparison: MMRM
#  1       trt tim emmean    SE   df lower.CL upper.CL t.ratio p.value
#  overall .   .   -0.324 0.317 21.2  -0.9819    0.334  -1.024  0.3174
#  .       A   W1   1.013 0.448 21.2   0.0831    1.944   2.264  0.0342
#  .       B   W1   3.813 0.448 21.2   2.8823    4.743   8.518  <.0001
#  .       A   W2   2.926 0.448 21.2   1.9952    3.856   6.536  <.0001
#  .       B   W2   5.687 0.448 21.2   4.7569    6.618  12.706  <.0001
# 
# Results are averaged over some or all of the levels of: trt 
# Confidence level used: 0.95

And finally let's test the contrasts of interest:

# Set seed for the approximate Satterthwaite "perturbation" method in emmeans:
> set.seed(1000)
> emmeans::contrast(emm1,
                  list(Trt_at_visit_1 = c(0, -1, 1, 0, 0),
                       Trt_at_visit_2 = c(0, 0, 0,  -1, 1)),                  
                  adjust = "none")

 contrast       estimate    SE    df t.ratio p.value
 Trt_at_visit_1     2.80 0.633 15.20   4.420  0.0005
 Trt_at_visit_2     2.76 0.633  9.76   4.360  0.0015

Results are averaged over some or all of the levels of: trt 
Degrees-of-freedom method: satterthwaite 

# For comparison MMRM
#  contrast       estimate    SE   df t.ratio p.value
#  Trt_at_visit_1     2.80 0.633 21.2   4.422  0.0002
#  Trt_at_visit_2     2.76 0.633 21.2   4.363  0.0003
# 
# Results are averaged over some or all of the levels of: trt

OK, it's close, but the DoF vary quite noticeably.

Question: 💡 When the emmeans experiences a problem with the var-cov matrix, it says: "Analytical Satterthwaite method not available; using appx-satterthwaite". Quote: "It estimates a needed gradient in the covariance matrix experimentally by randomly perturbing the response values. Thus, the degrees of freedom will vary slightly (or possibly even a lot) if the reference grid is re-calculated."

This means to me, that the closed-form formula cannot be applied. In this case, the author uses a perturbation method, which works pretty well (and definitely better than df.error). I attach 2 screenshots from the sources ( https://rdrr.io/cran/emmeans/src/R/helpers.R ):

Fig. 1) obraz

Fig 2) obraz

But the MMRM doesn't show such issue. The SEs are so close to each other (gls vs. mmrm), so I guess the var-cov was estimated pretty the same. Which means that your algorithm did not experience the issue that emmeans experienced. Or it does, but you handle it somehow.

For the CS even the exact Satterthwaite the DoF differ even more: obraz

There are 4 possibilities: 1) there are different approaches to Satterthwaite and both are valid. Thus, emmeans and mmrm are just both correct. (it's like with the plethora of robust covariance stimators HC0...HC5 or CR0...CR3 - all different, all valid). 2) emmeans has issue somewhere 3) mmrm has issue somewhere 4) both have issues :)

PS: This may be important also if you think about adding the robust SEs (so much needed if we switch from US to simpler structures), for example using the clubSandwich package (CRx estimators).

Thanks @hugesingleton , I think the best would be to compare this with SAS results. We have seen before that the approximate Satterthwaite from emmeans can be quite off. Alternatively you could share the (toy) data and code here as reprex so that we can debug on our end. Thanks :-)

Dear @danielinteractive I provided the data and the code in my post, documenting it exactly step by step. One can just copy it and paste directly to R, along with all R commands.

The question I expressed with my post is exactly about why doesn't MMRM need the approximate algorithm, if emmeans needs it? Did you find a workaround for problematic covariance matrices? Is the algorithm in emmeans somehow "incomplete" (the sources are available, so you can directly compare the algorithms), relies on some special assumptions? Let me also tag @rvlenth regarding this issue.

thx @hugesingleton , now on my laptop I saw it, thanks :-) As far as I understand emmeans uses a different way to calculate Satterthwaite. We still need to document exactly how we do this (see #181) but basically we use the same "exact" approach as lmerTest does for lme4.

Ah, I understand, this makes sense now. Thank you! OK, let's check then. Apart from those minor issues, let me thank again, wholeheartedly, the whole Team working on MMRM and the entire OpenPharma ecosystem!

Thanks again @hugesingleton for the question! So I quickly compared now mmrm vs. SAS for this example, and I am happy to summarize thatmmrm exactly matches SAS results (well, up to numerical accuracy as usual of course).

To compare you can use (if you have a SAS server) the nice new (still a bit rough but useable) https://github.com/insightsengineering/sasr, here the full code:

# remotes::install_github(repo = 'insightsengineering/sasr')
library(sasr)
# install_saspy()
# then need to set up sascfg_personal.py
# So now we can look at the example.
data <- structure(list(id = structure(c(
  1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
  4L, 5L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 6L, 7L, 8L,
  9L, 10L, 6L, 7L, 8L, 9L, 10L
), .Label = c(
  "1", "2", "3", "4",
  "5", "101", "102", "103", "104", "105"
), class = "factor"), tim = structure(c(
  1L,
  1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
  1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L
), .Label = c(
  "Bas",
  "W1", "W2"
), class = "factor"), trt = structure(c(
  1L, 1L, 1L,
  1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
  2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c(
  "A",
  "B"
), class = "factor"), val = c(
  -0.445778264836677, -1.2058565689643,
  0.0411263138456899, 0.639388407571143, -0.786554355912735, 0.93451070190448,
  0.844132115922294, 2.03975069147444, 1.30149437711243, -0.0531177589385456,
  2.51757217231966, 2.94551129841814, 3.62138118865766, 3.37912768406252,
  2.16395895144539, 0.170057481208407, 0.155078715940733, 0.0249318673672384,
  -2.04658541402115, 0.213154105608615, 6.05007166431368, 2.15298399368789,
  4.21424733194768, 3.91257174705049, 2.73317503681323, 6.10316126083229,
  3.71615586428783, 5.83494216711747, 6.06097572188509, 6.72093565459939
)), row.names = c(NA, 30L), class = "data.frame")

library(mmrm)
m <- mmrm(formula = val ~ tim+trt:tim + cs(tim | id), data = data)
library(emmeans)
emm  <- emmeans(m, ~trt*tim, nesting=NULL)
emm

cont <- contrast(emm, list(trt1 = c(0, 0, -1, 1, 0, 0),
                        trt2 = c(0, 0, 0, 0, -1, 1)), adjust = "none")
cont

# Now with SAS:
df2sd(data, "dat")
result <- run_sas(
  "PROC MIXED DATA = dat cl method=reml;
      CLASS id tim trt;
      MODEL val = tim trt*tim / ddfm=satterthwaite solution chisq;
      REPEATED tim / subject=id type=CS rcorr;
      LSMEANS tim*trt / cl alpha=0.05;
      LSMESTIMATE tim*trt 'trt1diff' 0 0 -1 1 0 0,
                          'trt2diff' 0 0 0 0 -1 1;
    RUN;
      "
)
cat(result$LOG)
cat(result$LST)

# We fitted the same model:
all.equal(deviance(m), 76.37658057)
summary(m)
cov2cor(VarCorr(m))

# We can see that the least square means d.f. are all 21.2 in SAS and in R:
emm
cont

For reference, the SAS results I get here are:

                                                           The SAS System                  Friday, November 25, 2022 03:15:00 PM   7

                                                        The Mixed Procedure

                                                         Model Information

                                       Data Set                     WORK.DAT                 
                                       Dependent Variable           val                      
                                       Covariance Structure         Compound Symmetry        
                                       Subject Effect               id                       
                                       Estimation Method            REML                     
                                       Residual Variance Method     Profile                  
                                       Fixed Effects SE Method      Model-Based              
                                       Degrees of Freedom Method    Satterthwaite            

                                                      Class Level Information

                                         Class    Levels    Values

                                         id           10    1 101 102 103 104 105 2 3 4 5 
                                         tim           3    Bas W1 W2                     
                                         trt           2    A B                           

                                                            Dimensions

                                                Covariance Parameters             2
                                                Columns in X                     10
                                                Columns in Z                      0
                                                Subjects                         10
                                                Max Obs per Subject               3

                                                      Number of Observations

                                            Number of Observations Read              30
                                            Number of Observations Used              30
                                            Number of Observations Not Used           0

                                                         Iteration History

                                    Iteration    Evaluations    -2 Res Log Like       Criterion

                                            0              1        77.80797515                
                                            1              1        76.37658057      0.00000000

                                                     Convergence criteria met.                    

                                               Estimated R Correlation Matrix for id 1

                                               Row        Col1        Col2        Col3

                                                 1      1.0000      0.2567      0.2567
                                                 2      0.2567      1.0000      0.2567
                                                 3      0.2567      0.2567      1.0000

                                                   Covariance Parameter Estimates

                                 Cov Parm     Subject    Estimate     Alpha       Lower       Upper

                                 CS           id           0.2571      0.05     -0.2671      0.7814
                                 Residual                  0.7446      0.05      0.4130      1.7248

                                                          Fit Statistics

                                               -2 Res Log Likelihood            76.4
                                               AIC (Smaller is Better)          80.4
                                               AICC (Smaller is Better)         80.9
                                               BIC (Smaller is Better)          81.0

                                                  Null Model Likelihood Ratio Test

                                                    DF    Chi-Square      Pr > ChiSq

                                                     1          1.43          0.2315

                                                     Solution for Fixed Effects

                                                                  Standard
                           Effect       tim    trt    Estimate       Error      DF    t Value    Pr > |t|

                           Intercept                    5.6872      0.4476    21.2      12.71      <.0001
                           tim          Bas            -5.9839      0.5458      16     -10.96      <.0001
                           tim          W1             -1.8746      0.5458      16      -3.43      0.0034
                           tim          W2                   0           .       .        .         .    
                           tim*trt      Bas    A      -0.05486      0.6330    21.2      -0.09      0.9317
                           tim*trt      Bas    B             0           .       .        .         .    
                           tim*trt      W1     A       -2.7993      0.6330    21.2      -4.42      0.0002
                           tim*trt      W1     B             0           .       .        .         .    
                           tim*trt      W2     A       -2.7617      0.6330    21.2      -4.36      0.0003
                           tim*trt      W2     B             0           .       .        .         .    

                                                   Type 3 Tests of Fixed Effects

                                          Num     Den
                            Effect         DF      DF    Chi-Square    F Value      Pr > ChiSq    Pr > F

                            tim             2      16        145.57      72.78          <.0001    <.0001
                            tim*trt         3    11.7         33.97      11.32          <.0001    0.0009

                                                   Least Squares Means Estimates

                                                               Standard
                            Effect     Label       Estimate       Error       DF    t Value    Pr > |t|

                            tim*trt    trt1diff      2.7993      0.6330    21.21       4.42      0.0002
                            tim*trt    trt2diff      2.7617      0.6330    21.21       4.36      0.0003

                                                        Least Squares Means

                                                Standard
           Effect     tim    trt    Estimate       Error      DF    t Value    Pr > |t|     Alpha       Lower       Upper

           tim*trt    Bas    A       -0.3515      0.4476    21.2      -0.79      0.4409      0.05     -1.2818      0.5788
           tim*trt    Bas    B       -0.2967      0.4476    21.2      -0.66      0.5146      0.05     -1.2270      0.6336
           tim*trt    W1     A        1.0134      0.4476    21.2       2.26      0.0342      0.05     0.08305      1.9437
           tim*trt    W1     B        3.8126      0.4476    21.2       8.52      <.0001      0.05      2.8823      4.7429
           tim*trt    W2     A        2.9255      0.4476    21.2       6.54      <.0001      0.05      1.9952      3.8558
           tim*trt    W2     B        5.6872      0.4476    21.2      12.71      <.0001      0.05      4.7569      6.6175

Thank you very much for checking and confirming the validity! It's a great news. You all did an excellent work!

Currently I'm going to mix the two:

MMRM when the model converges under US. I will compare it against gls + emmeans every time out of curiosity.
the nlme::gls + emmeans tandem otherwise. It supports clubSandwich which will be necessary in such case. It also enables more complex weighting schemes, e.g. heterogenous CS, which I need sometimes.

I will switch to MMRM when it supports the CR2/CR3 robust SE, and leave gls() only for special occasions.

Actually, the Authors of nlme + emmeans tandem truly deserve a statue and big respect, for enabling R people to enter the clinical trials industry, but now a serious game changer enters the scene! :)

/ PS: don't tell me you have also plans about another Xmas candy: K-R? I saw something among the TODO plans! Wow! /

Yeah KR is coming soon, before robust sandwich estimator!

Note that you can already now fit heterogeneous compound symmetry covariance structure and we also support observation weights.

Just to make a note for future use regarding #53: the clubSandwich package offers both CRx estimators along with the small-sample DoF adjustment, using their own approximate implementation of Satterthwaite (+ saddlepoint, but it's not that popular). Having your own, exact implementation of Satt. maybe you'll find somehow informative how they combined both together.

https://www.jepusto.com/files/clubSandwich-Oslo-RUG-2022-02-03.pdf https://arxiv.org/pdf/1601.01981.pdf + https://www.jepusto.com/files/Pustejovsky-Tipton-201601.pdf

(BTW, this is also interesting: https://www.r-bloggers.com/2022/01/a-%F0%9F%90%B4-race-the-wild-cluster-bootstrap-vs-satterthwaite-corrected-sandwich-estimators-when-the-number-of-clusters-is-small/ )

BTW: for tests - I repeated the analyses with CR2 via gls + emmeans + clubSandwich. Unfortunately, I had to change the approach. The previous one consumed 1 DoF too much; It's not that wrong, but messed a bit. Also, the need for "playing" with parts of the emmeans to make a single baseline output didn't aligned well with CR adjustment. With the current approach it's easier, so let's stick with it.

The results will differ from SAS for sure (mostly for the "appx-satterthwaite"), but the overall pattern should be at least comparable, when you implement the CRxx estimators in MMRM and compare the results against SAS.

data$Trt <- factor(rep(c("Bas","A_W1", "A_W2","Bas","B_W1", "B_W2"), each = 5))

m_US <- gls(val ~ Trt, data=data, correlation=corSymm (form = ~ 1 | id), weights = varIdent(form=~1|tim))
m_CS <- gls(val ~ Trt, data=data, correlation=corCompSymm (form = ~ 1 | id))

vcCR_US <- as.matrix(clubSandwich::vcovCR(m_US, type = "CR2"))
vcCR_CS <- as.matrix(clubSandwich::vcovCR(m_CS, type = "CR2"))

set.seed(1000)
emm_common_bas_US     <- emmeans(m_US, ~Trt, mode="satterthwaite", nesting=NULL, adjust="none", infer=c(TRUE, TRUE))
emm_common_bas_CS     <- emmeans(m_CS, ~Trt, mode="satterthwaite", nesting=NULL, adjust="none", infer=c(TRUE, TRUE))
emm_common_bas_CR_US  <- emmeans(m_US, ~Trt, mode="satterthwaite", nesting=NULL, adjust="none", infer=c(TRUE, TRUE), vcov. = vcCR_US)
emm_common_bas_CR_CS  <- emmeans(m_CS, ~Trt, mode="satterthwaite", nesting=NULL, adjust="none", infer=c(TRUE, TRUE), vcov. = vcCR_CS)

data.frame(mode = "US", emm_common_bas_US) %>%
    bind_rows(data.frame(mode = "US CR2", emm_common_bas_CR_US)) %>%
    bind_rows(data.frame(mode = "CS", emm_common_bas_CS)) %>%
    bind_rows(data.frame(mode = "CS CR2", emm_common_bas_CR_CS)) %>% 
  separate(col = Trt, into = c("Treatment", "Time"), sep="_", fill = "right") %>% 
  mutate(Treatment = ifelse(is.na(Time), "Both", Treatment),
         Time = ifelse(is.na(Time), "Bas", Time)) %>% 
  dplyr::select(-lower.CL, -upper.CL, t.ratio) %>% 
  tidyr::pivot_longer(emmean:p.value, names_to="stat", values_to="value") %>% 
  tidyr::pivot_wider(id_cols = Treatment:stat, names_from = c(mode), values_from = value) %>% 
  mutate(across(where(is.numeric), ~round(., 3))) %>% 
  arrange(Treatment!="Both") %>% 
  data.frame

Output

   Treatment Time    stat     US US.CR2     CS CS.CR2
1       Both  Bas  emmean -0.324 -0.324 -0.324 -0.324
2       Both  Bas      SE  0.256  0.256  0.309  0.257
3       Both  Bas      df  9.195  8.966 10.134  6.998
4       Both  Bas t.ratio -1.264 -1.264 -1.047 -1.261
5       Both  Bas p.value  0.237  0.238  0.319  0.248

6          A   W1  emmean  1.018  1.018  1.020  1.020
7          A   W1      SE  0.532  0.326  0.431  0.321
8          A   W1      df  7.904  1.144 16.460 12.191
9          A   W1 t.ratio  1.914  3.120  2.367  3.174
10         A   W1 p.value  0.092  0.172  0.030  0.008

11         A   W2  emmean  2.927  2.927  2.932  2.932
12         A   W2      SE  0.410  0.263  0.431  0.250
13         A   W2      df  7.940  1.379  9.883  2.496
14         A   W2 t.ratio  7.143 11.109  6.805 11.721
15         A   W2 p.value  0.000  0.026  0.000  0.003

16         B   W1  emmean  3.808  3.808  3.806  3.806
17         B   W1      SE  0.532  0.678  0.431  0.680
18         B   W1      df  7.904 21.265 16.460 20.136
19         B   W1 t.ratio  7.161  5.620  8.832  5.600
20         B   W1 p.value  0.000  0.000  0.000  0.000

21         B   W2  emmean  5.686  5.686  5.680  5.680
22         B   W2      SE  0.410  0.516  0.431  0.527
23         B   W2      df  7.940 20.309  9.883 13.277
24         B   W2 t.ratio 13.876 11.016 13.182 10.783
25         B   W2 p.value  0.000  0.000  0.000  0.000

I also run the MMRM over the new approach, just to observe if it changes much (only CR-unadjusted). Looks fine.

data$Trt <- factor(rep(c("Bas","A_W1", "A_W2","Bas","B_W1", "B_W2"), each = 5))
> m_US <- mmrm(val ~ Trt + us(tim | id), data=data)
> m_CS <- mmrm(val ~ Trt + cs(tim | id), data=data)
> emm_common_bas_US     <- emmeans(m_US, ~Trt, mode="satterthwaite", nesting=NULL, adjust="none", infer=c(TRUE, TRUE))
> emm_common_bas_CS     <- emmeans(m_CS, ~Trt, mode="satterthwaite", nesting=NULL, adjust="none", infer=c(TRUE, TRUE))
>     
> data.frame(mode = "US", emm_common_bas_US) %>%
+     bind_rows(data.frame(mode = "CS", emm_common_bas_CS)) %>%
+     separate(col = Trt, into = c("Treatment", "Time"), sep="_", fill = "right") %>% 
+     mutate(Treatment = ifelse(is.na(Time), "Both", Treatment),
+            Time = ifelse(is.na(Time), "Bas", Time)) %>% 
+     dplyr::select(-lower.CL, -upper.CL, t.ratio) %>% 
+     tidyr::pivot_longer(emmean:p.value, names_to="stat", values_to="value") %>% 
+     tidyr::pivot_wider(id_cols = Treatment:stat, names_from = c(mode), values_from = value) %>% 
+     mutate(across(where(is.numeric), ~round(., 3))) %>% 
+     arrange(Treatment!="Both") %>% 
+     data.frame

   Treatment Time    stat     US     CS
1       Both  Bas  emmean -0.324 -0.324
2       Both  Bas      SE  0.256  0.309
3       Both  Bas      df  9.000 22.612
4       Both  Bas t.ratio -1.264 -1.047
5       Both  Bas p.value  0.238  0.306

6          A   W1  emmean  1.018  1.020
7          A   W1      SE  0.532  0.431
8          A   W1      df  8.060 24.593
9          A   W1 t.ratio  1.914  2.367
10         A   W1 p.value  0.092  0.026

11         A   W2  emmean  2.927  2.932
12         A   W2      SE  0.410  0.431
13         A   W2      df  8.009 24.593
14         A   W2 t.ratio  7.143  6.805
15         A   W2 p.value  0.000  0.000

16         B   W1  emmean  3.808  3.806
17         B   W1      SE  0.532  0.431
18         B   W1      df  8.060 24.593
19         B   W1 t.ratio  7.161  8.832
20         B   W1 p.value  0.000  0.000

21         B   W2  emmean  5.686  5.680
22         B   W2      SE  0.410  0.431
23         B   W2      df  8.009 24.593
24         B   W2 t.ratio 13.876 13.182
25         B   W2 p.value  0.000  0.000

plus the contrasts. GLS + emmeans:


set.seed(1000)
data.frame(mode = "US", emmeans::contrast(emm_common_bas_US,
                  list(Trt_at_visit_1 = c(-1, 0, 1, 0, 0),
                       Trt_at_visit_2 = c(0, -1, 0,  1, 0)), adjust = "none")) %>% 
  bind_rows(data.frame(mode = "US CR",
                      emmeans::contrast(emm_common_bas_CR_US,
                                        list(Trt_at_visit_1 = c(-1, 0, 1, 0, 0),
                                             Trt_at_visit_2 = c(0, -1, 0,  1, 0)), 
                                        adjust = "none"))) %>% 
  bind_rows(data.frame(mode = "CS",
                      emmeans::contrast(emm_common_bas_CS,
                                        list(Trt_at_visit_1 = c(-1, 0, 1, 0, 0),
                                             Trt_at_visit_2 = c(0, -1, 0,  1, 0)), 
                                        adjust = "none"))) %>% 
  bind_rows(data.frame(mode = "CS CR",
                      emmeans::contrast(emm_common_bas_CR_CS,
                                        list(Trt_at_visit_1 = c(-1, 0, 1, 0, 0),
                                             Trt_at_visit_2 = c(0, -1, 0,  1, 0)), 
                                        adjust = "none"))) %>% 
  dplyr::select(-SE, -t.ratio)

   mode       contrast estimate    df  p.value
1    US Trt_at_visit_1     2.79  7.83 0.006083
2    US Trt_at_visit_2     2.76  7.94 0.001453

3 US CR Trt_at_visit_1     2.79  8.01 0.005838
4 US CR Trt_at_visit_2     2.76  8.05 0.001395

5    CS Trt_at_visit_1     2.79 17.20 0.000226
6    CS Trt_at_visit_2     2.75  9.62 0.001114

7 CS CR Trt_at_visit_1     2.79 19.16 0.001462
8 CS CR Trt_at_visit_2     2.75  9.28 0.001072

and MMRM:

  mode       contrast estimate        df       p.value
1   US Trt_at_visit_1 2.789949  8.000624 0.00585015123
2   US Trt_at_visit_2 2.758945  8.000429 0.00142051273

3   CS Trt_at_visit_1 2.785773 24.839722 0.00009509042
4   CS Trt_at_visit_2 2.748241 24.839722 0.00011178654

openpharma / mmrm

[Not a bug report - a question rather] cLDA with MMRM vs. nlme::gls + emmeans + Satterthwaite #183