rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/

emmeans + cLDA (Constrained Longitudinal Data Analysis) throws error. How to (if possible) do it properly? #388

Closed by Generalized 1 year ago

Generalized commented 1 year ago

Dear @rvlenth,

The cLDA (proposed many years ago) is gaining more and more popularity in randomized clinical trials (actually, it's one of today's standards) and is accepted by the pharmaceutical regulators. Briefly: when comparing treatment arms, it forces the baseline predictions to be equal. This is justified in RCTs (and invalid elsewhere) and increases the power of the analysis (at the cost of a slightly increased type-1 error). The classic longitudinal analysis (LDA) puts the baseline values on the right-hand side of the formula (removing them from the response vector) and includes the full treatment * time interaction. The cLDA does the opposite: it keeps the baseline values in the response vector and forces a common baseline by removing the intercept and the treatment main effect from the model matrix. This is the rare case where we have an interaction without one of its main effects, but that's fine here.
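
Schematically, the two parameterizations look like this (just a sketch; base and post are hypothetical names for a wide-format layout, not columns of the example data below):

# LDA:  baseline as a covariate, full treatment-by-time interaction
#   post ~ base + trt * tim
# cLDA: baseline kept in the response; the intercept and the trt main-effect
#   columns are removed from the model matrix, so both arms share a single
#   baseline mean (done below by editing model.matrix(~ tim * trt) directly)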

Of interest is exclusively the interaction term and the appropriate contrasts comparing treatment arms at selected time points.

A few examples:
https://datascienceplus.com/taking-the-baseline-measurement-into-account-constrained-lda-in-r/
https://rdoodles.rbind.io/2020/03/analyzing-longitudinal-data-a-simple-pre-post-design/
https://www.middleprofessor.com/files/applied-biostatistics_bookdown/_book/pre-post.html

Some literature:
(free) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5223669/
(paywalled) https://www.jstor.org/stable/25053123
(paywalled) https://pubmed.ncbi.nlm.nih.gov/19764951/
https://arxiv.org/ftp/arxiv/papers/2203/2203.15555.pdf

To the point: it's quite simple when done by hand.

The data:

set.seed(1000)
x <-
  data.frame(val = c(rnorm(5, mean=0), 
                     rnorm(5, mean=1.32),
                     rnorm(5, mean=0), 
                     rnorm(5, mean=3.38)),

             id = c(rep(1:5, 2), rep(101:105, 2)),
             trt = factor(c(rep("A", 5), rep("A", 5),
                     rep("B", 5), rep("B", 5))),
             tim = factor(c(rep("Bas", 5), rep("W1", 5),
                     rep("Bas", 5), rep("W1", 5))))

options(contrasts = c("contr.treatment", "contr.poly"))

The model matrix:

X <- model.matrix(~ tim * trt, data = x)
colnames(X)

Xalt <- X[, setdiff(colnames(X), c("(Intercept)", "trtB"))] 
colnames(Xalt)
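
For orientation, the columns one should see at this point (a comment sketch; the labels assume the treatment contrasts set above):

# colnames(X):    "(Intercept)" "timW1" "trtB" "timW1:trtB"
# colnames(Xalt): "timW1" "timW1:trtB"
# Dropping "(Intercept)" and "trtB" is what ties the two baseline means
# together; rms::Gls adds its own intercept back in val ~ Xalt, and that
# intercept becomes the common baseline mean.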

The model (this can be done with nlme::gls, but Frank Harrell's rms::Gls also gives me the CIs):

library(rms)

m <- rms::Gls(val ~ Xalt, data = x, 
              weights = varIdent(form = ~ 1 | tim),
              correlation = corSymm(form = ~ 1 | id))

predictions <- cbind(x, predict(m, x, conf.int=0.95))

Result:

> m
Generalized Least Squares Fit by REML

 rms::Gls(model = val ~ Xalt, data = x, correlation = corSymm(form = ~1 | 
     id), weights = varIdent(form = ~1 | tim))

 Obs 20         Log-restricted-likelihood -21.74
 Clusters 10    Model d.f. 2
 g 1.596        sigma 0.6347
                d.f. 17

            Coef    S.E.   t     Pr(>|t|)
 Intercept  -0.4630 0.2007 -2.31 0.0339  
 timW1       1.4711 0.4362  3.37 0.0036  
 timW1:trtB  2.0805 0.5545  3.75 0.0016  

 Correlation Structure: General
  Formula: ~1 | id 
  Parameter estimate(s):
  Correlation: 
   1    
 2 0.034
 Variance function:
  Structure: Different standard deviations per stratum
  Formula: ~1 | tim 
  Parameter estimates:
      Bas       W1 
 1.000000 1.382125

The predictions - here you can nicely see what happened to the baseline at both study arms:

> predictions
           val  id trt tim linear.predictors      lower       upper
1  -0.44577826   1   A Bas        -0.4630123 -0.8563721 -0.06965253
2  -1.20585657   2   A Bas        -0.4630123 -0.8563721 -0.06965253
3   0.04112631   3   A Bas        -0.4630123 -0.8563721 -0.06965253
4   0.63938841   4   A Bas        -0.4630123 -0.8563721 -0.06965253
5  -0.78655436   5   A Bas        -0.4630123 -0.8563721 -0.06965253
6   0.93451070   1   A  W1         1.0081041  0.2394582  1.77675002
7   0.84413212   2   A  W1         1.0081041  0.2394582  1.77675002
8   2.03975069   3   A  W1         1.0081041  0.2394582  1.77675002
9   1.30149438   4   A  W1         1.0081041  0.2394582  1.77675002
10 -0.05311776   5   A  W1         1.0081041  0.2394582  1.77675002
11 -0.98242783 101   B Bas        -0.4630123 -0.8563721 -0.06965253
12 -0.55448870 102   B Bas        -0.4630123 -0.8563721 -0.06965253
13  0.12138119 103   B Bas        -0.4630123 -0.8563721 -0.06965253
14 -0.12087232 104   B Bas        -0.4630123 -0.8563721 -0.06965253
15 -1.33604105 105   B Bas        -0.4630123 -0.8563721 -0.06965253
16  3.55005748 101   B  W1         3.0885773  2.3199314  3.85722322
17  3.53507872 102   B  W1         3.0885773  2.3199314  3.85722322
18  3.40493187 103   B  W1         3.0885773  2.3199314  3.85722322
19  1.33341459 104   B  W1         3.0885773  2.3199314  3.85722322
20  3.59315411 105   B  W1         3.0885773  2.3199314  3.85722322
library(ggplot2)

limits <- aes(ymax = upper , ymin=lower, shape=trt)

ggplot(predictions, aes( y=linear.predictors, x=tim)) + 
  geom_errorbar(limits, width= 0.1 , position=position_dodge(.1)) + 
  geom_line(aes(group=trt, y=linear.predictors), position=position_dodge(.1)) + 
  geom_point(position=position_dodge(.1), aes( fill=trt), shape=21, size=4) + 
  scale_fill_manual(values=c( "black", "white")) + 
  theme_bw() + 
  theme(panel.grid.major = element_blank(), legend.title=element_blank(), 
        text=element_text(size=14), legend.position="bottom") +
  xlab("") + 
  ylab("Estimated mean with corresponding 95% CI")

[figure: estimated means with 95% CIs at Bas and W1; both arms share the baseline estimate]

Now let's try the same with emmeans (the error is identical with rms::Gls and nlme::gls). Whatever I provide to "specs" gives me the same error:

> emmeans(m, specs = ~time*trt)
Error in (function (object, at, cov.reduce = mean, cov.keep = get_emm_option("cov.keep"),  : 
  We are unable to reconstruct the data.
The variables needed are:
    Xalt
Are any of these actually constants? (specify via 'params = ')
The dataset name is:
    x
Does the data still exist? Or you can specify a dataset via 'data = '

Please tell me how to do this with emmeans. I need it to do all the usual work (testing contrasts, adjusting p-values and CIs) in a consistent manner, rather than employing the lmTest or contrast packages.

Is this possible?

rvlenth commented 1 year ago

First, emmeans() and its relatives need the factors to identify what is to be averaged. The model matrix is not the same thing, and it can't read your mind.

Second, if you want a common baseline, parameterize the model (again, in terms of factors) so that that criterion is met:

> x$Trt <- factor(rep(c("Bas","A","Bas","B"), each = 5))
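
# Note: the rep() above relies on the row order of x. An order-independent
# equivalent (just a sketch):
#   x$Trt <- factor(ifelse(x$tim == "Bas", "Bas", as.character(x$trt)))
# (with several post-baseline visits, paste0(x$trt, x$tim) in place of x$trt
# generalizes it)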

> library(nlme)
> mod <- gls(val ~ Trt, data = x,
+            correlation = corSymm(form = ~1 | id), 
+            weights = varIdent(form = ~1 | tim))

> library(emmeans)
> emmeans(mod, ~ Trt)
Analytical Satterthwaite method not available; using appx-satterthwaite
 Trt emmean    SE   df lower.CL upper.CL
 A    1.008 0.392 8.09    0.105  1.91074
 B    3.089 0.392 8.09    2.186  3.99122
 Bas -0.463 0.201 8.89   -0.918 -0.00818

Degrees-of-freedom method: appx-satterthwaite 
Confidence level used: 0.95 

Note that these are exactly the values you obtained with predict().

I had wanted to fit the model val ~ Trt + tim, but that model is singular, and gls cannot handle singular fits. You can, however, add the time factor afterwards using add_grouping():

> RG <- add_grouping(ref_grid(mod), "Time", "Trt", c("W1", "W1", "Bas"))
> confint(RG)
 Trt Time prediction    SE   df lower.CL upper.CL
 Bas Bas      -0.463 0.201 8.96   -0.917  -0.0087
 A   W1        1.008 0.392 7.98    0.103   1.9128
 B   W1        3.089 0.392 7.98    2.184   3.9933

Degrees-of-freedom method: appx-satterthwaite 
Confidence level used: 0.95 

> contrast(RG, "trt.vs.ctrl1")
 contrast       estimate    SE   df t.ratio p.value
 A W1 - Bas Bas     1.47 0.436 9.64   3.372  0.0141
 B W1 - Bas Bas     3.55 0.436 9.64   8.142  <.0001

Degrees-of-freedom method: appx-satterthwaite 
P value adjustment: dunnettx method for 2 tests 
Generalized commented 1 year ago

Thank you very much for the quick and detailed answer - as always! I really appreciate it.

This kind of parametrization is very useful and easily extendable to more than one post-baseline visit. For example, with 2 visits:

set.seed(1000)

x <- data.frame(val = c(rnorm(5, mean=0), 
                     rnorm(5, mean=1.32),
                     rnorm(5, mean=3.5),
                     rnorm(5, mean=0), 
                     rnorm(5, mean=3.38),
                     rnorm(5, mean=5.5)),

             id = c(rep(1:5, 3), rep(101:105, 3)),
             trt = factor(c(rep("A", 5), rep("A", 5), rep("A", 5),
                     rep("B", 5), rep("B", 5), rep("B", 5))),
             tim = factor(c(rep("Bas", 5), rep("W1", 5), rep("W2", 5),
                     rep("Bas", 5), rep("W1", 5), rep("W2", 5))))

# For the emmeans approach
x$Trt <- factor(rep(c("Bas","A1", "A2","Bas","B1", "B2"), each = 5))

Approach with the model matrix

> X <- model.matrix(~ tim * trt, data = x)
> Xalt <- X[, setdiff(colnames(X), c("(Intercept)", "trtB"))] 

> m <- rms::Gls(val ~ Xalt, data = x, 
+               weights = varIdent(form = ~ 1 | tim),
+               correlation = corSymm(form = ~ 1 | id))

# coefficients:
            Coef    S.E.   t     Pr(>|t|)
 Intercept  -0.3241 0.2565 -1.26 0.2180  
 timW1       1.3421 0.5712  2.35 0.0270  
 timW2       3.2510 0.4765  6.82 <0.0001 
 timW1:trtB  2.7899 0.7495  3.72 0.0010  
 timW2:trtB  2.7589 0.5792  4.76 <0.0001 

> predictions <- cbind(x, predict(m, x, conf.int=0.95))
> predictions %>% dplyr::select(-id, -val) %>% unique()

   trt tim Trt linear.predictors       lower     upper
1    A Bas Bas        -0.3241038 -0.82674128 0.1785337
6    A  W1  A1         1.0180081 -0.02424837 2.0602645
11   A  W2  A2         2.9269003  2.12376009 3.7300405
16   B Bas Bas        -0.3241038 -0.82674128 0.1785337
21   B  W1  B1         3.8079559  2.76569944 4.8502123
26   B  W2  B2         5.6858441  4.88270389 6.4889843

# Quick check for, say, treatment B at visit 2 (B-W2, i.e. B2):
> (-0.3241038) + 3.2510041 +2.7589438 
[1] 5.685844

[figure: estimated means with 95% CIs at Bas, W1 and W2; both arms share the baseline estimate]

Now the approach with emmeans:

> mod <- gls(val ~ Trt, data = x,
            correlation = corSymm(form = ~1 | id), 
            weights = varIdent(form = ~1 | tim))

> coef(summary(mod))
                Value Std.Error   t-value      p-value
(Intercept)  1.018008 0.5317733  1.914365 6.708572e-02
TrtA2        1.908892 0.4568098  4.178746 3.126959e-04
TrtB1        2.789948 0.7495194  3.722316 1.007209e-03
TrtB2        4.667836 0.6704967  6.961759 2.686314e-07
TrtBas      -1.342112 0.5711688 -2.349764 2.698616e-02

>  (em <- emmeans(mod, ~ Trt, mode="df.error"))
 Trt emmean    SE df lower.CL upper.CL
 A1   1.018 0.532 20  -0.0913    2.127
 A2   2.927 0.410 20   2.0721    3.782
 B1   3.808 0.532 20   2.6987    4.917
 B2   5.686 0.410 20   4.8311    6.541
 Bas -0.324 0.256 20  -0.8591    0.211

Degrees-of-freedom method: df.error 
Confidence level used: 0.95 

Great! Now let's check some contrasts, typically comparing Treatment A with Treatment B at visit 1 and visit 2:

Classic approach:

> contrast::contrast(mod, list(Trt="A1"), list(Trt="B1"))
gls model parameter contrast

   Contrast      S.E.     Lower     Upper     t df Pr(>|t|)
1 -2.789948 0.7495194 -4.333612 -1.246284 -3.72 25    0.001

> contrast::contrast(mod, list(Trt="A2"), list(Trt="B2"))
gls model parameter contrast

   Contrast     S.E.     Lower     Upper     t df Pr(>|t|)
1 -2.758944 0.579215 -3.951859 -1.566028 -4.76 25    1e-04

and the emmeans approach:

> emmeans::contrast(em,
+                   list(Trt_at_visit_1 = c(1, 0, -1, 0, 0),
+                        Trt_at_visit_2 = c(0, 1, 0, -1, 0)
+ ))
 contrast       estimate    SE df t.ratio p.value
 Trt_at_visit_1    -2.79 0.750 20  -3.722  0.0013
 Trt_at_visit_2    -2.76 0.579 20  -4.763  0.0001

Degrees-of-freedom method: df.error 

Everything agrees.

/ Except the DoF. The gls model itself reports 25 residual DoF (30 observations - 5 parameters). The contrast package reports 25 as well ("For gls model, the degrees of freedom are n - p"), while emmeans reports 20. I believe that's because it takes the 25 residual DoF minus the number of covariance parameters. For compound symmetry there's just 1, and indeed in that case emmeans reported 25 - 1 = 24; I checked. For the unstructured covariance it's t(t+1)/2. Here we have 3 timepoints, which gives 3*4/2 = 6 unique (co)variance parameters, but one of them is the residual variance itself, which leaves 5 extra parameters. And 25 - 5 = 20. Is this correct? /
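
For reference, the same count in terms of what nlme estimates beyond the residual sigma (a sketch of the accounting, not emmeans' literal code):

# corSymm over 3 occasions: 3*(3-1)/2 = 3 correlations
# varIdent over 3 strata:   3 - 1     = 2 extra SD ratios
# extras: 3 + 2 = 5;  residual DoF: 30 - 5 = 25;  and 25 - 5 = 20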

rvlenth commented 1 year ago

BTW, this can be done without reparameterizing the dataset. To fit the model, do basically what you did with the model matrix, omitting trt (which, by the way, imposes a nesting structure):

> mod1 <- gls(val ~ tim + tim:trt, data = x,
+             correlation = corSymm(form = ~1 | id), 
+             weights = varIdent(form = ~1 | tim))

> (emmeans(mod1, ~trt) -> EMM)
Analytical Satterthwaite method not available; using appx-satterthwaite
NOTE: A nesting structure was detected in the fitted model:
    trt %in% tim
 trt tim emmean    SE   df lower.CL upper.CL
 A   Bas -0.352 0.296 7.96   -1.034    0.331
 B   Bas -0.574 0.296 7.96   -1.257    0.108
 A   W1   1.013 0.392 7.93    0.107    1.919
 B   W1   3.083 0.392 7.93    2.177    3.989

Degrees-of-freedom method: appx-satterthwaite 
Confidence level used: 0.95 

We still get separate baseline means, but you can get the common baseline:

> EMM1 <- emmeans(EMM[1:2], ~ 1) + EMM[3:4]
> EMM1
 1       trt tim emmean    SE   df lower.CL upper.CL
 overall .   .   -0.463 0.209 7.96   -1.095    0.169
 .       A   W1   1.013 0.392 7.93   -0.173    2.199
 .       B   W1   3.083 0.392 7.93    1.897    4.269

Results are averaged over some or all of the levels of: trt 
Degrees-of-freedom method: appx-satterthwaite 
Confidence level used: 0.95 
Conf-level adjustment: bonferroni method for 3 estimates 

> contrast(EMM1, "trt.vs.ctrl1")
 contrast             estimate   SE  df t.ratio p.value
 . A W1 - overall . .     1.48 0.44 9.6   3.356  0.0146
 . B W1 - overall . .     3.55 0.44 9.6   8.061  <.0001

Results are averaged over some or all of the levels of: trt 
Degrees-of-freedom method: appx-satterthwaite 
P value adjustment: dunnettx method for 2 tests 

The labeling is kind of awkward, but you can clean it up:

> levels(EMM1) <- list(Trt = c("Bas", "A", "B"))
> EMM1
 Trt emmean    SE   df lower.CL upper.CL
 Bas -0.463 0.209 7.96   -0.946   0.0198
 A    1.013 0.392 7.93    0.107   1.9195
 B    3.083 0.392 7.93    2.177   3.9895

Degrees-of-freedom method: appx-satterthwaite 
Confidence level used: 0.95 

Regarding d.f., as is documented, "df.error is just the error degrees of freedom for the model, minus the number of extra random effects estimated." Note that the Satterthwaite d.f. above is considerably lower than the 14 we get with mode = "df.error", so even with the extra penalty, it appears we are being too optimistic with d.f.
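
For comparison, the cruder method can be requested explicitly; a one-line sketch (yielding the 14 d.f. noted above):

emmeans(mod1, ~ trt, mode = "df.error")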

rvlenth commented 1 year ago

PS I think you have your signs reversed in the contrast coefficients in your example. You want to subtract the baseline values.

Generalized commented 1 year ago

Thank you!

PS: yes, I always forget to code the contrasts properly :-) Thank you for catching this!

Ahhh! Nice trick with unifying the baselines via emmeans(...~1) and using the "+" operator with the emmeans objects!

Previously I ended up with the ~ tim + tim:trt formula, obtained different baseline estimates, and wasn't sure what to do next. That's why I started playing with the model matrix (and realized I should also drop the (Intercept) term), but this complicated things.

Your approach simplifies everything.

I reproduced the analysis with more timepoints. I used the data with {Baseline, W1 and W2} timepoints defined in the previous post.

I set the DoF method to "df.error" only to observe what changed (by default I use Satterthwaite in R and Kenward-Roger in SAS).

The results are not identical, but are very, very close 👍

That's because there are now 24 DoF instead of 25 (for emmeans, 19 instead of 20), as there are now 2 baseline estimates rather than a single common one. And because GLS is a 2-step procedure, it took the estimated var-cov matrix (based on the smaller DoF) and altered the model coefficients a little.

I'm not going to worry about losing that 1 DoF; it's just good to know what changed under the current approach.

> mod1 <- gls(val ~ tim + tim:trt, data = x,
+             correlation = corSymm(form = ~1 | id), 
+             weights = varIdent(form = ~1 | tim))

> coef(summary(mod1))
                  Value Std.Error    t-value      p-value
(Intercept) -0.35153489 0.3844336 -0.9144228 3.695908e-01
timW1        1.36488892 0.6186978  2.2060673 3.719752e-02
timW2        3.27704515 0.5485752  5.9737396 3.634716e-06
timBas:trtB  0.05486224 0.5436713  0.1009107 9.204596e-01
timW1:trtB   2.79925593 0.7551750  3.7067643 1.101517e-03
timW2:trtB   2.76172387 0.5798690  4.7626686 7.592938e-05

> (EMM <- emmeans(mod1, ~trt*tim, mode="df.error", nesting=NULL))
 trt tim emmean    SE df lower.CL upper.CL
 A   Bas -0.352 0.384 19   -1.156    0.453
 B   Bas -0.297 0.384 19   -1.101    0.508
 A   W1   1.013 0.534 19   -0.104    2.131
 B   W1   3.813 0.534 19    2.695    4.930
 A   W2   2.926 0.410 19    2.067    3.784
 B   W2   5.687 0.410 19    4.829    6.545

Degrees-of-freedom method: df.error 
Confidence level used: 0.95 

Now, the trick with replacing the 2 baselines with the common one.

emmeans(EMM[1:2], ~ 1)
 1       emmean    SE df lower.CL upper.CL
 overall -0.324 0.272 19   -0.893    0.245

Results are averaged over the levels of: trt 
Degrees-of-freedom method: df.error 
Confidence level used: 0.95 

Good! Let's combine the parts:

> (EMM1 <- update(emmeans(EMM[1:2], ~ 1) + EMM[3:6], adjust="none", infer = c(TRUE, TRUE)))

 1       trt tim emmean    SE df lower.CL upper.CL t.ratio p.value
 overall .   .   -0.324 0.272 19   -0.893    0.245  -1.192  0.2478
 .       A   W1   1.013 0.534 19   -0.104    2.131   1.898  0.0730
 .       B   W1   3.813 0.534 19    2.695    4.930   7.140  <.0001
 .       A   W2   2.926 0.410 19    2.067    3.784   7.135  <.0001
 .       B   W2   5.687 0.410 19    4.829    6.545  13.870  <.0001

Results are averaged over some or all of the levels of: trt 
Degrees-of-freedom method: df.error 
Confidence level used: 0.95 

I used update() to pass adjust = "none"; the default was a Bonferroni adjustment, which I wanted to turn off before doing the comparisons.
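
As a side note, if one only needs the printed table rather than a reusable object, summary() accepts the same arguments (a sketch):

summary(emmeans(EMM[1:2], ~ 1) + EMM[3:6], adjust = "none", infer = c(TRUE, TRUE))

update() is the better tool here, though, since EMM1 is reused for contrast() below.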

> (data.frame(EMM1) %>%
+     mutate(trt = ifelse(trt == ".", "Both", trt),
+            tim = ifelse(tim == ".", "Bas", tim)) %>% 
+     mutate(across(where(is.numeric), ~round(., 3))) %>% 
+     dplyr::select(-1) -> EMM2)

   trt tim emmean    SE df lower.CL upper.CL t.ratio p.value
1 Both Bas -0.324 0.272 19   -0.893    0.245  -1.192   0.248
2    A  W1  1.013 0.534 19   -0.104    2.131   1.898   0.073
3    B  W1  3.813 0.534 19    2.695    4.930   7.140   0.000
4    A  W2  2.926 0.410 19    2.067    3.784   7.135   0.000
5    B  W2  5.687 0.410 19    4.829    6.545  13.870   0.000

Differences:
- A at W1: was 1.018, is 1.013
- A at W2: was 2.927, is 2.926
- B at W1: was 3.808, is 3.813
- B at W2: was 5.686, is 5.687

Let's check the contrasts comparing both medicines:

> emmeans::contrast(EMM1,
+                   list(Trt_at_visit_1 = c(0, -1, 1, 0, 0),
+                        Trt_at_visit_2 = c(0, 0, 0,  -1, 1)),
+                   adjust = "none")
 contrast       estimate    SE df t.ratio p.value
 Trt_at_visit_1     2.80 0.755 19   3.707  0.0015
 Trt_at_visit_2     2.76 0.580 19   4.763  0.0001

Results are averaged over some or all of the levels of: trt 
Degrees-of-freedom method: df.error 

All works!

And the plot:

ggplot(EMM2 %>% 
           slice(1) %>%
           bind_rows(EMM2) %>% 
           mutate(trt = rep(LETTERS[1:2], 3)),
       aes( y=emmean, x=tim, col=trt)) + 
  geom_errorbar(aes(ymax = upper.CL, ymin=lower.CL), 
                width= 0.1, 
                position=position_dodge(.1)) + 
  geom_line(aes(group=trt, y=emmean), 
            position=position_dodge(.1)) + 
  geom_point(position=position_dodge(.1), 
             fill="white",
             shape=21, 
             size=3) + 
  scale_fill_manual(values=c( "black", "white")) + 
  theme_bw() + 
  theme(panel.grid.major = element_blank(), legend.title=element_blank(), 
        text=element_text(size=14), legend.position="top") +
  labs(title = "LS-means and 95% CIs of the treatment effect", 
       x = "Study timepoint", y = "Treatment effect [unit]")

[figure: LS-means and 95% CIs by study timepoint, arms dodged]

This can be tweaked a bit to reflect the equal estimates at baseline: one point, one CI (or no CI, if one prefers), and a different colour.

EMM2 %>% 
  slice(1) %>%
  bind_rows(EMM2) %>% 
  mutate(trt  = factor(rep(LETTERS[1:2], 3), levels = c("All", "A", "B"))) %>% 
  mutate(trt2 = factor(ifelse(tim == "Bas", "All", as.character(trt)),
                       levels = c("All", "A", "B"))) -> clda

ggplot(clda,
       aes(y=emmean, x=tim, col=trt2)) + 
  geom_line(aes(group=trt, y=emmean, col=trt), 
            position=position_dodge(0.1)) +
  geom_errorbar(aes(ymax = upper.CL, ymin = lower.CL, group = trt2, col = trt2), 
                width = 0.1, 
                position = position_dodge(0.1)) +
  geom_point(aes(group=trt2, col=trt2),
             position=position_dodge(0.1), 
             fill="white",
             shape=21, 
             size=3) +
  scale_color_manual(values=c(All="grey", A="blue", B="green")) + 
  theme_bw() + 
  theme(panel.grid.major = element_blank(), legend.title=element_blank(), 
        text=element_text(size=14), legend.position="top") +
  labs(title = "LS-means and 95% CIs of the treatment effect", 
       x = "Study timepoint", y = "Treatment effect [unit]")

[figure: same plot with a single grey baseline point shared by both arms]


You're truly a good spirit of GitHub, Professor, always so helpful! Thank you very much!

PS: Maybe (just thinking out loud) this topic deserves a small mention in the package vignettes (or a little separate one) showing how to do cLDA with emmeans :-)

rvlenth commented 1 year ago

Good point about losing a degree of freedom. It is better to fit a model that estimates a common baseline.

Russ


Generalized commented 1 year ago

Now we have 2 solutions with emmeans.

1) Relying solely on the treatment factor recoded to reflect the timepoints. It saves the DoF, but requires additional work to split the A1, A2, B1, B2... labels back into time and treatment when we want to put them on a graph. Disadvantage: the regular expression splitting the "letter + number" combo may be a bit complicated.

Proposed solution:

> em
 Trt emmean    SE df lower.CL upper.CL
 A1   1.018 0.532 20  -0.0913    2.127
 A2   2.927 0.410 20   2.0721    3.782
 B1   3.808 0.532 20   2.6987    4.917
 B2   5.686 0.410 20   4.8311    6.541
 Bas -0.324 0.256 20  -0.8591    0.211

Degrees-of-freedom method: df.error 
Confidence level used: 0.95 

> em %>% 
+     data.frame %>%
+     separate(col = Trt, 
+              into = c("Treatment", "Time"), 
+              sep="(?<=[A-Za-z])(?=[0-9])", 
+              fill = "right") %>%
+     mutate(Time = ifelse(is.na(Time), "Bas", Time)) %>% 
+     slice(c(1:n(),n())) %>% 
+     arrange(Time) %>% 
+     mutate(Treatment = rep(LETTERS[1:2], 3))
  Treatment Time     emmean        SE df    lower.CL  upper.CL
1         A    1  1.0180081 0.5317733 20 -0.09125152 2.1272677
2         B    1  3.8079559 0.5317733 20  2.69869629 4.9172155
3         A    2  2.9269003 0.4097729 20  2.07212892 3.7816717
4         B    2  5.6858441 0.4097729 20  4.83107273 6.5406155
5         A  Bas -0.3241038 0.2564524 20 -0.85905414 0.2108466
6         B  Bas -0.3241038 0.2564524 20 -0.85905414 0.2108466

2) Using the tim + tim:trt formula and averaging the baselines. It spends (number_of_arms - 1) DoF and also needs some work, but is quite simple.

> EMM  <- emmeans(mod1, ~trt*tim, mode="df.error", nesting=NULL)
> EMM1 <- update(emmeans(EMM[1:2], ~ 1) + EMM[3:6], adjust="none", infer = c(TRUE, TRUE))
> 
> EMM1 %>% 
+     data.frame %>%  
+     slice(c(1, 1:n())) %>% 
+     mutate(trt = rep(LETTERS[1:2], 3),
+            tim = ifelse(tim == ".", "Bas", tim)) %>% 
+     dplyr::select(-1)

  trt tim     emmean        SE df   lower.CL  upper.CL   t.ratio      p.value
1   A Bas -0.3241038 0.2718356 19 -0.8930623 0.2448547 -1.192278 2.478308e-01
2   B Bas -0.3241038 0.2718356 19 -0.8930623 0.2448547 -1.192278 2.478308e-01
3   A  W1  1.0133540 0.5339894 19 -0.1042986 2.1310067  1.897704 7.303645e-02
4   B  W1  3.8126100 0.5339894 19  2.6949573 4.9302626  7.139861 8.694950e-07
5   A  W2  2.9255103 0.4100293 19  2.0673091 3.7837114  7.134881 8.780771e-07
6   B  W2  5.6872341 0.4100293 19  4.8290329 6.5454353 13.870312 2.163842e-11

I think the single DoF can be "spent", provided one isn't working with data as small as in my example, where going from 20 to 19 could potentially make a difference. Typically I work with a few dozen observations, so a decrease from, say, 50 to 49 shouldn't even be noticeable. (Actually, much worse problems happen with mixed models and their inference in more complex cases; sometimes the differences in DoF reach into the tens.)

So it's up to the data analyst which approach to choose, based on the context and their R programming skills :)

rvlenth commented 1 year ago

Yet another way to hack around this - again with the original example and the model mod1 I fitted earlier. In this approach, I just make the linear functions the same for both baselines.

> EMM = emmeans(mod1, ~trt*tim, nesting = NULL)
Analytical Satterthwaite method not available; using appx-satterthwaite

> lf = EMM@linfct
> EMM@linfct[1, ] = EMM@linfct[2, ] = (lf[1,] + lf[2, ])/2

> EMM
 trt tim emmean    SE   df lower.CL upper.CL
 A   Bas -0.463 0.209 7.91   -0.946   0.0204
 B   Bas -0.463 0.209 7.91   -0.946   0.0204
 A   W1   1.013 0.392 8.03    0.109   1.9174
 B   W1   3.083 0.392 8.03    2.179   3.9874

Degrees-of-freedom method: appx-satterthwaite 
Confidence level used: 0.95 

If you have seriously unbalanced data, it may be better to use a weighted average of the baselines as that may come closer to what would be fitted with a model that fits a common one.
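
For instance, a sketch of that weighted version (n1 and n2 stand for the per-arm baseline counts; with this balanced toy data both are 5, so it reduces to the simple average above):

EMM <- emmeans(mod1, ~ trt * tim, nesting = NULL)   # fresh, unmodified grid
n1 <- 5; n2 <- 5                                    # baseline sample sizes per arm
lf <- EMM@linfct
EMM@linfct[1, ] <- EMM@linfct[2, ] <- (n1 * lf[1, ] + n2 * lf[2, ]) / (n1 + n2)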