rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
340 stars 30 forks source link

emmeans gives too high SEs for lme model #453

Closed certara-mtomashevskiy closed 8 months ago

certara-mtomashevskiy commented 8 months ago

Describe the bug

emmeans gives incorrect SE for lme model

To reproduce

#load the file attached
NCA.Set <- read.csv("NCA.Set.csv")
NCA.Set$ID <- as.factor(NCA.Set$ID)
NCA.Set$period <- as.factor(NCA.Set$period)
NCA.Set$sequence <- as.factor(NCA.Set$sequence)
NCA.Set$treatment <- as.factor(NCA.Set$treatment)
modelABE <-
  nlme::lme(
    logpk ~ treatment + period + sequence,
    subset = !is.na(logpk),
    random = ~ treatment | ID,
    weights = nlme::varIdent(form = ~ treatment),
    data = NCA.Set,
    method = "REML",
    na.action = na.exclude,
    control = list(
      opt = "optim",
      msMaxIter = 1000,
      msMaxEval = 1000
    ))

emmeans::emmeans(modelABE, pairwise ~ treatment)

output:

_$emmeans_
 _treatment emmean     SE df lower.CL upper.CL_
 _Reference   9.04 639998 58 -1281085  1281103_
 _Test        9.03 486891 58  -974610   974628_

_Results are averaged over the levels of: period, sequence_ 
_Degrees-of-freedom method: containment_ 
_Confidence level used: 0.95_ 

_$contrasts_
 _contrast         estimate     SE  df t.ratio p.value_
 _Reference - Test   0.0149 153106 176   0.000  1.0000_

_Results are averaged over the levels of: period, sequence_ 
_Degrees-of-freedom method: containment_

As you can see, SEs reported are too big. We tried different datasets (a lot of datasets were simulated), but only this one does not work as expected. Please note that the model is somewhat unusual, but no one did find another solution for replicate design bioequivalence model suggested by FDA (presented in SAS language). Current solution was suggested by R community.

Expected behavior

# using simplified model
modelABE <- nlme::lme(
      logpk ~ treatment + period + sequence,
      subset = !is.na(logpk),
      random = ~ 1 | ID,
      data = NCA.Set,
      method = "REML",
      na.action = na.exclude,
      control = list(
        opt = "optim",
        msMaxIter = 10000,
        msMaxEval = 10000
      )
    )
emmeans::emmeans(modelABE, pairwise ~ treatment)

output:

$emmeans
treatment emmean      SE df lower.CL upper.CL
Reference  8.998 0.01273 58    8.973    9.024
Test       8.994 0.01273 58    8.968    9.019

Results are averaged over the levels of: period, sequence 
Degrees-of-freedom method: containment 
Confidence level used: 0.95 

$contrasts
contrast         estimate      SE  df t.ratio p.value
Reference - Test  0.00435 0.00221 176   1.966  0.0509

Results are averaged over the levels of: period, sequence 
Degrees-of-freedom method: containment

you can see much more reliable SEs

rvlenth commented 8 months ago

I cannot reproduce this because I don't have the dataset.

certara-mtomashevskiy commented 8 months ago

Dear Russell,

sorry, I forgot to attach it NCA.Set.csv

rvlenth commented 8 months ago

What's not shown is the error message:

> modelABE <-
+     nlme::lme(
+         logpk ~ treatment + period + sequence,
+         subset = !is.na(logpk),
+         random = ~ treatment | ID,
+         weights = nlme::varIdent(form = ~ treatment),
+         data = NCA.Set,
+         method = "REML",
+         na.action = na.exclude,
+         control = list(
+             opt = "optim",
+             msMaxIter = 1000,
+             msMaxEval = 1000
+         ))
Error in MEestimate(lmeSt, grps) : 
  Singularity in backsolve at level 0, block 1

Therefore, I see:

> summary(modelABE)
Error: object 'modelABE' not found

... which implies that the results shown are from some other version of modelABE that was sitting around in your workspace, not the one you are claiming to reproduce.

My suggestion is to start with a clean session of R, and also to use the reprex package. You just copy the code to run into the clipboard, and run reprex::reprex().

I am closing this issue as I believe it not to be reproducible. You may reopen it if you find that you can actually reproduce the problem.

rvlenth commented 8 months ago

PS -- Please never use the same object name (in this case modelABE) for two different objects. This is one of the guidelines I request in my issue templaqte, and it is important because it causes confusion.

rvlenth commented 8 months ago

I was able to get past the error message:

> modelABE <-
+     nlme::lme(
+         logpk ~ treatment + period + sequence,
+         random = ~ treatment | ID,
+         data = NCA.Set,
+         method = "REML",
+         control = list(
+             opt = "optim",
+             msMaxIter = 1000,
+             msMaxEval = 1000
+         ))

There were no missing values, so I streamlined the code. Note this is the version without the weights.

Proceeding, ...

> emmeans(modelABE, ~treatment)
 treatment emmean     SE df lower.CL upper.CL
 Reference      9 593169 58 -1187347  1187365
 Test           9 456203 58  -913180   913198

Results are averaged over the levels of: period, sequence 
Degrees-of-freedom method: containment 
Confidence level used: 0.95 

However, look at the summary:

> summary(modelABE)
Linear mixed-effects model fit by REML
  Data: NCA.Set 
        AIC       BIC   logLik
  -1023.022 -988.4692 521.5112

Random effects:
 Formula: ~treatment | ID
 Structure: General positive-definite, Log-Cholesky parametrization
              StdDev       Corr  
(Intercept)   4.594664e+06 (Intr)
treatmentTest 1.060932e+06 -1    
Residual      1.569110e-02       

Fixed effects:  logpk ~ treatment + period + sequence 
                  Value Std.Error  DF   t-value p-value
(Intercept)    9.001221  838867.1 176 0.0000107   1.000
treatmentTest -0.004794  136965.7 176 0.0000000   1.000
period2        0.007861  136965.7 176 0.0000001   1.000
period3        0.007765  136965.7 176 0.0000001   1.000
period4        0.001195       0.0 176 0.4172733   0.677
sequence2     -0.010669 1049371.6  58 0.0000000   1.000
 Correlation: 
              (Intr) trtmnT perid2 perid3 perid4
treatmentTest -0.707                            
period2       -0.707  0.000                     
period3       -0.707  0.000  1.000              
period4        0.000  0.000  0.000  0.000       
sequence2     -0.707  0.000  1.000  1.000  0.000

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-2.60211252 -0.60290380  0.05348208  0.56811751  2.43694264 

Number of Observations: 240
Number of Groups: 60 

Note that the SEs of the regression coefficients are huge. That's why they are also huge in the emmeans results. This is not a bug in emmeans