emmeans result not correct when log-transformed covariate is used

lang-benjamin commented 1 month ago

When doing some transformation (e.g. log) as part of the formula specification, a subsequent emmeans call does not provide the correct results. I am using the most up to date development version of mmrm and the newest CRAN release from emmeans.

Consider the following example:

library(mmrm)
library(emmeans)
#> mmrm() registered as emmeans extension
#> Welcome to emmeans.
#> Caution: You lose important information if you filter this package's results.
#> See '? untidy'

fit1 <- mmrm(
  formula = FEV1 ~ log(FEV1_BL) + RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data
)

fit2 <- mmrm(
  formula = FEV1 ~ log_FEV1_BL + RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = within(fev_data, log_FEV1_BL <- log(FEV1_BL))
)
# fit1 and fit2 yield the same results, but emmeans applied to them does not
emms1 <- emmeans(fit1, ~ ARMCD | AVISIT) |>
  summary()

emms2 <- emmeans(fit2, ~ ARMCD | AVISIT) |>
  summary()

# For example:
emms1[1, "emmean"]
#> [1] 33.40042
emms2[1, "emmean"]
#> [1] 33.23374
all.equal(emms1[1, "emmean"], emms2[1, "emmean"])
#> [1] "Mean relative difference: 0.004990338"

# sessionInfo
sessionInfo()
#> R version 4.2.2 (2022-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8   
#> [3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C                   
#> [5] LC_TIME=German_Germany.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] emmeans_1.10.3   mmrm_0.3.12.9000
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.12        TMB_1.9.10         pillar_1.9.0       compiler_4.2.2    
#>  [5] R.methodsS3_1.8.2  R.utils_2.12.3     tools_4.2.2        digest_0.6.34     
#>  [9] evaluate_0.23      lifecycle_1.0.4    tibble_3.2.1       checkmate_2.3.1   
#> [13] nlme_3.1-160       R.cache_0.16.0     lattice_0.20-45    pkgconfig_2.0.3   
#> [17] rlang_1.1.2        reprex_2.1.0       Matrix_1.6-5       cli_3.6.2         
#> [21] rstudioapi_0.15.0  yaml_2.3.8         parallel_4.2.2     mvtnorm_1.2-4     
#> [25] xfun_0.42          fastmap_1.1.1      withr_3.0.0        styler_1.10.2     
#> [29] stringr_1.5.1      knitr_1.45         generics_0.1.3     fs_1.6.3          
#> [33] vctrs_0.6.5        grid_4.2.2         glue_1.7.0         fansi_1.0.6       
#> [37] Rdpack_2.6         survival_3.4-0     rmarkdown_2.26     multcomp_1.4-25   
#> [41] TH.data_1.1-2      purrr_1.0.2        magrittr_2.0.3     codetools_0.2-18  
#> [45] MASS_7.3-58.1      splines_4.2.2      backports_1.4.1    htmltools_0.5.7   
#> [49] rbibutils_2.2.16   xtable_1.8-4       sandwich_3.1-0     utf8_1.2.4        
#> [53] estimability_1.4.1 stringi_1.8.3      zoo_1.8-12         R.oo_1.26.0

^{Created on 2024-07-11 with reprex v2.1.0}

clarkliming commented 1 month ago

thank you @lang-benjamin fore reporting this, this is for sure high priority and we will fix soon!

clarkliming commented 1 month ago

I am not sure if this is a bug or feature but let me try to make this clear

emmeans will try to average the covariates
for categorical covariates we just list all combinations of them
however for numerical values we can use the mean of the observations

so when we have a transformed value in the covariates, how we should decide the mean of that covariate, do we use mean of untransformed, or mean of transformed?

currently in emmeans the mean of untransformed value is always used. this is also true for glm and other fits

examples see below

library(emmeans)
d <- data.frame(
  y = rbinom(100, 1, 0.5),
  x = rnorm(100) + 100
)
d$x2 <- log(d$x)
a <- glm(y ~ log(x), data = d)
b <- glm(y ~ x2, data = d)

ref_grid(a, ~ 1)
ref_grid(b, ~ 1)

there will be minor differences

danielinteractive commented 1 month ago

I guess this could be a emmeans question (and hopefully answer somewhere there) of how to make it work?

yonicd commented 1 month ago

@wlandau we just went through these types of questions for emmeans in brms.mmrm, do you recall anything like this?

yonicd commented 1 month ago

would regrid be the part that is missing here?

yonicd commented 1 month ago

Sorry: this is the relevant section for this question

yonicd commented 1 month ago

library(mmrm)
library(emmeans)
#> mmrm() registered as emmeans extension
#> Welcome to emmeans.
#> Caution: You lose important information if you filter this package's results.
#> See '? untidy'

library(broom)

#> mmrm() registered as emmeans extension
#> Welcome to emmeans.
#> Caution: You lose important information if you filter this package's results.
#> See '? untidy'

fit1 <- mmrm(
  formula = FEV1 ~ log(FEV1_BL) + RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data
)

fit2 <- mmrm(
  formula = FEV1 ~ log_FEV1_BL + RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = within(fev_data, log_FEV1_BL <- log(FEV1_BL))
)

ref_grid(fit1)
#> 'emmGrid' object with variables:
#>     FEV1_BL = 40.191
#>     RACE = Asian, Black or African American, White
#>     SEX = Male, Female
#>     ARMCD = PBO, TRT
#>     AVISIT = VIS1, VIS2, VIS3, VIS4

ref_grid(fit2)
#> 'emmGrid' object with variables:
#>     log_FEV1_BL = 3.667
#>     RACE = Asian, Black or African American, White
#>     SEX = Male, Female
#>     ARMCD = PBO, TRT
#>     AVISIT = VIS1, VIS2, VIS3, VIS4


emmeans(fit1, ~ ARMCD | AVISIT,at = list(FEV1_BL = exp(3.667))) |>
  tidy()
#> # A tibble: 8 × 7
#>   ARMCD AVISIT estimate std.error    df statistic   p.value
#>   <chr> <chr>     <dbl>     <dbl> <dbl>     <dbl>     <dbl>
#> 1 PBO   VIS1       33.2     0.734  144.      45.3 4.72e- 87
#> 2 TRT   VIS1       37.3     0.741  140.      50.3 1.64e- 91
#> 3 PBO   VIS2       38.0     0.579  144.      65.7 3.34e-109
#> 4 TRT   VIS2       42.0     0.569  141.      73.8 1.61e-114
#> 5 PBO   VIS3       43.6     0.442  129.      98.8 2.23e-123
#> 6 TRT   VIS3       46.6     0.488  130.      95.3 4.70e-122
#> 7 PBO   VIS4       48.4     1.17   133.      41.3 6.28e- 78
#> 8 TRT   VIS4       52.9     1.17   132.      45.2 3.03e- 82


emmeans(fit2, ~ ARMCD | AVISIT) |>
  tidy()
#> # A tibble: 8 × 7
#>   ARMCD AVISIT estimate std.error    df statistic   p.value
#>   <chr> <chr>     <dbl>     <dbl> <dbl>     <dbl>     <dbl>
#> 1 PBO   VIS1       33.2     0.734  144.      45.3 4.70e- 87
#> 2 TRT   VIS1       37.3     0.741  140.      50.3 1.63e- 91
#> 3 PBO   VIS2       38.0     0.579  144.      65.7 3.35e-109
#> 4 TRT   VIS2       42.0     0.569  141.      73.8 1.61e-114
#> 5 PBO   VIS3       43.6     0.442  129.      98.8 2.23e-123
#> 6 TRT   VIS3       46.6     0.488  130.      95.3 4.71e-122
#> 7 PBO   VIS4       48.4     1.17   133.      41.3 6.28e- 78
#> 8 TRT   VIS4       52.9     1.17   132.      45.2 3.03e- 82

^{Created on 2024-07-12 by the reprex package (v2.0.1)}

lang-benjamin commented 1 month ago

I am not sure if this is a bug or feature [...]

so when we have a transformed value in the covariates, how we should decide the mean of that covariate, do we use mean of untransformed, or mean of transformed?

currently in emmeans the mean of untransformed value is always used. this is also true for glm and other fits

Thank you! If one opts for having a log-transformation of a covariate (e.g. to enable a relationship to the response that is approximately linear), then I would say that the mean should be considered from this transformed scale. I assumed that emmeans itself would calculate the mean of log-transformed values once the model specification includes log(covariate). It seems that this is not the case, though. So, I guess this is more a question to emmeans itself and not directly to mmrm.

danielinteractive commented 1 month ago

Yes, I think it sounds like this is more a question to the emmeans team

clarkliming commented 6 days ago

I think we can close this issue as this is not really mmrm related

danielinteractive commented 6 days ago

closing as it is a question for emmeans

openpharma / mmrm

emmeans result not correct when log-transformed covariate is used #452