openpharma / mmrm

Mixed Models for Repeated Measures (MMRM) in R.
https://openpharma.github.io/mmrm/
Other
127 stars 22 forks source link

Reproducibility issue: There's a discrepancy in coefficient estimates when altering the variable order within the formula. #439

Closed sstats2024 closed 1 month ago

sstats2024 commented 5 months ago

Summary

I tried to fit the following mmrm model, an example I received on Stack Exchange. However, I encountered a bug when I swapped the order of the age and eye_visit variables. It seems a previous post has identified a reproducibility issue, but I am only getting the issue when i swap the order of the variables.


library(tidyverse)
library(mmrm)

set.seed(42)
example <- tibble(eye=rep(c("left", "right"), 30),
                  visit=rep(rep(c("Week0", "Week4", "Week8"), each=2), 10),
                  id=rep(1L:10L, each=6),
                  random_subject_effect = rnorm(n=10, mean=0, sd=10)[id],
                  age = sample(18:85, size=10, replace=T)[id],
                  IOP = rnorm(n=60, 
                              mean=random_subject_effect+10-(eye=="left")*3+as.integer(str_extract(visit, "[0-8]+")), 
                              sd=5)) %>%
  dplyr::select(-random_subject_effect) %>%
  mutate(eye_visit = factor(paste0(eye, "_", visit)),
         id=factor(id))

mmrmfit1 <- mmrm::mmrm(
  formula = IOP ~ 0 + eye_visit + age + us(eye_visit | id),
  data = example,
  method = "Kenward-Roger")

mmrmfit1

mmrmfit1 %>%
  summary()

mmrmfit2 <- mmrm::mmrm(
  formula = IOP ~ 0 + age + eye_visit  + us(eye_visit | id),
  data = example,
  method = "Kenward-Roger")

mmrmfit2

mmrmfit2 %>%
  summary()

This is my output:


mmrmfit1

mmrm fit

Formula:     IOP ~ 0 + eye_visit + age + us(eye_visit | id)
Data:        example (used 60 observations from 10 subjects with maximum 6 timepoints)
Covariance:  unstructured (21 variance parameters)
Inference:   REML
Deviance:    342.8828

Coefficients: 
 eye_visitleft_Week0  eye_visitleft_Week4  eye_visitleft_Week8 eye_visitright_Week0 eye_visitright_Week4 
          27.2300956           29.2064058           35.7663717           26.6798864           33.3782161 
eye_visitright_Week8                  age 
          36.9495174           -0.3037067 

Model Inference Optimization:
Converged with code 0 and message: convergence: rel_reduction_of_f <= factr*epsmch

mmrmfit1 %>%
summary()

mmrm fit

Formula:     IOP ~ 0 + eye_visit + age + us(eye_visit | id)
Data:        example (used 60 observations from 10 subjects with maximum 6 timepoints)
Covariance:  unstructured (21 variance parameters)
Method:      Kenward-Roger
Vcov Method: Kenward-Roger
Inference:   REML

Model selection criteria:
     AIC      BIC   logLik deviance 
   384.9    391.2   -171.4    342.9 

Coefficients: 
                     Estimate Std. Error      df t value Pr(>|t|)    
eye_visitleft_Week0   27.2301     5.6528 12.7290   4.817 0.000357 ***
eye_visitleft_Week4   29.2064     6.3341 14.6790   4.611 0.000358 ***
eye_visitleft_Week8   35.7664     5.6209 12.3470   6.363 3.15e-05 ***
eye_visitright_Week0  26.6799     5.3905 10.8100   4.949 0.000459 ***
eye_visitright_Week4  33.3782     5.3272 11.0120   6.266 6.10e-05 ***
eye_visitright_Week8  36.9495     6.0489 14.4010   6.108 2.39e-05 ***
age                   -0.3037     0.1099  8.0020  -2.763 0.024559 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Covariance estimate:
            left_Week0 left_Week4 left_Week8 right_Week0 right_Week4 right_Week8
left_Week0     93.3227    99.0375    79.6347     61.8261     44.6053     79.4946
left_Week4     99.0375   181.7674    80.4110     79.5096     62.6905    119.0725
left_Week8     79.6347    80.4110    84.4629     63.2502     53.7663     84.7796
right_Week0    61.8261    79.5096    63.2502     60.0874     38.2846     90.2006
right_Week4    44.6053    62.6905    53.7663     38.2846     61.4984     46.9245
right_Week8    79.4946   119.0725    84.7796     90.2006     46.9245    158.6355

mmrmfit2

mmrm fit

Formula:     IOP ~ 0 + age + eye_visit + us(eye_visit | id)
Data:        example (used 60 observations from 10 subjects with maximum 6 timepoints)
Covariance:  unstructured (21 variance parameters)
Inference:   REML
Deviance:    342.8828

Coefficients: 
                 age  eye_visitleft_Week0  eye_visitleft_Week4  eye_visitleft_Week8 eye_visitright_Week0 
          -0.3037263           27.2309662           29.2072765           35.7672424           26.6807571 
eye_visitright_Week4 eye_visitright_Week8 
          33.3790868           36.9503881 

Model Inference Optimization:
Converged with code 0 and message: convergence: rel_reduction_of_f <= factr*epsmch

mmrmfit2 %>%
summary()

mmrm fit

Formula:     IOP ~ 0 + age + eye_visit + us(eye_visit | id)
Data:        example (used 60 observations from 10 subjects with maximum 6 timepoints)
Covariance:  unstructured (21 variance parameters)
Method:      Kenward-Roger
Vcov Method: Kenward-Roger
Inference:   REML

Model selection criteria:
     AIC      BIC   logLik deviance 
   384.9    391.2   -171.4    342.9 

Coefficients: 
                     Estimate Std. Error      df t value Pr(>|t|)    
age                   -0.3037     0.1099  7.9990  -2.763 0.024579 *  
eye_visitleft_Week0   27.2310     5.6537 12.7240   4.817 0.000358 ***
eye_visitleft_Week4   29.2073     6.3348 14.6750   4.611 0.000359 ***
eye_visitleft_Week8   35.7672     5.6217 12.3420   6.362 3.16e-05 ***
eye_visitright_Week0  26.6808     5.3914 10.8060   4.949 0.000460 ***
eye_visitright_Week4  33.3791     5.3280 11.0080   6.265 6.11e-05 ***
eye_visitright_Week8  36.9504     6.0496 14.3990   6.108 2.39e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Covariance estimate:
            left_Week0 left_Week4 left_Week8 right_Week0 right_Week4 right_Week8
left_Week0     93.3347    99.0491    79.6480     61.8398     44.6133     79.5073
left_Week4     99.0491   181.7815    80.4265     79.5243     62.7014    119.0896
left_Week8     79.6480    80.4265    84.4775     63.2637     53.7756     84.7934
right_Week0    61.8398    79.5243    63.2637     60.0996     38.2922     90.2129
right_Week4    44.6133    62.7014    53.7756     38.2922     61.5051     46.9348
right_Week8    79.5073   119.0896    84.7934     90.2129     46.9348    158.6437

R session info

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_1.0.0   stringr_1.5.1   dplyr_1.1.0     purrr_1.0.1     readr_2.1.2     tidyr_1.2.0     tibble_3.1.6   
 [8] ggplot2_3.4.2   tidyverse_1.3.2 mmrm_0.3.11    

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0    TMB_1.9.10          haven_2.4.3         lattice_0.20-45     gargle_1.2.0       
 [6] colorspace_2.1-0    vctrs_0.5.2         generics_0.1.3      utf8_1.2.2          rlang_1.1.0        
[11] pillar_1.9.0        glue_1.6.1          withr_2.5.2         DBI_1.1.2           dbplyr_2.3.0       
[16] modelr_0.1.11       readxl_1.3.1        lifecycle_1.0.4     munsell_0.5.0       gtable_0.3.4       
[21] cellranger_1.1.0    rvest_1.0.2         tzdb_0.2.0          parallel_4.1.2      fansi_1.0.2        
[26] broom_1.0.5         Rcpp_1.0.10         backports_1.4.1     scales_1.3.0        googlesheets4_1.0.0
[31] checkmate_2.1.0     jsonlite_1.7.3      fs_1.5.2            hms_1.1.3           stringi_1.7.6      
[36] rbibutils_2.2.13    grid_4.1.2          Rdpack_2.6          cli_3.6.0           tools_4.1.2        
[41] magrittr_2.0.2      crayon_1.5.2        pkgconfig_2.0.3     ellipsis_0.3.2      Matrix_1.5-4       
[46] xml2_1.3.3          reprex_2.0.1        googledrive_2.0.0   lubridate_1.8.0     assertthat_0.2.1   
[51] httr_1.4.2          rstudioapi_0.16.0   R6_2.5.1            nlme_3.1-153        compiler_4.1.2  

OS / Environment

danielinteractive commented 5 months ago

Thanks @sstats2024 , can you please include the output you get?

danielinteractive commented 5 months ago

Also please try to install the latest CRAN release and see if the problem persists

sstats2024 commented 5 months ago

Thanks @danielinteractive. I've included the output above. Even after installing the latest version of mmrm, I'm still getting the same output as before.

danielinteractive commented 5 months ago

Hi again @sstats2024 ,

so I ran this on my Macbook, and also get slightly different coefficient estimates each time:

> coef(mmrmfit1)
 eye_visitleft_Week0  eye_visitleft_Week4  eye_visitleft_Week8 eye_visitright_Week0 
          27.2299828           29.2062931           35.7662590           26.6797737 
eye_visitright_Week4 eye_visitright_Week8                  age 
          33.3781034           36.9494047           -0.3037041 
> coef(mmrmfit2)
                 age  eye_visitleft_Week0  eye_visitleft_Week4  eye_visitleft_Week8 
          -0.3037215           27.2307519           29.2070622           35.7670281 
eye_visitright_Week0 eye_visitright_Week4 eye_visitright_Week8 
          26.6805428           33.3788725           36.9501738 
> c1 <- coef(mmrmfit1)
> c2 <- coef(mmrmfit2)
> setequal(names(c1), names(c2))
[1] TRUE
> c1 - c2[names(c1)]
 eye_visitleft_Week0  eye_visitleft_Week4  eye_visitleft_Week8 eye_visitright_Week0 
       -7.691045e-04        -7.691045e-04        -7.691045e-04        -7.691045e-04 
eye_visitright_Week4 eye_visitright_Week8                  age 
       -7.691045e-04        -7.691045e-04         1.736128e-05 
> diff_coefs <- c1 - c2[names(c1)]
> diff_coefs
 eye_visitleft_Week0  eye_visitleft_Week4  eye_visitleft_Week8 eye_visitright_Week0 
       -7.691045e-04        -7.691045e-04        -7.691045e-04        -7.691045e-04 
eye_visitright_Week4 eye_visitright_Week8                  age 
       -7.691045e-04        -7.691045e-04         1.736128e-05 
> rel_diff_coefs <- abs(diff_coefs / c1)
> rel_diff_coefs
 eye_visitleft_Week0  eye_visitleft_Week4  eye_visitleft_Week8 eye_visitright_Week0 
        2.824477e-05         2.633352e-05         2.150363e-05         2.882725e-05 
eye_visitright_Week4 eye_visitright_Week8                  age 
        2.304219e-05         2.081507e-05         5.716510e-05 
> all(rel_diff_coefs < 1e-3)
[1] TRUE

However, this is completely fine, as both the absolute differences as well as the relative differences are very small. Since we use floating point calculations in the computer, the ordering of the design matrix columns, induced here by the ordering of the covariates in the formula, will slightly influence the results. But these differences don't matter in practice.

You will also see a similar behavior with other modeling packages or methodologies.

If you would like to avoid this, you would have to sort the covariates yourself before putting them in the formula e.g.

Does this solve your question?

danielinteractive commented 1 month ago

Closing because answered and no further replies.