Potential bug in simulation with fixed covariates and parameter uncertainty

frbrz commented 3 years ago

Hi,

Thanks for this great package.

I am using RxODE to simulate a population with uncertainty in their population parameters. I am interested in the scenario where baseline covariates are fixed across simulations and I am getting some strange results, as covariates do not appear to remain constant for the same id in different sim.id.

I have attached a minimal reproducible example, based on the simulation article ( https://nlmixrdevelopment.github.io/RxODE/articles/RxODE-sim-var.html#simulation-from-inverse-wishart-correlations-1).

Side note: I think I noticed a typo on the webpage. In the Rx1 model, cl is defined as: cl <- tcl*(1+crcl.cl*(CLCR-65)) * exp(eta.v) should eta.v be replaced by eta.cl?

Back to the problem: I want to simulate 100 individuals, 10 times with the same covariates. I am using the iCov argument in the solve function. I have tried three different approaches, which I believe should give the same results (see code below for more details):

Define a data.frame with id = 1:100 and the baseline covariates. I obtain an error 'nSub'*'nStud' does not match the number of subjects in 'iCov'.
Define a data.frame with id = 1:100 and the baseline covariates and repeat it 10 times. In the solve object obtained the baseline covariate (e.g WT) changes value for each id. I would have expected the covariate to remain constant for each id.
Define a data.frame with id = 1:(100 * 10) assuming id 1, 101, 201 etc have the same covariates. The solve object obtained has very stange baseline covariate values (e.g. e-320). My impression is that all ids > 100 are somehow ignored.

Am I missing something or is this a bug?

Many thanks for the help!

library(RxODE)
#> Warning: package 'RxODE' was built under R version 4.0.5
#> RxODE 1.0.8 using 4 threads (see ?getRxThreads)
#>   no cache: create with `rxCreateCache()`
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.0.3
#> Warning: package 'ggplot2' was built under R version 4.0.3
#> Warning: package 'tidyr' was built under R version 4.0.3
#> Warning: package 'readr' was built under R version 4.0.3
#> Warning: package 'purrr' was built under R version 4.0.3
#> Warning: package 'stringr' was built under R version 4.0.3
#> Warning: package 'forcats' was built under R version 4.0.3

# seed
sd <- 32
set.seed(sd)

### Assume population of 100 ids to be simulated with unc 10 times (as online)
n_id <- 100
n_sim <- 10

### Define a common set of covariates for 100 ids
wt <- rnorm(100, 70, 15)
clcr <- rep(rnorm(100, 65, 25))

### Example following 
# https://nlmixrdevelopment.github.io/RxODE/articles/RxODE-sim-var.html#simulation-from-inverse-wishart-correlations-1

rx1 <- RxODE({
  test <- tcl # test to check simulated value of tcl
  cl <- tcl*(1+crcl.cl*(CLCR-65)) * exp(eta.cl) ## NOTE exp(eta.cl) is exp(eta.v) online
  v <- tv * WT * exp(eta.v)
  ka <- tka * exp(eta.ka)
  ipred <- linCmt()
  obs <- ipred * (1 + prop.sd) + add.sd 
})

theta <- c(
  tcl=2.63E+01, tv=1.35E+00, tka=4.20E+00, tlag=2.08E-01,
  prop.sd=2.05E-01, add.sd=1.06E-02, crcl.cl=7.17E-03,
  eta.cl=7.30E-02,  eta.v=3.80E-02, eta.ka=1.91E+00
)

thetaMat <- lotri(
  tcl + tv + tka + tlag + prop.sd + add.sd + crcl.cl + eta.cl + eta.v + eta.ka ~
    c(7.95E-01,
      2.05E-02, 1.92E-03,
      7.22E-02, -8.30E-03, 6.55E-01,
      -3.45E-03, -6.42E-05, 3.22E-03, 2.47E-04,
      8.71E-04, 2.53E-04, -4.71E-03, -5.79E-05, 5.04E-04,
      6.30E-04, -3.17E-06, -6.52E-04, -1.53E-05, -3.14E-05, 1.34E-05,
      -3.30E-04, 5.46E-06, -3.15E-04, 2.46E-06, 3.15E-06, -1.58E-06, 2.88E-06,
      -1.29E-03, -7.97E-05, 1.68E-03, -2.75E-05, -8.26E-05, 1.13E-05, -1.66E-06, 1.58E-04,
      -1.23E-03, -1.27E-05, -1.33E-03, -1.47E-05, -1.03E-04, 1.02E-05, 1.67E-06, 6.68E-05, 1.56E-04,
      7.69E-02, -7.23E-03, 3.74E-01, 1.79E-03, -2.85E-03, 1.18E-05, -2.54E-04, 1.61E-03, -9.03E-04, 3.12E-01))

### Option 1: only try to define 100 ids, ignoring nStud
evw <- et(amount.units="mg", time.units="hours") %>%
  et(amt=100) %>%
  et(list(c(0, 0.5),
          c(0.5, 1),
          c(1, 3),
          c(3, 6),
          c(6, 12))) %>%
  et(id=1:n_id)

icov_df <- data.frame(
  id = 1:n_id,
  WT = wt,
  CLCR = clcr
) 

sim  <- rxSolve(
  rx1, theta, evw,  nSub=n_id, nStud=n_sim,
  thetaMat=thetaMat,
  ## Match boundaries of problem
  thetaLower=0, 
  sigma=c("prop.sd", "add.sd"), ## Sigmas are standard deviations
  sigmaXform="identity", # default sigma xform="identity"
  omega=c("eta.cl", "eta.v", "eta.ka"), ## etas are variances
  omegaXform="variance", # default omega xform="variance"
  iCov = icov_df,
  keep = "WT",
  dfSub=74,
  dfObs=476,
  seed = sd
)
#> Error: 'nSub'*'nStud' does not match the number of subjects in 'iCov'
### Error: 'nSub'*'nStud' does not match the number of subjects in 'iCov' 
# Would have not expect this

### Option 2: define 100 ids (from 1:100) and repeat for each nStud
evw <- et(amount.units="mg", time.units="hours") %>%
  et(amt=100) %>%
  et(list(c(0, 0.5),
          c(0.5, 1),
          c(1, 3),
          c(3, 6),
          c(6, 12))) %>%
  et(id=1:n_id)

icov_df <- data.frame(
  id = rep(1:n_id, n_sim), 
  WT = rep(wt, n_sim),
  CLCR = rep(clcr, n_sim)
) 

sim  <- rxSolve(
  rx1, theta, evw,  nSub=n_id, nStud=n_sim,
  thetaMat=thetaMat,
  ## Match boundaries of problem
  thetaLower=0, 
  sigma=c("prop.sd", "add.sd"), ## Sigmas are standard deviations
  sigmaXform="identity", # default sigma xform="identity"
  omega=c("eta.cl", "eta.v", "eta.ka"), ## etas are variances
  omegaXform="variance", # default omega xform="variance"
  iCov = icov_df,
  keep = "WT",
  dfSub=74,
  dfObs=476,
  seed = sd
)

as.data.frame(sim) %>% 
  filter(id == 1) %>% 
  count(WT)
#>          WT n
#> 1  53.19633 5
#> 2  54.73235 5
#> 3  62.07877 5
#> 4  65.63664 5
#> 5  70.21961 5
#> 6  71.20245 5
#> 7  75.62491 5
#> 8  77.12880 5
#> 9  82.30638 5
#> 10 90.40214 5
### Error?
# For id == 1 I would have expected to have WT equal to wt[1] (i.e. 70.21961)
# Also: note that ipred and obs have many NA associated with predicted values

### Option 3: define 1000 ids (i.e. n_sim * n_std) 
# so that ids 1, 101, 201, ... have same covariates
evw <- et(amount.units="mg", time.units="hours") %>%
  et(amt=100) %>%
  et(list(c(0, 0.5),
          c(0.5, 1),
          c(1, 3),
          c(3, 6),
          c(6, 12))) %>%
  et(id=1:(n_id * n_sim))

icov_df <- data.frame(
  id = 1:(n_id * n_sim), 
  WT = rep(wt, n_sim),
  CLCR = rep(clcr, n_sim)
) 

sim  <- rxSolve(
  rx1, theta, evw,  nSub=n_id, nStud=n_sim,
  thetaMat=thetaMat,
  ## Match boundaries of problem
  thetaLower=0, 
  sigma=c("prop.sd", "add.sd"), ## Sigmas are standard deviations
  sigmaXform="identity", # default sigma xform="identity"
  omega=c("eta.cl", "eta.v", "eta.ka"), ## etas are variances
  omegaXform="variance", # default omega xform="variance"
  iCov = icov_df,
  keep = "WT",
  dfSub=74,
  dfObs=476,
  seed = sd
)

as.data.frame(sim) %>% 
  filter(id == 1) %>% 
  count(WT)
#>               WT n
#> 1   0.000000e+00 5
#> 2  3.981675e-320 5
#> 3  1.645544e-315 5
#> 4  2.037116e-311 5
#> 5  6.281108e-311 5
#> 6  1.052510e-310 5
#> 7  1.476909e-310 5
#> 8  1.901308e-310 5
#> 9   4.799493e+01 5
#> 10  7.021961e+01 5

as.data.frame(sim) %>% 
  filter(id == 1, sim.id == 2)
#>   sim.id id      time     test        cl             v       ka ipred obs
#> 1      2  1 0.2126361 26.54149 -6100.595 1.290104e-319 5.917134    NA  NA
#> 2      2  1 0.5804724 26.54149 -6100.595 1.290104e-319 5.917134    NA  NA
#> 3      2  1 2.5591191 26.54149 -6100.595 1.290104e-319 5.917134    NA  NA
#> 4      2  1 3.3965854 26.54149 -6100.595 1.290104e-319 5.917134    NA  NA
#> 5      2  1 7.3811711 26.54149 -6100.595 1.290104e-319 5.917134    NA  NA
#>              WT
#> 1 3.981675e-320
#> 2 3.981675e-320
#> 3 3.981675e-320
#> 4 3.981675e-320
#> 5 3.981675e-320
### Error: WT should always be equal to wt[1] (i.e. 70.21961) for id == 1
# but this is not the case. Also loads of NA

sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18363)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices datasets  utils     methods   base     
#> 
#> other attached packages:
#>  [1] forcats_0.5.0   stringr_1.4.0   dplyr_1.0.3     purrr_0.3.4    
#>  [5] readr_1.4.0     tidyr_1.1.2     tibble_3.0.5    ggplot2_3.3.3  
#>  [9] tidyverse_1.3.0 RxODE_1.0.8     reprex_0.3.0   
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.6          lubridate_1.7.9.2   assertthat_0.2.1   
#>  [4] digest_0.6.27       cellranger_1.1.0    R6_2.5.0           
#>  [7] backports_1.2.1     sys_3.4             evaluate_0.14      
#> [10] httr_1.4.2          highr_0.8           pillar_1.4.7       
#> [13] rlang_0.4.10        readxl_1.3.1        data.table_1.13.6  
#> [16] checkmate_2.0.0     rmarkdown_2.6       qs_0.24.1          
#> [19] dparser_1.3.1-4     PreciseSums_0.4     munsell_0.5.0      
#> [22] broom_0.7.3         compiler_4.0.2      modelr_0.1.8       
#> [25] xfun_0.20           pkgconfig_2.0.3     htmltools_0.5.1    
#> [28] tidyselect_1.1.0    fansi_0.4.2         withr_2.4.0        
#> [31] crayon_1.3.4        dbplyr_2.0.0        grid_4.0.2         
#> [34] jsonlite_1.7.2      gtable_0.3.0        lifecycle_0.2.0    
#> [37] DBI_1.1.1           magrittr_2.0.1      scales_1.1.1       
#> [40] RcppParallel_5.1.2  cli_2.2.0           stringi_1.5.3      
#> [43] cachem_1.0.4        renv_0.12.5         fs_1.5.0           
#> [46] xml2_1.3.2          ellipsis_0.3.1      generics_0.1.0     
#> [49] vctrs_0.3.6         stringfish_0.15.1   lotri_0.3.1        
#> [52] RApiSerialize_0.1.0 tools_4.0.2         glue_1.4.2         
#> [55] hms_1.0.0           fastmap_1.1.0       yaml_2.2.1         
#> [58] colorspace_2.0-0    rvest_0.3.6         memoise_2.0.0      
#> [61] knitr_1.30          haven_2.3.1

^{Created on 2021-04-20 by the reprex package (v0.3.0)}

mattfidler commented 3 years ago

Thank you for reporting this; I will look into it further in the next coming days.

If you are interested in maintaining the covariates in the proper order, I suggest merging them to the event data-set. If I get a chance I will provide an example for that as well.

frbrz commented 3 years ago

Hi Matthew,

Thanks for your reply and looking at this so quickly.

Well for sim.id == 1 and id == 1 it is 70.21961

Indeed you are right about this and as you suggest I could merge with the original dataset to get the original ids.

However I think that there may be something wrong with the id /sim.id columns and their relationship with keep. Using the code you have sent me (with WT2) if you restrict the simulation results to sim.id == 1 you get the following WT2:

as.data.frame(sim) %>% filter(sim.id == 1) %>% count(WT2) WT2 n 1 54.58081 50 2 60.63536 50 3 70.21961 50 4 74.27097 50 5 74.66919 50 6 76.10526 50 7 76.74155 50 8 80.28497 50 9 82.59484 50 10 83.09933 50

I am not sure I understand correctly the difference between WT and WT2 as in they have exactly the same values. See below: all(sim$WT == sim$WT2) TRUE

Only 10 different WT2s are generated for sim.id == 1 (and all other sim.ids), given that I specified nSub = 100, I would expect to see 100 different WT2, each with n = 5 (irrespective of the id label). What is interesting is that when I look at the id for sim.id == 1 I have exactly 100 of them (expected behaviour). So I believe that there is something wrong with the way baseline covariates are linked to id.

as.data.frame(sim) %>% filter(sim.id == 1) %>% count(id) id n 1 1 5 2 2 5 3 3 5 4 4 5 5 5 5 6 6 5 7 7 5 8 8 5 9 9 5 10 10 5 11 11 5 12 12 5 13 13 5 14 14 5 15 15 5 16 16 5 17 17 5 18 18 5 19 19 5 20 20 5 21 21 5 22 22 5 23 23 5 24 24 5 25 25 5 26 26 5 27 27 5 28 28 5 29 29 5 30 30 5 31 31 5 32 32 5 33 33 5 34 34 5 35 35 5 36 36 5 37 37 5 38 38 5 39 39 5 40 40 5 41 41 5 42 42 5 43 43 5 44 44 5 45 45 5 46 46 5 47 47 5 48 48 5 49 49 5 50 50 5 51 51 5 52 52 5 53 53 5 54 54 5 55 55 5 56 56 5 57 57 5 58 58 5 59 59 5 60 60 5 61 61 5 62 62 5 63 63 5 64 64 5 65 65 5 66 66 5 67 67 5 68 68 5 69 69 5 70 70 5 71 71 5 72 72 5 73 73 5 74 74 5 75 75 5 76 76 5 77 77 5 78 78 5 79 79 5 80 80 5 81 81 5 82 82 5 83 83 5 84 84 5 85 85 5 86 86 5 87 87 5 88 88 5 89 89 5 90 90 5 91 91 5 92 92 5 93 93 5 94 94 5 95 95 5 96 96 5 97 97 5 98 98 5 99 99 5 100 100 5

Thanks again for your help, best Francesco

On Wed, Apr 21, 2021 at 6:50 AM Matthew Fidler @.***> wrote:

In your comments you said:

Error? For id == 1 I would have expected to have WT equal to wt[1] (i.e. 70.21961)

Well for sim.id == 1 and id == 1 it is 70.21961

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/nlmixrdevelopment/RxODE/issues/408#issuecomment-823773482, or unsubscribe https://github.com/notifications/unsubscribe-auth/ATYNZ6A44KPM3XO5EJGSXTLTJZKQ3ANCNFSM43IO44AA .

mattfidler commented 3 years ago

I changed the iCov as an easy way to merge the individual covariates to the input dataset. It simplifies the code.

While it does something a bit different from the original intent, I think that simulation with uncertainty by resampling the original covariates by resample=TRUE is a bit better than what the original intent of how iCov was made to work.

mattfidler commented 3 years ago

I updated the vignette to show the simulation that you discuss here; Once updated it will show the new iCov behavior here:

https://nlmixrdevelopment.github.io/RxODE/articles/RxODE-sim-var.html#simulate-using-variancestandard-deviation-standard-errors

Let me know if this new behavior makes sense to you.

frbrz commented 3 years ago

I had a look at the updated vignette commit and it does make sense to me. Next week I will upgrade to the development version and can test the patch.

Thanks again for the speedy fix!

nlmixrdevelopment / RxODE

Potential bug in simulation with fixed covariates and parameter uncertainty #408