sambrilleman / simsurv

Simulate Survival Data
GNU General Public License v3.0
22 stars 8 forks source link

Including more than one causal covariate #12

Closed sahirbhatnagar closed 5 years ago

sahirbhatnagar commented 5 years ago

When I try including more than one causal covariate, all simulated patients have status=1. I've tried several combinations of different betas for bmi and sex. Do you know what the issue could be?

pacman::p_load(simsurv) # to simulate data
pacman::p_load(truncnorm) # to simulate truncated normal BMI
set.seed(20190730)
samplesize <- 1000
covs <- data.frame(id = 1:samplesize,
                   trt = rbinom(samplesize, 1, 0.5),
                   bmi = truncnorm::rtruncnorm(samplesize, a = 15, b = 30, mean = 23, sd = 2),
                   sex = rbinom(samplesize, 1, 0.6))
simdat <- simsurv(dist = "weibull",
                  lambdas = 0.1,
                  gammas = 1.5,
                  betas = c(trt = -0.5, bmi = 0.1, sex = -0.1),
                  x = covs,
                  tde = c(trt = 0.15),
                  tdefunction = "log", 
                  maxt = 5,
                  seed = 20190730)
table(simdat$status)
#> 
#>    1 
#> 1000

Created on 2019-07-30 by the reprex package (v0.2.1)

Session info ``` r devtools::session_info() #> ─ Session info ────────────────────────────────────────────────────────── #> setting value #> version R version 3.6.1 (2019-07-05) #> os Pop!_OS 18.10 #> system x86_64, linux-gnu #> ui X11 #> language en_CA:en #> collate en_CA.UTF-8 #> ctype en_CA.UTF-8 #> tz America/Toronto #> date 2019-07-30 #> #> ─ Packages ────────────────────────────────────────────────────────────── #> package * version date lib source #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.1) #> backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0) #> callr 3.3.1 2019-07-18 [1] CRAN (R 3.6.1) #> cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.1) #> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.1) #> desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.1) #> devtools 2.0.1 2018-10-26 [1] CRAN (R 3.5.1) #> digest 0.6.20 2019-07-04 [1] CRAN (R 3.6.0) #> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0) #> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0) #> glue 1.3.1 2019-03-12 [1] CRAN (R 3.5.1) #> highr 0.8 2019-03-20 [1] CRAN (R 3.5.1) #> htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.1) #> knitr 1.23 2019-05-18 [1] CRAN (R 3.6.0) #> magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.1) #> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.1) #> pacman 0.5.0 2018-10-22 [1] CRAN (R 3.5.1) #> pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.6.0) #> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.1) #> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.1) #> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.1) #> ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.1) #> R6 2.4.0 2019-02-14 [1] CRAN (R 3.5.1) #> Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.1) #> remotes 2.0.2 2018-10-30 [1] CRAN (R 3.5.1) #> rlang 0.4.0 2019-06-25 [1] CRAN (R 3.6.0) #> rmarkdown 1.14 2019-07-12 [1] CRAN (R 3.6.0) #> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.1) #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.1) #> simsurv * 0.2.3 2019-02-01 [1] CRAN (R 3.6.1) #> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.5.1) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.5.1) #> testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.1) #> truncnorm * 1.0-8 2018-02-27 [1] CRAN (R 3.5.1) #> usethis 1.5.0.9000 2019-06-24 [1] Github (r-lib/usethis@d4eabb9) #> withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.1) #> xfun 0.8 2019-06-25 [1] CRAN (R 3.6.0) #> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.1) #> #> [1] /home/sahir/R/x86_64-pc-linux-gnu-library/3.5 #> [2] /home/sahir/R/x86_64-pc-linux-gnu-library/3.6 #> [3] /usr/local/lib/R/site-library #> [4] /usr/lib/R/site-library #> [5] /usr/lib/R/library ```
sambrilleman commented 5 years ago

They all have event times < 5 (the administrative censoring time you've specified in the maxt argument).

You can see the simulated event times by plotting them:

hist(simdat$eventtime)

The reason they all have "short" event times is because the hazard is "large". This perhaps isn't because of the covariates, but rather because of the baseline hazard. For example, a smaller Weibull lambda would mean a smaller baseline hazard rate and hence better survival...

pacman::p_load(simsurv) # to simulate data
pacman::p_load(truncnorm) # to simulate truncated normal BMI
set.seed(20190730)
samplesize <- 100
covs <- data.frame(id = 1:samplesize,
                   trt = rbinom(samplesize, 1, 0.5),
                   bmi = truncnorm::rtruncnorm(samplesize, a = 15, b = 30, mean = 23, sd = 2),
                   sex = rbinom(samplesize, 1, 0.6))
simdat <- simsurv(dist = "weibull",
                  lambdas = 0.01,
                  gammas = 1.5,
                  betas = c(trt = -0.5, bmi = 0.1, sex = -0.1),
                  x = covs,
                  tde = c(trt = 0.15),
                  tdefunction = "log", 
                  maxt = 5,
                  seed = 20190730)
table(simdat$status)
#>
#> 0  1 
#> 34 66

The main reason you need to reduce lambda for your example is because BMI is ~23 and it has a true beta of 0.1. You need to adjust lambda to allow for the fact that BMI is adding so much to the model. Alternatively, you could centre BMI so that it has a mean of zero.

Finding ideal "true" parameters to simulate under is sometimes difficult and very context dependent! Often people will fit a model to a real dataset and then use the estimates from the fitted model as a basis for the "true" parameters to simulate under...