goranbrostrom / eha

eha
6 stars 1 forks source link

weibreg() not working for stratified survival data #5

Closed jintaili closed 4 years ago

jintaili commented 4 years ago

Hi! I am trying to estimate a proportional hazard model with time dependent coefficients with Weibull baseline hazard function. I was very happy to find eha because of the availability of the function weibreg(). However, I have not been able to correctly estimate the model. Specifically, I carried out the following three attempts:

  1. Firstly, to make sure the model runs on unstratified data, I estimated the following proportional hazard model with time-invariant coefficients. The model was successfully estimated with reasonable results.
    pw = weibreg(
    formula = Surv(tstart, failures, event) ~ 
    (. - g - subs_ind ), 
    data = x_train, 
    na.action = getOption("na.action"),
    init = rep(0.01, (ncol(x_train) - 5)),
    shape = 0,
    control = list(eps = 1e-4, maxiter = 50, silent = T),
    singular.ok = TRUE,
    x = FALSE,
    y = TRUE,
    center = TRUE
    )
  2. Secondly, I tried to estimate stratified model with the following specifications.
    pw = weibreg(
    formula = Surv(tstart, failures, event) ~ 
    (. - g - subs_ind ):strata(g), 
    data = x_test, 
    na.action = getOption("na.action"),
    init = rep(0.01, 4 * (ncol(x_train) - 5)),
    shape = 0,
    control = list(eps = 1e-4, maxiter = 50, silent = T),
    singular.ok = TRUE,
    x = FALSE,
    y = TRUE,
    center = TRUE
    )

    This model was not able to finish estimation with the following error message:

    Error in X %*% fit$coefficients[1:ncov] :
    requires numeric/complex matrix/vector arguments

    I trace()d the weibreg function to print out fit. It seemed like convergence failed:

         Length Class  Mode
    fail     1      -none- numeric
    n.strata 1      -none- numeric
    value    1      -none- numeric
  3. I again tried the previous model, except with init = rep(0, 4 * (ncol(x_train) - 5)). This model did not finish either, with warning X matrix deemed to be singular.

I did implement the same calls in phreg(), and received error codes 3 and 4 depending on the number of strata I specified. In addition, the documentation did not elaborate on how to specify control argument. Thanks!

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] graphics  grDevices datasets  stats     utils     methods   base

other attached packages:
 [1] eha_2.8.0                       caret_6.0-84                    lattice_0.20-38                 keras_2.2.5.0
 [5] survAUC_1.0-5                   flexsurv_1.1.1                  survminer_0.4.6                 ggpubr_0.2.4
 [9] survival_3.1-8                  plyr_1.8.4                      plotly_4.9.1                    ggplot2_3.2.1
[13] bigrquery_1.2.0                 jsonlite_1.6                    dtplyr_0.0.3                    rsutils_6.07.0009810249
[17] rsugeneral_6.07.0005051101      rsuxls_6.07.000495052           rsuworkspace_6.07.000989855     rsuvydia_6.07.000514856
[21] rsushiny_6.07.00010110057       rsuscrubbers_1.09.003545398     rsuprophesize_1.09.003989854    rsuplotting_6.07.000515098
[25] rsunotify_6.07.000575751        rsujesus_6.07.0005551101        rsuaws_6.07.000101101101        rsudb_6.07.000485697
[29] rsucurl_1.09.003535750          rsuconsoleutils_6.07.0001019957 rsubitly_1.09.0035650101        rsuaspath_6.07.000564951
[33] rcreds_0.7.7                    collectArgs_0.5.0               magrittr_1.5                    dplyr_0.8.3
[37] data.table_1.12.2               colorout_1.2-2

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1     ggsignif_0.6.0       class_7.3-15         base64enc_0.1-3      rstan_2.19.2         remotes_2.1.0
  [7] ggrepel_0.8.1        bit64_0.9-7          prodlim_2019.10.13   mvtnorm_1.0-11       lubridate_1.7.4      codetools_0.2-16
 [13] splines_3.6.1        knitr_1.25           zeallot_0.1.0        broom_0.5.2          km.ci_0.5-2          tfruns_1.4
 [19] shiny_1.4.0          compiler_3.6.1       httr_1.4.1           backports_1.1.5      assertthat_0.2.1     Matrix_1.2-17
 [25] fastmap_1.0.1        lazyeval_0.2.2       cli_1.1.0            later_1.0.0          formatR_1.7          htmltools_0.4.0
 [31] prettyunits_1.0.2    tools_3.6.1          gtable_0.3.0         glue_1.3.1           reshape2_1.4.3       Rcpp_1.0.2
 [37] prophet_0.5          vctrs_0.2.0          nlme_3.1-140         iterators_1.0.12     timeDate_3043.102    gower_0.2.1
 [43] xfun_0.10            stringr_1.4.0        ps_1.3.0             mime_0.7             lifecycle_0.1.0      muhaz_1.2.6.1
 [49] MASS_7.3-51.4        zoo_1.8-6            scales_1.0.0         microbenchmark_1.4-7 ipred_0.9-9          promises_1.1.0
 [55] parallel_3.6.1       inline_0.3.15        RColorBrewer_1.1-2   reticulate_1.13      gridExtra_2.3        KMsurv_0.1-5
 [61] loo_2.1.0            StanHeaders_2.19.0   rpart_4.1-15         stringi_1.4.3        tensorflow_2.0.0     foreach_1.4.7
 [67] pkgbuild_1.0.6       lava_1.6.6           rlang_0.4.0          pkgconfig_2.0.3      matrixStats_0.55.0   bitops_1.0-6
 [73] purrr_0.3.3          labeling_0.3         recipes_0.1.7        htmlwidgets_1.5.1    bit_1.1-14           processx_3.4.1
 [79] tidyselect_0.2.5     deSolve_1.27.1       R6_2.4.0             generics_0.0.2       DBI_1.0.0            whisker_0.4
 [85] pillar_1.4.2         withr_2.1.2          nnet_7.3-12          RCurl_1.95-4.12      tibble_2.1.3         mstate_0.2.12
 [91] crayon_1.3.4         survMisc_0.5.5       grid_3.6.1           callr_3.3.2          ModelMetrics_1.2.2   digest_0.6.21
 [97] xtable_1.8-4         tidyr_1.0.0          httpuv_1.5.2         stats4_3.6.1         munsell_0.5.0        viridisLite_0.3.0
[103] quadprog_1.5-7
goranbrostrom commented 4 years ago

Dear jintaili,

The sad truth is that no function in eha can deal with interactions between a stratum variable and covariates. Otherwise, weibreg and friends do work for stratified data, contrary to what you suggest.

jintaili commented 4 years ago

Dear Prof Broström,

Thanks for clarifying!

I wanted to introduce a little about my objective. In my data, I observe significant non-proportionality / martingale residuals with respect to many covariates, but overall a very Weibull-like survival function. Therefore I am trying to build a (locally) proportional hazard model with parametric baseline hazard function and time-dependent coefficients.

The reason for choosing a parametric over the Cox model is for ease of prediction. The reasons for choosing time-dependent constant coefficients over other proportionality relaxations is for ease of interpretation.

Currently, I think I have found a work-around within the eha and survival framework. Specifically, I have split my data into several episodes, noted by covariate g, and written the formula in the format of the following:

cutoffs = c(4, 21, 60)

x_train = survSplit(Surv(t_end, event) ~ ., 
    data, cut = cutoffs,
    episode = 'g') %>% data.table()

srv = Surv(tstart, t_end, event)

formula = as.formula('srv ~ I(x1 * (g == 1)) + I(x1 * (g == 2)) + I(x2 * (g == 1)) + I(x2 * (g == 2))')

Then this formula is estimated through the following:

pw = phreg(
    formula = formula, 
    data = x_train,
    na.action = getOption("na.action"),
    dist = "weibull",
    cuts = NULL,
    init = rep(0.01, max(x_train$g) * (ncol(x_train) - 5)),
    shape = 0,
    param = c("canonical", "rate"),
    control = list(eps = 1e-04, maxiter = 20, trace = FALSE),
    singular.ok = TRUE,
    model = FALSE,
    x = FALSE,
    y = TRUE,
    center = TRUE
)

It did seem that I could estimate this model with low numbers of cutoffs. However, I am not certain if what I am doing is helping me to achieve what I aimed for, nor if this is the state-of-the-art or common practice for that matter. Would appreciate your input on this. In addition, is it possible to predict using phreg models? Thanks!

Jintai

goranbrostrom commented 4 years ago

I guess that your last effort addresses the non-proportionality aspect, but it is not the same as the stratum-covariate interaction in your first message.

Of course you can do predictions, but there are no ready-made prediction function for phreg in eha. You have to write it yourself.

Göran

jintaili commented 4 years ago

OK thanks!