Example for non-proportional hazards

sahirbhatnagar commented 5 years ago

As @turgeonmaxime pointed out in a conference call July 29, 2019, we need to sell the non-proportionality aspect of casebase. Here I use the simsurv R package to simulate data under a standard Weibull survival model that incorporates a time-dependent treatment effect. Here I compare to the rstpm2 package:

pacman::p_load(simsurv)
pacman::p_load(rstpm2)
pacman::p_load(casebase)
pacman::p_version(casebase)
#> [1] '0.1.1.9000'
pacman::p_load(splines)

covs <- data.frame(id = 1:5000, trt = rbinom(5000, 1, 0.5))
simdat <- simsurv(dist = "weibull", lambdas = 0.1, gammas = 1.5, betas = c(trt = -0.5),
                  x = covs, tde = c(trt = 0.15), tdefunction = "log", maxt = 5)
simdat <- merge(simdat, covs)
head(simdat)
#>   id eventtime status trt
#> 1  1  5.000000      0   0
#> 2  2  5.000000      0   1
#> 3  3  5.000000      0   1
#> 4  4  1.532679      1   1
#> 5  5  5.000000      0   1
#> 6  6  4.027367      1   0

mod_tvc <- rstpm2::stpm2(Surv(eventtime, status) ~ trt, 
                         data = simdat, tvc = list(trt = 1))

mod_cb <- casebase::fitSmoothHazard(status ~ trt + ns(log(eventtime), df = 3) + 
                                      trt:ns(log(eventtime),df=1),
                                    time = "eventtime",
                                    data = simdat, 
                                    ratio = 100)
summary(mod_tvc)
#> Maximum likelihood estimation
#> 
#> Call:
#> rstpm2::stpm2(formula = Surv(eventtime, status) ~ trt, data = simdat, 
#>     tvc = list(trt = 1))
#> 
#> Coefficients:
#>                                 Estimate Std. Error  z value     Pr(z)    
#> (Intercept)                     -7.24642    0.34796 -20.8253 < 2.2e-16 ***
#> trt                             -1.09383    0.24652  -4.4370 9.121e-06 ***
#> nsx(log(eventtime), df = 3)1     5.56583    0.24899  22.3540 < 2.2e-16 ***
#> nsx(log(eventtime), df = 3)2    10.86213    0.67290  16.1422 < 2.2e-16 ***
#> nsx(log(eventtime), df = 3)3     4.66293    0.13560  34.3867 < 2.2e-16 ***
#> trt:nsx(log(eventtime), df = 1)  0.99961    0.31402   3.1833  0.001456 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> -2 log L: 16057.87
summary(mod_cb)
#> 
#> Call:
#> glm(formula = formula, family = binomial, data = sampleData)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -0.1878  -0.1631  -0.1436  -0.1145   3.7885  
#> 
#> Coefficients:
#>                                Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)                     -9.2439     3.7191  -2.486 0.012936 *  
#> trt                             -2.5312     0.7451  -3.397 0.000681 ***
#> ns(log(eventtime), df = 3)1      6.4011     2.6388   2.426 0.015275 *  
#> ns(log(eventtime), df = 3)2     12.7006     7.2612   1.749 0.080275 .  
#> ns(log(eventtime), df = 3)3      4.7000     1.2596   3.731 0.000190 ***
#> trt:ns(log(eventtime), df = 1)   2.9088     0.9817   2.963 0.003046 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 33470  on 301282  degrees of freedom
#> Residual deviance: 32833  on 301277  degrees of freedom
#> AIC: 32845
#> 
#> Number of Fisher Scoring iterations: 9

^{Created on 2019-07-29 by the reprex package (v0.2.1)}

turgeonmaxime commented 5 years ago

Here's another example using the Stanford Transplant data. The first graph clearly shows the non-proportional hazards.

library(survival)
library(casebase)
#> See example usage at http://sahirbhatnagar.com/casebase/
library(splines)
library(tidyverse)

# Fit several models
fit1 <- fitSmoothHazard(fustat ~ transplant,
                        data = jasa, time = "futime")
fit2 <- fitSmoothHazard(fustat ~ transplant + futime,
                        data = jasa, time = "futime")
fit3 <- fitSmoothHazard(fustat ~ transplant + bs(futime),
                        data = jasa, time = "futime")
fit4 <- fitSmoothHazard(fustat ~ transplant*bs(futime),
                        data = jasa, time = "futime")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

# Compute AIC
AIC(fit1)
#> [1] 803.5401
AIC(fit2)
#> [1] 763.7171
AIC(fit3)
#> [1] 751.0198
AIC(fit4)
#> [1] 751.7766

# Compute hazards
hazard_data <- expand.grid(transplant = c(0, 1),
                           futime = seq(0, 1000,
                                        length.out = 100)) %>% 
    mutate(offset = 0)
hazard_data <- hazard_data %>% 
    mutate(hazard = predict(fit4, newdata = hazard_data, 
                            type = "link"),
           hazard = exp(hazard),
           Status = factor(transplant,
                           labels = c("NoTrans", "Trans")))

hazard_data %>% 
    ggplot(aes(futime, hazard, colour = Status)) +
    geom_line() +
    theme_minimal() +
    theme(legend.position = 'top') +
    ylab('Hazard')


# Compute absolute risk curves
newdata <- data.frame(transplant = c(0, 1))
absrisk <- absoluteRisk(fit4, newdata = newdata, 
                        time = seq(0, 1000, length.out = 100))

colnames(absrisk) <- c("Time", "NoTrans", "Trans")
absrisk %>% 
    as.data.frame %>% 
    gather("Status", "Risk", -Time) %>% 
    ggplot(aes(Time, Risk, colour = Status)) +
    geom_line() +
    theme_minimal() +
    theme(legend.position = 'top') +
    expand_limits(y = 1)

^{Created on 2019-11-24 by the reprex package (v0.3.0)}

turgeonmaxime commented 5 years ago

I just read more carefully the section in Kalbfleisch and Prentice where they describe the Stanford transplant data, and I think it would be a great case study for the paper, for two reasons:

Non-proportional hazards, as above.
Exposure is time-dependent, which means we need to do the case-base sampling "by hand".

I'll work on this over the next few days and add it to the paper, unless there are any objections (e.g. you already wrote a case-study based on the data above).

sahirbhatnagar / cbpaper

Example for non-proportional hazards #7