goranbrostrom / eha

eha
6 stars 1 forks source link

Definition of survivor function #3

Closed adibender closed 6 years ago

adibender commented 6 years ago

Dear @goranbrostrom ,

I'm trying to implement prediction error curve for Weibull models on left-truncated data. When using eha package for estimation, this means I need to write a custom function that calculates survival probabilities for new data.

From the description in ?eha::aftreg I figured that below function (surv_prob) should do the the job, however, when comparing to the output of eha:::plot.aftreg I noticed a descrepancy (in case covariates are included in the model). I think one of the differences is that in eha:::plot.aftreg, the scale parameter lambda and score are calculated based on centered input variables, i.e.

lambda <- exp(x$coefficients[ncov + (1:ns) * 2 - 1] - 
            sum((new.data - x$means) * x$coefficients[1:ncov]))
 score <- exp(sum((new.data - x$means) * x$coefficients[1:ncov]))

However, this is in contrast to the description in eha::aftreg and respective vignette. Could you please give a hint how to obtain the above expressions for lambda and score (example below).

library(eha)
#> Loading required package: survival

# custom function for predicting survival probabilities for aftreg models
surv_prob <- function(object, newdata, times, ...) {

  coefs_aft <- coef(object)
  ind_scale <- grep("log(scale)", names(coefs_aft), fixed = TRUE)
  ind_shape <- grep("log(shape)", names(coefs_aft), fixed = TRUE)

  coef_scale <- coefs_aft[ind_scale]
  coef_shape <- coefs_aft[ind_shape]
  beta       <- coefs_aft[c(-ind_scale, -ind_shape)]
  X    <- model.matrix(object, newdata)[, -1, drop = FALSE]

  etas <- as.numeric(X %*% beta)

  vapply(
    times,
    function(time) {
      exp(- (time / exp(coef_scale - etas) ) ** exp(coef_shape))
    },
    numeric(length(etas)))

}

# data
data(mort)
# head(mort)
mod <- aftreg(Surv(time = enter, time2 = exit, event = event)~ses, data = mort)
# summary(mod)

# compare surv function integrated in plot.aftreg with custom functions
plot(mod, fn = "sur", new.data = data.frame(ses = c(0)))
surv <- surv_prob(mod, newdata = mort[2, ], times = seq(0, 20, by = 0.5))
lines(seq(0, 20, by = 0.5), surv, col = 2, lty = 2)

goranbrostrom commented 6 years ago

Dear @adibender,

as you already noticed, the centering business is the culprit. It has given me headaches for years. I have finally come to the conclusion that presenting results for centered covariates is a bad thing (internal centering is a completely different story). Users should center covariates themselves before running the analysis. This is not implemented now.

However, I think 'phreg' works better in this respect than 'aftreg', and since aft and ph are the same for the Weibull distribution, you could try 'phreg' for the time being.

Thanks for the report, Göran

adibender commented 6 years ago

Thanks for the quick reply.

I am still confused, however. Given a concrete specification of covariates, which of the two functions returns the correct survival probability?

goranbrostrom commented 6 years ago

Short answer: None. Long answer: 1. From phreg you get the necessary parameter estimates for the calculation of survival probabilities via equations (1), (2), and (11) in the vignette, using S_0(t) = exp(-t). 2. From aftreg you get the necessary parameter estimates for the calculation of survival probabilities via equation (13) in the vignette, using S_0(t) = exp(-t). The function pweibull(..., lower.tail = FALSE) is useful.

adibender commented 6 years ago

Sorry for being slow, but isn't that what I did? I understand equation (13), 3rd row of the equation, as follows:

mod <- aftreg(Surv(time = enter, time2 = exit, event = event)~ses, data = mort)
# summary(mod)
alpha <- coef(mod)["log(scale)"]
gamma <- coef(mod)["log(shape)"]
z <- 1
beta <- coef(mod)["sesupper"]
S_0 <- function(x) exp(-x)
surv_fun <- function(time, z, beta, alpha, gamma) {
  S_0( (time/exp(alpha - z * beta)) ** exp(gamma) )
}
times <- seq(0, 20, by = 0.5)
plot(times, surv_fun(times, z, beta, alpha, gamma), type = "l", ylim = c(0,1))
# equivalent (surv_prob from first post):
# lines(times, surv_prob(mod, newdata = mort[1,], times = times), col = 2, lty = 2)
goranbrostrom commented 6 years ago

Great, you got it!