therneau / survival

Survival package for R
390 stars 106 forks source link

log-Likelihood contributions for survreg model #32

Closed HeidiSeibold closed 6 years ago

HeidiSeibold commented 6 years ago

I would like to compute the log-Likelihood contribution of given observations. For example, I would like to compute the log-Likelihood contributions of the observations in newdat to model mod.

library("survival")
mod <- survreg(Surv(futime, fustat) ~ rx, ovarian, dist = "weibull")
newdat <- ovarian[3:5, ]
newdat
#>   futime fustat     age resid.ds rx ecog.ps
#> 3    156      1 66.4658        2  1       2
#> 4    421      0 53.3644        2  2       1
#> 5    431      1 50.3397        2  1       1

I would like this to be a function that works for all survreg models (all dist), i.e.

objfun.survreg <- function(x, newdata = NULL, weights = NULL, ...) {
  if(is.null(newdata)) {
    # compute logLik contributions for data used to compute model x
  } else {
    # compute logLik contributions for newdata
  }
}

any hints on how this could be done best? I looked inside the survival package but the maximisation seems to be in C (which is very hard for me to read :upside_down_face:)

I have already started writing objfun.lm and objfun.glm in case you are interested: https://r-forge.r-project.org/scm/viewvc.php/pkg/model4you/R/objfun.R?view=markup&root=partykit

HeidiSeibold commented 6 years ago

I think I figured it out:

library("survival")
x <- survreg(Surv(futime, fustat) ~ rx, ovarian, dist = "weibull")
newdata <- ovarian[3:5, ]
newdata
#>   futime fustat     age resid.ds rx ecog.ps
#> 3    156      1 66.4658        2  1       2
#> 4    421      0 53.3644        2  2       1
#> 5    431      1 50.3397        2  1       1

objfun.survreg <- function(x, newdata = NULL, weights = NULL, ...) {

  ## get outcome
  modformula <- Formula::as.Formula(x$terms)
  yformula <- formula(modformula, lhs = 1, rhs = 0)
  if (is.null(newdata)) {
    ymf <- model.frame(x)
  } else {
    ymf <- model.frame(yformula, data = newdata)
  }
  y <- as.matrix(ymf[[1]])
  if (attr(ymf[[1]], "type") != "right") 
    stop("So far only right censored outcome allowed.")

  ## weights
  if (is.null(weights)) 
    weights <- rep(1, times = NROW(y))

  ## get linear predictor
  if (is.null(newdata)) {
    lp <- predict(x, type = "linear")
  } else {
    lp <- predict(x, newdata = newdata, type = "linear")
  }
  y <- cbind(y, lp = lp, weights = weights)
  scale <- x$scale

  ## get Likelihood contributions
  ## = density for uncensored observations
  ## = survivor (1 - CDF) for censored observations
  get_lik <- function(t) {
    if (t["status"] == 0) {
      t["weights"] * (1 - psurvreg(q = t["time"], mean = t["lp"], scale = scale, 
        distribution = "weibull"))
    } else {
      t["weights"] * dsurvreg(x = t["time"], mean = t["lp"], scale = scale, 
        distribution = "weibull")
    }
  }
  contribs <- apply(y, 1, get_lik)

  ## return log-Likelihood contributions
  return(log(contribs))
}

sum(objfun.survreg(x))
#> [1] -97.36415
x$loglik
#> [1] -97.95390 -97.36415

objfun.survreg(x, newdata = newdata)
#>          3          4          5 
#> -7.0670301 -0.2199615 -7.2257248

Is that correct?

therneau commented 6 years ago

Sorry to be so slow. What you want is already present.

z <- resid(fit, type='matrix') will give a 6 column matrix, the first column is the contribution to the loglik. See the help page for resid. The only small nuisance is that for log(y) models (log-logistic, weibull, etc) it gives the loglik for log(y); if you want the loglik for y you have to multiply by the Jacobian.

The book by Meeker and Escobar discusses many diagnostic plots based on z.

HeidiSeibold commented 6 years ago

Thanks @therneau! :cake: