strengejacke / ggeffects

Estimated Marginal Means and Marginal Effects from Regression Models for ggplot2
https://strengejacke.github.io/ggeffects
Other
543 stars 35 forks source link

Predictions for survival analysis based on median time #557

Open strengejacke opened 1 month ago

strengejacke commented 1 month ago

Hi, I have a similar issue - and I don't think it can be solved on the x side. In my case, I have a parametric survival model (estimated using 'survreg' from the survival package), and what I want is the marginal predicted median survival time, not the marginal predicted mean survival time. In Stata, the margins command allows this fairly easily, with different functions for the predictions. From the Stata manual:

margins, predict(median time) gives predictions based on: $median = \{t:\hat{S}_{j}(t) = 1/2\}$ margins, predict(mean time) gives predictions based on: $mean = \int_{0}^{\infty} \hat{S}_{j}(t) dt$

I can also get median predictions using the 'predict.survreg' function: predict(fit,type="quantile",p=0.5). But that won't easily give me marginal (counterfactual) SEs.

And it doesn't look like I can pass the 'p' argument through when using ggeffects. I can pass the type="quantile" argument in ggaverage, but it estimates the 10th and 90th percentiles (the default). When I try to pass p=0.5 through ggeffects, it gives the error:

Error in sanity_equivalence_p_adjust(equivalence, p_adjust) : Assertion on 'p_adjust' failed: Must be element of set {'holm','hochberg','hommel','bonferroni','BH','BY','fdr','none'}, but types do not match (numeric != character).

Is there any way to pass that 'p=0.5' through?

Originally posted by @philipclare in https://github.com/strengejacke/ggeffects/issues/553#issuecomment-2259508117

strengejacke commented 1 month ago

Do you have a reproducible example? Not quite sure what you would like to achieve. Predicted values when time is split at the median?

philipclare commented 1 month ago

I can demonstrate using the 'lung' dataset, included in the survival package.

library(survival)
head(lung)
  inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
1    3  306      2  74   1       1       90       100     1175      NA
2    3  455      2  68   1       0       90        90     1225      15
3    3 1010      1  56   1       0       90        90       NA      15
4    5  210      2  57   1       1       90        60     1150      11
5    1  883      2  60   1       0      100        90       NA       0
6   12 1022      1  74   1       1       50        80      513       0

If I load that data into Stata, run a fairly simple model model, and then ask for marginal predictions of median survival time over sex, I get:

. streg phecog age i.sex, distr(weibull) time nolog

        Failure _d: status
  Analysis time _t: time

Weibull AFT regression

No. of subjects =    227                                Number of obs =    227
No. of failures =    164
Time at risk    = 69,522
                                                        LR chi2(3)    =  29.98
Log likelihood = -262.47304                             Prob > chi2   = 0.0000

------------------------------------------------------------------------------
          _t | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      phecog |  -.3396381   .0834784    -4.07   0.000    -.5032528   -.1760234
         age |  -.0074754   .0067635    -1.11   0.269    -.0207317    .0057808
       2.sex |   .4010905   .1237326     3.24   0.001     .1585792    .6436019
       _cons |   6.674526   .4274013    15.62   0.000     5.836835    7.512217
-------------+----------------------------------------------------------------
       /ln_p |   .3131927   .0613465     5.11   0.000     .1929559    .4334296
-------------+----------------------------------------------------------------
           p |   1.367785   .0839088                      1.212829    1.542539
         1/p |    .731109   .0448509                      .6482819    .8245183
------------------------------------------------------------------------------

. 
. margins i.sex, predict(median time)

Predictive margins                                         Number of obs = 227
Model VCE: OIM

Expression: Predicted median _t, predict(median time)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         sex |
          1  |   284.5261   21.71691    13.10   0.000     241.9618    327.0905
          2  |   424.9263   44.34047     9.58   0.000     338.0206     511.832
------------------------------------------------------------------------------

If I run the same model in R using survreg, and then ask for margins from predict_response, I get:

> fit <- survreg(Surv(time, status) ~ phecog + age + sex, lung,dist="weibull")
> summary(fit)

Call:
survreg(formula = Surv(time, status) ~ phecog + age + sex, data = lung,dist = "weibull")
               Value Std. Error     z       p
(Intercept)  6.27344    0.45358 13.83 < 2e-16
phecog      -0.33964    0.08348 -4.07 4.7e-05
age         -0.00748    0.00676 -1.11  0.2690
sex          0.40109    0.12373  3.24  0.0012
Log(scale)  -0.31319    0.06135 -5.11 3.3e-07

Scale= 0.731 

Weibull distribution
Loglik(model)= -1132.4   Loglik(intercept only)= -1147.4
    Chisq= 29.98 on 3 degrees of freedom, p= 1.4e-06 
Number of Newton-Raphson Iterations: 5 
n=227 (1 observation deleted due to missingness)

> predict_response(fit,
+                  terms="sex",
+                  margin="counterfactual")
# Average predicted values of Surv(time, status)

sex | Predicted |         95% CI
--------------------------------
  1 |    371.96 | 309.17, 434.75
  2 |    555.50 | 503.87, 607.13

Which gives exactly the same model fit/parameters, but different marginal predictions, because the underlying prediction is not the median survival time. I can get the same predictions in R, using base predict with different 'newdata':

> newdat_m <- data.frame(cbind(time=lung$time,status=lung$status,phecog=lung$phecog,age=lung$age,sex=1))
> mean(predict(fit,newdata=newdat_m,type="quantile",p=0.5),na.rm=TRUE)
[1] 284.5261
> newdat_f <- data.frame(cbind(time=lung$time,status=lung$status,phecog=lung$phecog,age=lung$age,sex=2))
> mean(predict(fit,newdata=newdat_f,type="quantile",p=0.5),na.rm=TRUE)
[1] 424.9263

I can also get quantile marginal predictions from predict_response - but only the 'default' quantiles (10% and 90%):

> predict_response(fit,
+                  terms="sex",
+                  margin="counterfactual",
+                  type="quantile")
# Average predicted values of Surv(time, status)

Surv(time, status): 1

sex | Predicted |          95% CI
---------------------------------
  1 |     71.77 |  64.65,   78.90
  2 |    107.19 | 101.98,  112.40

Surv(time, status): 2

sex | Predicted |          95% CI
---------------------------------
  1 |    684.41 | 576.70,  792.11
  2 |   1022.13 | 898.18, 1146.08

> colMeans(predict(fit,newdata=newdat_m,type="quantile"),na.rm=TRUE)
[1]  71.7740 684.4083
> colMeans(predict(fit,newdata=newdat_f,type="quantile"),na.rm=TRUE)
[1]  107.1911 1022.1313

So it seems like all I need is to be able to pass that 'p=0.5' option that I would use in 'predict.survreg' through 'predict_response'. But when I try that, it just ignores it.