vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com
Other
459 stars 47 forks source link

Difference results marginaleffects vs riskRegression/predict (cox) #1237

Open csthiago opened 4 days ago

csthiago commented 4 days ago

Hi Vincent,

I was testing the marginaleffects with cox regression. I get slightly different results that using predict from survival (point estimate).


library(survival)
library(riskRegression)
library(marginaleffects)
set.seed(100)

dt <- survival::bladder
dt$rx <- as.factor(dt$rx)
fit <- coxph(formula = Surv(stop,event)~ rx+size+number,data=dt,y=TRUE,x=TRUE)

avg_comparisons(fit, variables = "rx",newdata = datagrid(stop=20),
            type="survival")
rx1_mean <- predict(fit, type="survival",
        newdata = dt |> mutate(stop=20,
                               rx="1")) |> 
  mean()
rx2_mean <- predict(fit, type="survival",
        newdata = dt |> mutate(stop=20,
                               rx="2")) |> 
  mean()
rx2_mean-rx1_mean
ateFit1a <- ate(fit, data = dt, treatment = "rx", times = 20)
summary(ateFit1a)

The value I get from avg_comparisons is

image

The point estimate using predict is 0.1034387 and the values from riskRegression ATE is:

image

Do you have any idea why the difference?

Thank you very much

vincentarelbundock commented 3 days ago

datagrid() sets all unnamed variables to their mean or mode. That could be a difference, but I don't know what the ate() function does.

csthiago commented 3 days ago

the ate calculates the average treatment effect using g-formula (or ipw or double robust). I have redone using dt > mutate and the PE is equal (silly me). The CI is different, but the riskRegression uses influence function instead delta. Just to confirm one point. To calculate risk difference for cox regression (in each time), the sytanx would be:

avg_comparisons(fit, variables = "rx",by="time",
            type="survival")

and for risk ratio would be type="lp" ?

Thanks

vincentarelbundock commented 3 days ago

As always, this function computes a difference on the specified scale. Here, type="lp" means a difference between the two groups on the linear probability scale.

You can compute ratios instead of differences on any scale, by specifying type with avg_comparisons(fit, variables = "rx", comparison = "ratio")