sebp / scikit-survival

Survival analysis built on top of scikit-learn
GNU General Public License v3.0
1.12k stars 214 forks source link

Generic estimator of survival and cumulative hazard functions #22

Open coltsfan opened 6 years ago

coltsfan commented 6 years ago

It seems the ensemble methods don't have these methods implemented. How do we generate survival function and cumulative hazard functions for the ensemble based survival methods?

sebp commented 6 years ago

This feature has been requested before and I do plan to add in the next release (probably in February). In the meantime, please have a look at https://github.com/sebp/scikit-survival/issues/15#issuecomment-344757368 for alternative ways.

mzhao94 commented 6 years ago

Hi,

Are there any updates on implementing these functions for CoxnetSurvivalAnalysis?

Thanks.

sebp commented 6 years ago

Unfortunately no, I couldn't find the time to implement this yet. Any contributions would appreciated.

tob789 commented 6 years ago

Hi, I read through the comments about workarounds for this: specifically "using the predicted risk scores as a feature in a Cox model" to get predicted survival functions.

My question: is it appropriate to use the risk scores as the only feature in a Cox model?

I have tried to implement this and the results look sensible at first glance. Would be grateful if someone could take a look at my code below, and let me know if there is anything fundamentally wrong with this approach. Many thanks.

X = data[feature_cols]
y = data[['censor','time']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)

estimator = GradientBoostingSurvivalAnalysis(subsample=0.95)
estimator.fit(X_train, y_train)

# predicted risk scores - reshaping for next step
y_hat_train = estimator.predict(X_train).reshape(-1, 1)
y_hat_test = estimator.predict(X_test).reshape(-1, 1)

# fit on risk scores from first step ONLY
cox_ph = CoxPHSurvivalAnalysis()
cox_ph.fit(y_hat_train, y_train)

surv_func_train = cox_ph.predict_survival_function(y_hat_train)
surv_func_test = cox_ph.predict_survival_function(y_hat_test)
sebp commented 6 years ago

Looks perfectly fine to me. As mentioned in https://github.com/sebp/scikit-survival/issues/15#issuecomment-344757368, the only downside is that this way is still subject to the proportional hazards assumption.

leihuang commented 5 years ago

Just curious-- the implementation of survival function and cumulative hazard function for generic survival models that you were hoping to get to: is that "using the predicted risk scores as a feature in a Cox model" or Van Belle et al.'s method as mentioned in #15, or yet some different methods? Thanks.

sebp commented 5 years ago

I was planning on using the approach proposed by Van Belle et al. Unfortunately, due to other obligations, my time is limited at the moment.

rfdearborn commented 5 years ago

At least for GradientBoostingSurvivalAnalysis w/ coxph loss, you could do something similar to what's in lifelines:

def generate_baselines(self, X, y):
    X, event, time = check_arrays_survival(X, y, accept_sparse=['csr', 'csc', 'coo'], dtype=DTYPE)

    ind_hazards = pd.DataFrame({
        "P": np.exp(self.predict(X)),
        "E": event
        "T": time
    })

    ind_hazards_summed_over_durations = ind_hazards.groupby("T")[["P", "E"]].sum()
    ind_hazards_summed_over_durations["P"] = (
        ind_hazards_summed_over_durations["P"].loc[::-1].cumsum()
    )

    baseline_hazard = ind_hazards_summed_over_durations["E"] / ind_hazards_summed_over_durations["P"]

    baseline_cumulative_hazard = baseline_hazard.cumsum()

    baseline_survival = np.exp(-baseline_cumulative_hazard)

    return baseline_hazard, baseline_cumulative_hazard, baseline_survival

From which survival = baseline_survival.values ** hazard_ratio