Open coltsfan opened 6 years ago
This feature has been requested before and I do plan to add in the next release (probably in February). In the meantime, please have a look at https://github.com/sebp/scikit-survival/issues/15#issuecomment-344757368 for alternative ways.
Hi,
Are there any updates on implementing these functions for CoxnetSurvivalAnalysis?
Thanks.
Unfortunately no, I couldn't find the time to implement this yet. Any contributions would appreciated.
Hi, I read through the comments about workarounds for this: specifically "using the predicted risk scores as a feature in a Cox model" to get predicted survival functions.
My question: is it appropriate to use the risk scores as the only feature in a Cox model?
I have tried to implement this and the results look sensible at first glance. Would be grateful if someone could take a look at my code below, and let me know if there is anything fundamentally wrong with this approach. Many thanks.
X = data[feature_cols]
y = data[['censor','time']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)
estimator = GradientBoostingSurvivalAnalysis(subsample=0.95)
estimator.fit(X_train, y_train)
# predicted risk scores - reshaping for next step
y_hat_train = estimator.predict(X_train).reshape(-1, 1)
y_hat_test = estimator.predict(X_test).reshape(-1, 1)
# fit on risk scores from first step ONLY
cox_ph = CoxPHSurvivalAnalysis()
cox_ph.fit(y_hat_train, y_train)
surv_func_train = cox_ph.predict_survival_function(y_hat_train)
surv_func_test = cox_ph.predict_survival_function(y_hat_test)
Looks perfectly fine to me. As mentioned in https://github.com/sebp/scikit-survival/issues/15#issuecomment-344757368, the only downside is that this way is still subject to the proportional hazards assumption.
Just curious-- the implementation of survival function and cumulative hazard function for generic survival models that you were hoping to get to: is that "using the predicted risk scores as a feature in a Cox model" or Van Belle et al.'s method as mentioned in #15, or yet some different methods? Thanks.
I was planning on using the approach proposed by Van Belle et al. Unfortunately, due to other obligations, my time is limited at the moment.
At least for GradientBoostingSurvivalAnalysis w/ coxph loss, you could do something similar to what's in lifelines:
def generate_baselines(self, X, y):
X, event, time = check_arrays_survival(X, y, accept_sparse=['csr', 'csc', 'coo'], dtype=DTYPE)
ind_hazards = pd.DataFrame({
"P": np.exp(self.predict(X)),
"E": event
"T": time
})
ind_hazards_summed_over_durations = ind_hazards.groupby("T")[["P", "E"]].sum()
ind_hazards_summed_over_durations["P"] = (
ind_hazards_summed_over_durations["P"].loc[::-1].cumsum()
)
baseline_hazard = ind_hazards_summed_over_durations["E"] / ind_hazards_summed_over_durations["P"]
baseline_cumulative_hazard = baseline_hazard.cumsum()
baseline_survival = np.exp(-baseline_cumulative_hazard)
return baseline_hazard, baseline_cumulative_hazard, baseline_survival
From which survival = baseline_survival.values ** hazard_ratio
It seems the ensemble methods don't have these methods implemented. How do we generate survival function and cumulative hazard functions for the ensemble based survival methods?