loft-br / xgboost-survival-embeddings

Improving XGBoost survival analysis with embeddings and debiased estimators
https://loft-br.github.io/xgboost-survival-embeddings/
Apache License 2.0

Ensembling of predictions #59

Open andwurl opened 1 year ago

andwurl commented 1 year ago

Dear xgbse-team,

What would be the correct way to ensemble predictions? Let's say I have 5 StackedWeibull models and would like to ensemble their predictions on a test dataset. Should I average the interval predictions?

Thank you very much.

davivieirab commented 1 year ago

Hello @andwurl. Are those 5 models trained on the same dataset (or on bootstrap samples from it), i.e. as a bagging estimator? If that is the case, I propose you use our abstraction XGBSEBootstrapEstimator. It is a bagging estimator that receives a base model and the number of models to train; since we already have it implemented, we suggest using it. If you want to know more: the aggregation of predictions on a test dataset is an average of the survival curves predicted by the individual models, with confidence intervals taken from the empirical quantiles across them.

You can find examples of how to use XGBSEBootstrapEstimator in our "how_xgbse_works" notebook.

Code example (read the beginning of the notebook for the necessary import statements and the constants/parameters used below). The example uses XGBSEDebiasedBCE as the base_model, but the same approach works with XGBSEStackedWeibull:

from xgbse import XGBSEBootstrapEstimator, XGBSEDebiasedBCE

# base model as BCE
base_model = XGBSEDebiasedBCE(PARAMS_XGB_AFT, PARAMS_LR)

# bootstrap meta estimator
bootstrap_estimator = XGBSEBootstrapEstimator(base_model, n_estimators=20)

# fitting the meta estimator
bootstrap_estimator.fit(
    X_train,
    y_train,
    validation_data=(X_valid, y_valid),
    early_stopping_rounds=10,
    time_bins=TIME_BINS,
)

# predicting: mean survival curves plus upper/lower confidence
# bounds aggregated across the bootstrap models
mean, upper_ci, lower_ci = bootstrap_estimator.predict(X_test, return_ci=True)
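
If your 5 StackedWeibull models were already trained separately (so you can't simply refit through XGBSEBootstrapEstimator), below is a minimal sketch of doing the same aggregation by hand. It assumes all models were fitted with the same time_bins, and it reuses the notebook's PARAMS_XGB_AFT, TIME_BINS, and train/test splits; the five-model loop and the 95% band width are illustrative choices, not library API:

import numpy as np
from xgbse import XGBSEStackedWeibull

# hypothetical setup: five independently trained models
# on the same data and the same time grid
models = [
    XGBSEStackedWeibull(PARAMS_XGB_AFT).fit(X_train, y_train, time_bins=TIME_BINS)
    for _ in range(5)
]

# each predict() returns a DataFrame of survival probabilities
# (rows: test samples, columns: the shared time bins)
preds = [model.predict(X_test) for model in models]

# ensemble by elementwise averaging of the survival curves
mean_surv = sum(preds) / len(preds)

# optional uncertainty bands from empirical quantiles across the models,
# analogous to what return_ci=True gives you on the bootstrap estimator
stacked = np.stack([p.values for p in preds])    # (n_models, n_samples, n_bins)
upper_ci = np.percentile(stacked, 97.5, axis=0)
lower_ci = np.percentile(stacked, 2.5, axis=0)

Averaging the predicted survival curves themselves (rather than any internal model parameters) works because every model predicts on the common TIME_BINS grid, which is also why the models must share the same time_bins.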