loft-br / xgboost-survival-embeddings

Improving XGBoost survival analysis with embeddings and debiased estimators
https://loft-br.github.io/xgboost-survival-embeddings/
Apache License 2.0
313 stars 51 forks

Giving individual weights in the xgbsestackedweibull #65

Open yanih opened 1 year ago

yanih commented 1 year ago

Hi everyone,

Problem description

- I have case-cohort data, in which cases and non-cases each need a weight so that the weighted sample matches the disease rate in the natural population.
- Normally, in an AFT model such as lifelines' WeibullAFTFitter, weights can be supplied through a weights column.
- According to https://loft-br.github.io/xgboost-survival-embeddings/modules/stacked_weibull.html, the weibull_params of XGBSEStackedWeibull are the same as the parameters of lifelines' WeibullAFTFitter. However, when I put 'weight_col' into weibull_params using the code below, I get an error. It seems that the weights column of WeibullAFTFitter cannot be used through XGBSE.
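To make the weighting idea concrete, here is a minimal sketch of inverse-sampling-probability weights for a case-cohort sample. It assumes that every case in the full cohort was sampled (so cases get weight 1) and that the number of non-cases in the full cohort is known. The function name `case_cohort_weights` and the example numbers are hypothetical; other case-cohort weighting schemes exist and may fit a given study better.

```python
def case_cohort_weights(is_case, n_noncases_full_cohort):
    """Return one weight per sampled subject.

    Assumptions (not from the original post):
    - all cases of the full cohort are in the sample, so they get weight 1;
    - sampled non-cases stand in for all non-cases of the full cohort,
      so each gets weight = (non-cases in full cohort) / (non-cases sampled).
    """
    n_noncases_sampled = sum(1 for c in is_case if not c)
    if n_noncases_sampled == 0:
        raise ValueError("the sample contains no non-cases")
    noncase_weight = n_noncases_full_cohort / n_noncases_sampled
    return [1.0 if c else noncase_weight for c in is_case]


# Hypothetical sample: 2 cases and 4 sampled non-cases,
# drawn from a full cohort that contains 400 non-cases.
is_case = [True, True, False, False, False, False]
weights = case_cohort_weights(is_case, n_noncases_full_cohort=400)

# Weighted share of cases, which should match the full-cohort disease rate.
weighted_case_rate = sum(w for w, c in zip(weights, is_case) if c) / sum(weights)
```

With these numbers each non-case stands for 100 people, so the weighted sample has 2 cases among 402 subjects, the same proportion as the full cohort.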

Code sample

from xgbse import XGBSEStackedWeibull

# parameters
xgb_params = {
    "objective": "survival:aft",
    "eval_metric": "aft-nloglik",
    "aft_loss_distribution": "normal",
    "aft_loss_distribution_scale": 0.795,
    "tree_method": "hist",
    "learning_rate": 5e-2,
    "max_depth": 8,
    "booster": "dart",
    "subsample": 0.5,
    "min_child_weight": 50,
    "colsample_bynode": 0.5
}

# trying to pass the weights column through weibull_params (this raises the error below)
weibull_params = {"weight_col": "weight"}

# fitting XGBSE model
xgbse_model = XGBSEStackedWeibull(xgb_params=xgb_params, weibull_params=weibull_params)

Error

self.weibull_aft = WeibullAFTFitter(**self.weibull_params)

TypeError: __init__() got an unexpected keyword argument 'weight_col'
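The traceback shows that every key in weibull_params is forwarded to the WeibullAFTFitter constructor. In lifelines, the weights column is an argument of fit (named weights_col), not of the constructor, which would explain the TypeError. The sketch below reproduces the mechanism with a hypothetical stand-in class; `StandInAFTFitter` and `split_params` are illustrative names with simplified signatures, not part of lifelines or xgbse.

```python
import inspect


class StandInAFTFitter:
    """Hypothetical stand-in, NOT the lifelines class; signatures are simplified."""

    def __init__(self, alpha=0.05, penalizer=0.0):
        self.alpha = alpha
        self.penalizer = penalizer

    def fit(self, df, duration_col, event_col=None, weights_col=None):
        # A real fitter would estimate parameters here; the stand-in only
        # records which column was declared as the weights column.
        self.weights_col = weights_col
        return self


def split_params(cls, params):
    """Split params into kwargs accepted by cls.__init__ and the remainder."""
    accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
    init_kwargs = {k: v for k, v in params.items() if k in accepted}
    leftover = {k: v for k, v in params.items() if k not in accepted}
    return init_kwargs, leftover


params = {"penalizer": 0.1, "weight_col": "weight"}

# Forwarding everything to the constructor fails, as in the traceback above.
try:
    StandInAFTFitter(**params)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Constructor arguments and the weights column have to travel separately.
init_kwargs, leftover = split_params(StandInAFTFitter, params)
fitter = StandInAFTFitter(**init_kwargs).fit(
    df=None,
    duration_col="duration",
    event_col="event",
    weights_col=leftover["weight_col"],
)
```

The point of the sketch is only that a weights column name cannot reach fit through a dictionary that is unpacked into the constructor.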

Expected behavior

An XGBoost-AFT model fitted with individual sample weights.

Possible solutions

  1. Should I attach the weights to the data matrix first?
  2. Or can xgboost's 'scale_pos_weight' be used?
yanih commented 1 year ago

Solutions

I checked the '_stacked_weibull.py' file and found that no weights column is passed to the fit call of WeibullAFTFitter:

self.weibull_aft.fit(df=weibull_train_df, duration_col="duration", event_col="event", ancillary=True)

So I added a weight argument in this file. The main changes are shown below (the default is weight=1):

    def __init__(
        self,
        xgb_params=None,
        weibull_params=None,
        weight=1,
    ):
        if xgb_params is None:
            xgb_params = DEFAULT_PARAMS
        if weibull_params is None:
            weibull_params = DEFAULT_PARAMS_WEIBULL

        self.xgb_params = xgb_params
        self.weibull_params = weibull_params
        self.persist_train = False
        self.feature_importances_ = None
        self.weight = weight
.
.
.
        # creating df to use lifelines API
        weibull_train_df = pd.DataFrame(
            {"risk": train_risk, "duration": T_train, "event": E_train, "weight": self.weight}
        )

        # fitting weibull aft, passing the weight column to lifelines
        self.weibull_aft = WeibullAFTFitter(**self.weibull_params)
        self.weibull_aft.fit(
            df=weibull_train_df,
            duration_col="duration",
            event_col="event",
            ancillary=True,
            weights_col="weight",
        )
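One thing to check in a change like this is that the weights line up row by row with the training data. A weight fixed in the constructor ties the model object to one dataset, so a possible alternative is to accept the weights at fit time and validate them before building the data frame. The helper below is a minimal sketch of that validation in plain Python. The name `build_weibull_columns` is hypothetical and not part of xgbse, and the positivity check is an assumption of this sketch. Note also that this change only weights the Weibull stage; whether the XGBoost stage should receive the same weights is a separate question.

```python
def build_weibull_columns(risk, durations, events, weights=None):
    """Return the columns for the Weibull stage as a dict of equal-length lists.

    weights=None means every row gets weight 1.0 (unweighted fit).
    """
    n = len(risk)
    if len(durations) != n or len(events) != n:
        raise ValueError("risk, durations and events must have the same length")

    if weights is None:
        weights = [1.0] * n
    else:
        weights = [float(w) for w in weights]
        if len(weights) != n:
            raise ValueError(f"expected {n} weights, got {len(weights)}")
        # Assumption of this sketch: weights must be strictly positive.
        if any(w <= 0 for w in weights):
            raise ValueError("weights must be strictly positive")

    return {
        "risk": list(risk),
        "duration": list(durations),
        "event": list(events),
        "weight": weights,
    }


# Hypothetical values, only to show the shape of the result.
unweighted = build_weibull_columns([0.1, 0.2], [5, 7], [1, 0])
weighted = build_weibull_columns([0.1, 0.2], [5, 7], [1, 0], weights=[1, 100])
```

The returned dict can be handed to pd.DataFrame in place of the literal dict used above, with the "weight" key then named in weights_col.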

Ask for help

I am new to Python, so I would appreciate it if anyone could check whether these changes are correct.