loft-br / xgboost-survival-embeddings

Improving XGBoost survival analysis with embeddings and debiased estimators
https://loft-br.github.io/xgboost-survival-embeddings/
Apache License 2.0

lifelines.exceptions.ConvergenceError #38

Open NudnikShpilkis opened 3 years ago

NudnikShpilkis commented 3 years ago

When fitting an XGBSEStackedWeibull model, I get a lifelines.exceptions.ConvergenceError with the following message:

lifelines.exceptions.ConvergenceError: Fitting did not converge. Try the following:

0. Are there any lifelines warnings outputted during the `fit`?
1. Inspect your DataFrame: does everything look as expected?
2. Try scaling your duration vector down, i.e. `df[duration_col] = df[duration_col]/100`
3. Is there high-collinearity in the dataset? Try using the variance inflation factor (VIF) to find redundant variables.
4. Try using an alternate minimizer: ``fitter._scipy_fit_method = "SLSQP"``.
5. Trying adding a small penalizer (or changing it, if already present). Example: `WeibullAFTFitter(penalizer=0.01).fit(...)`.
6. Are there any extreme outliers? Try modeling them or dropping them to see if it helps convergence.

Given the pipeline nature of XGBSEStackedWeibull, are there recommended steps for getting past the convergence error? I.e., do the lifelines recommendations still hold, or are there other methods I should try?

GabrielGimenez commented 3 years ago

You can follow the lifelines recommendations, except for number 3 ("Is there high-collinearity in the dataset? Try using the variance inflation factor (VIF) to find redundant variables."). It shouldn't make a difference, since the only feature we use to fit the lifelines WeibullAFT model is the hazard predicted by xgboost.

NudnikShpilkis commented 3 years ago

I've tried a few of the fixes listed above. Interestingly, when I use recommendation 2 (scaling the duration vector down), I get vastly accelerated survival curves. Is there a second step I have to run to un-scale the predicted survival curves, or are the curve predictions scale-invariant?

Should the time-bins used by XGBSE also be scaled?
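
For reference, here is a minimal sketch of why a rescaled fit can look "accelerated". It is not taken from XGBSE or lifelines; it uses only a plain Weibull survival function, S(t) = exp(-(t / lam) ** rho), and assumes an unpenalized fit, where dividing every duration by a constant divides the scale parameter lam by the same constant and leaves the shape rho unchanged. The parameter values, the scale factor, and the time bins are all hypothetical. Under those assumptions the curve itself is not scale-invariant: the time points at which it is evaluated have to be on the same scale as the durations used for fitting.

```python
import math


def weibull_survival(t, lam, rho):
    """Weibull survival function S(t) = exp(-(t / lam) ** rho)."""
    return math.exp(-((t / lam) ** rho))


# Hypothetical parameters of a model fitted on the original time scale (e.g. days).
lam, rho = 500.0, 1.5

# Hypothetical factor used to scale the durations down before fitting.
scale = 100.0

# Assumption: for an unpenalized Weibull fit, rescaling the durations only
# rescales lam; the shape rho stays the same.
lam_scaled = lam / scale

# Hypothetical time bins, expressed on the ORIGINAL time scale.
time_bins = [30.0, 90.0, 180.0, 365.0]

# Reference curve: original model evaluated at the original time bins.
original = [weibull_survival(t, lam, rho) for t in time_bins]

# Mismatch: model fitted on scaled durations, evaluated at UNSCALED time bins.
# Each bin is effectively `scale` times further out, so survival collapses.
mismatched = [weibull_survival(t, lam_scaled, rho) for t in time_bins]

# Match: scale the time bins by the same factor, then report the results
# against the original time labels.
matched = [weibull_survival(t / scale, lam_scaled, rho) for t in time_bins]

for t, s_orig, s_bad, s_ok in zip(time_bins, original, mismatched, matched):
    print(f"t={t:6.1f}  original={s_orig:.4f}  mismatched={s_bad:.4g}  matched={s_ok:.4f}")
```

If this reasoning carries over to the stacked model, the durations and the time bins would need to be divided by the same factor, and the resulting curve columns relabelled with the original times afterwards. Whether XGBSE derives its time bins from the scaled durations automatically is not something this sketch checks.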