jules-germany opened 1 year ago
I stumbled into the same issue. Is there any workaround for this, other than tuning hyperparameters with a vanilla XGBoost model and simply transferring those to the XGBSE one?
We have developed a workaround:

```python
from xgbse import XGBSEStackedWeibull

class sklearn_wei(XGBSEStackedWeibull):
    def get_params2(self):
        # Pull the nested booster params out of the sklearn params dict
        return self.get_params()["xgb_params"]

    def set_params(self, **params):
        # Merge flat search-space params into the nested xgb_params dict,
        # so GridSearchCV/BayesSearchCV-style set_params calls are routed
        # to the underlying XGBoost configuration.
        old_params = self.get_params2()
        old_params.update(params)
        self.xgb_params = old_params
        return self

ok = sklearn_wei()
```
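To see what the `set_params` override is doing, here is the same routing trick in isolation with plain scikit-learn. `NestedParamModel` is a hypothetical stand-in (not part of xgbse) that stores all of its booster settings in one nested dict, the way XGBSE's `xgb_params` does:

```python
from sklearn.base import BaseEstimator

class NestedParamModel(BaseEstimator):
    """Hypothetical stand-in: keeps booster settings in one nested dict."""

    def __init__(self, xgb_params=None):
        self.xgb_params = xgb_params or {}

    def set_params(self, **params):
        # Merge flat keyword params into the nested dict, so search tools
        # that call set_params(max_depth=...) still reach the booster config.
        merged = dict(self.xgb_params)
        merged.update(params)
        self.xgb_params = merged
        return self

m = NestedParamModel(xgb_params={"max_depth": 3})
m.set_params(learning_rate=0.1, max_depth=5)
print(m.xgb_params)  # {'max_depth': 5, 'learning_rate': 0.1}
```

Without the override, scikit-learn's default `set_params` would reject `max_depth` because it is not a top-level `__init__` parameter; merging into the nested dict is what makes the search loop work.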
After you define the sklearn-compatible model, things work as normal:
```python
import pandas as pd

from skopt import BayesSearchCV, space
from sklearn.metrics import make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgbse.metrics import (
    concordance_index,
    approx_brier_score,
    dist_calibration_score,
)

opt = BayesSearchCV(
    estimator=ok,
    scoring=make_scorer(concordance_index),
    n_iter=2,
    random_state=42,
    cv=StratifiedKFold(n_splits=4, shuffle=True),
    n_jobs=-1,
    n_points=1,
    verbose=1,
    search_spaces={
        'max_depth': space.Integer(low=1, high=32, prior='uniform'),  # I don't know why, but more than 32 breaks
        'learning_rate': space.Real(10**-3, 10**-1, 'log-uniform'),
        # 'tree_method': space.Categorical(categories=['hist', 'exact']),
        'aft_loss_distribution': space.Categorical(categories=['normal', 'extreme', 'logistic']),
        # 'aft_loss_distribution_scale': space.Real(low=10**-2, high=3 * 10**1, prior='log-uniform'),
        # 'n_estimators': space.Real(low=10**1, high=10**3, prior='log-uniform'),
        'subsample': space.Real(low=0.8, high=1, prior='uniform'),
        'colsample_bytree': space.Real(low=0.8, high=1, prior='uniform'),
        'reg_alpha': space.Real(low=0, high=1, prior='uniform'),
        'reg_lambda': space.Real(low=0, high=1, prior='uniform'),
        'max_bin': space.Integer(low=200, high=1000, prior='log-uniform'),
        'min_split_loss': space.Real(low=0, high=10, prior='uniform'),
    },
    fit_params={},
)

opt.fit(
    X=X_train,
    y=y_train,
    time_bins=range(1, 59),
    validation_data=(X_valid, y_valid),
    early_stopping_rounds=5,
)

df_results = (
    pd.DataFrame(opt.cv_results_)
    .sort_values(by='rank_test_score', ascending=True)
    .reset_index(drop=True)
)
df_results['params'] = df_results['params'].astype('string')
df_results
```
Problem description
Using GridSearchCV is not possible because XGBSE requires hyperparameters to be passed as a single dict at model initialization. Furthermore, parameter values in that dict must be scalars (without `[]`), while GridSearchCV expects each parameter as a list of candidate values in `[]`. XGBSE therefore appears incompatible with GridSearchCV. XGBSE also seems incompatible with sklearn's Pipeline: when a pipeline is fitted, the final XGBSE step receives X as a np.array, which has no index. This raises an error, because XGBSE's fitting seems to require `X.index`.
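One possible workaround for the Pipeline issue, assuming the estimator only needs a DataFrame index back: insert a `FunctionTransformer` that rebuilds a DataFrame after the array-producing steps. `IndexNeedyEstimator` below is a hypothetical stand-in for an XGBSE model, and the column names are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

FEATURES = ["f0", "f1"]  # assumed column names

def to_frame(X):
    # np.array -> DataFrame with a fresh RangeIndex and known columns
    return pd.DataFrame(X, columns=FEATURES)

class IndexNeedyEstimator(BaseEstimator):
    """Stand-in that fails unless X has an .index, like XGBSE's fit."""
    def fit(self, X, y=None):
        self.index_seen_ = list(X.index)  # AttributeError on a bare ndarray
        return self
    def predict(self, X):
        return np.zeros(len(X))

pipe = Pipeline([
    ("scale", StandardScaler()),               # outputs a bare ndarray
    ("frame", FunctionTransformer(to_frame)),  # restore a DataFrame
    ("model", IndexNeedyEstimator()),
])

X = pd.DataFrame(np.random.rand(5, 2), columns=FEATURES)
pipe.fit(X)
print(pipe.named_steps["model"].index_seen_)  # [0, 1, 2, 3, 4]
```

On scikit-learn >= 1.2, `pipe.set_output(transform="pandas")` can make the transformer steps emit DataFrames without a custom step, which may make the extra `FunctionTransformer` unnecessary.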
Expected behavior
XGBSE should be usable with GridSearchCV and with sklearn pipelines.
Possible solutions
Hyperparameters would need to be definable individually (so search tools can set them), and fitting would need to accept X without an index (a plain np.array).
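As a sketch of what that would look like, a hypothetical estimator (`FlatParamSurvivalModel`, not part of XGBSE) that exposes hyperparameters as plain `__init__` keywords and accepts a bare np.array plugs into GridSearchCV directly:

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV

class FlatParamSurvivalModel(BaseEstimator):
    """Hypothetical estimator with flat __init__ params, not a nested dict."""

    def __init__(self, max_depth=3, learning_rate=0.1):
        self.max_depth = max_depth
        self.learning_rate = learning_rate

    def fit(self, X, y):
        X = np.asarray(X)  # accepts a plain ndarray: no .index required
        self.fitted_ = True
        return self

    def score(self, X, y):
        # Toy score that prefers deeper trees, just so the search picks one.
        return float(self.max_depth)

X, y = np.random.rand(20, 3), np.random.rand(20)
grid = GridSearchCV(
    FlatParamSurvivalModel(),
    param_grid={"max_depth": [2, 4], "learning_rate": [0.05, 0.1]},
    cv=2,
)
grid.fit(X, y)
print(grid.best_params_["max_depth"])  # 4
```

Because every hyperparameter is a top-level `__init__` keyword, the default `get_params`/`set_params` from `BaseEstimator` already satisfy GridSearchCV's contract; no override is needed.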