Open cadama opened 1 year ago
XGBoost has supported categorical features since 1.6, but I am running into an error when using them with shapicant. Here is a minimal example:
import pandas as pd
import numpy as np
from shapicant import PandasSelector
import shap
import xgboost as xgb

num_features = pd.DataFrame(np.random.random((100, 4)), columns=list(range(4)))
categoricals = pd.DataFrame(np.random.randint(1, 10, (100, 3)), dtype="category", columns=list(range(4, 7)))
X_train = pd.concat([num_features, categoricals], axis=1)
X_test = X_train.copy()
y_train = np.random.random((100, ))
y_test = np.random.random((100, ))

params = {
    "colsample_bynode": (len(num_features) + len(categoricals)) ** .5 / (len(num_features) + len(categoricals)),
    "learning_rate": 1,
    "max_depth": 5,
    "num_boost_round": 1,
    "num_parallel_tree": 100,
    "objective": "reg:logistic",
    "subsample": 0.62,
    "enable_categorical": True,
    "tree_method": "hist",
    "booster": "gbtree",
    "eval_metric": ['logloss', 'rmse'],
    'base_score': y_train.mean()
}
model = xgb.XGBRFRegressor(**params, random_state=42)
model.fit(X_train, y_train)

# Use PandasSelector with 30 iterations
explainer_type = shap.TreeExplainer
selector = PandasSelector(model, explainer_type, n_iter=30, random_state=42)
selector.fit(
    X_train,
    y_train,
    X_validation=X_test,
    estimator_params={
        "eval_set": [(X_test, y_test)]
    },
)
Which results in:
[10:01:13] WARNING: /Users/runner/miniforge3/conda-bld/xgboost-split_1667849614592/work/src/learner.cc:767: Parameters: { "num_boost_round" } are not used.
Computing true SHAP values: 0%| | 0/30 [00:00<?, ?it/s]
[10:01:14] WARNING: /Users/runner/miniforge3/conda-bld/xgboost-split_1667849614592/work/src/learner.cc:767: Parameters: { "num_boost_round" } are not used.
[0] validation_0-logloss:0.71347 validation_0-rmse:0.29523
Computing true SHAP values: 0%| | 0/30 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/cdalmaso/opt/anaconda3/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3457, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-36-cb665d5c0d88>", line 36, in <module>
    selector.fit(
  File "/Users/cdalmaso/opt/anaconda3/lib/python3.9/site-packages/shapicant/_pandas_selector.py", line 85, in fit
    true_pos_shap_values, true_neg_shap_values = self._get_shap_values(
  File "/Users/cdalmaso/opt/anaconda3/lib/python3.9/site-packages/shapicant/_pandas_selector.py", line 199, in _get_shap_values
    explainer = self.explainer_type(self.estimator, **explainer_type_params or {})
  File "/Users/cdalmaso/opt/anaconda3/lib/python3.9/site-packages/shap/explainers/_tree.py", line 149, in __init__
    self.model = TreeEnsemble(model, self.data, self.data_missing, model_output)
  File "/Users/cdalmaso/opt/anaconda3/lib/python3.9/site-packages/shap/explainers/_tree.py", line 859, in __init__
    xgb_loader = XGBTreeModelLoader(self.original_model)
  File "/Users/cdalmaso/opt/anaconda3/lib/python3.9/site-packages/shap/explainers/_tree.py", line 1431, in __init__
    self.buf = xgb_model.save_raw()
  File "/Users/cdalmaso/opt/anaconda3/lib/python3.9/site-packages/xgboost/core.py", line 2408, in save_raw
    _check_call(
  File "/Users/cdalmaso/opt/anaconda3/lib/python3.9/site-packages/xgboost/core.py", line 279, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [10:01:14] /Users/runner/miniforge3/conda-bld/xgboost-split_1667849614592/work/src/tree/tree_model.cc:869: Check failed: !HasCategoricalSplit(): Please use JSON/UBJSON for saving models with categorical splits.
Stack trace:
  [bt] (0) 1 libxgboost.dylib 0x000000017fb0ed98 dmlc::LogMessageFatal::~LogMessageFatal() + 124
  [bt] (1) 2 libxgboost.dylib 0x000000017fccca40 xgboost::RegTree::Save(dmlc::Stream*) const + 1184
  [bt] (2) 3 libxgboost.dylib 0x000000017fc102a4 xgboost::gbm::GBTreeModel::Save(dmlc::Stream*) const + 312
  [bt] (3) 4 libxgboost.dylib 0x000000017fc1b390 xgboost::LearnerIO::SaveModel(dmlc::Stream*) const + 1224
  [bt] (4) 5 libxgboost.dylib 0x000000017fb2eb2c XGBoosterSaveModelToBuffer + 788
  [bt] (5) 6 libffi.8.dylib 0x00000001019e804c ffi_call_SYSV + 76
  [bt] (6) 7 libffi.8.dylib 0x00000001019e57d4 ffi_call_int + 1336
  [bt] (7) 8 _ctypes.cpython-39-darwin.so 0x0000000101c8c544 _ctypes_callproc + 1324
  [bt] (8) 9 _ctypes.cpython-39-darwin.so 0x0000000101c86850 PyCFuncPtr_call + 1176
I am running xgboost==1.7.1 and shapicant==0.4.0
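This is caused by a problem in the SHAP package rather than in shapicant: as the traceback shows, SHAP's XGBTreeModelLoader calls save_raw(), and XGBoost refuses to serialize a model with categorical splits in any format other than JSON/UBJSON. See https://github.com/slundberg/shap/issues/2662
It would be better to fix this in SHAP itself; otherwise I would have to develop a workaround in shapicant.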
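Until that is resolved upstream, one possible user-side stopgap is to avoid categorical splits altogether by replacing each category-dtype column with its integer codes before training. The sketch below is only an illustration: `encode_categoricals` is a hypothetical helper, not part of shapicant, SHAP, or XGBoost, and it assumes you can accept the model treating the codes as ordered numbers, which is not equivalent to native categorical handling.

```python
import numpy as np
import pandas as pd


def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every category-dtype column replaced by integer codes.

    Hypothetical helper for illustration only. Missing values become -1,
    which is how pandas encodes NaN in `.cat.codes`.
    """
    out = df.copy()
    for col in out.select_dtypes(include="category").columns:
        out[col] = out[col].cat.codes
    return out


# Small stand-in for the X_train frame from the example above.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "num": rng.random(5),
    "cat": pd.Categorical(["a", "b", "a", "c", "b"]),
})
X_encoded = encode_categoricals(X)
```

With the encoded frame there are no category-dtype columns left, so the model can be trained without `enable_categorical` and should not contain categorical splits. Whether the resulting feature selection is acceptable depends on how meaningful the code ordering is for your data.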