Describe the bug
It seems that the eval_metric you can specify when initializing the EarlyStoppingShapRFECV class is not used when the model is an LGBMClassifier (I did not test the other tree-based models).
Environment (please complete the following information):
probatus version: 3.1.0
python version: 3.11.8
OS: linux
To Reproduce
import lightgbm
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from probatus.feature_elimination import ShapRFECV
feature_names = [
    "f1",
    "f2_missing",
    "f3_static",
    "f4",
    "f5",
    "f6",
    "f7",
    "f8",
    "f9",
    "f10",
    "f11",
    "f12",
    "f13",
    "f14",
    "f15",
    "f16",
    "f17",
    "f18",
    "f19",
    "f20",
]
# Prepare a synthetic binary classification dataset
X, y = make_classification(
    n_samples=1000,
    class_sep=0.05,
    n_informative=6,
    n_features=20,
    random_state=0,
    n_redundant=10,
    n_clusters_per_class=1,
)
X = pd.DataFrame(X, columns=feature_names)
# Seed NumPy so the injected missing values are reproducible
np.random.seed(42)
# Replace roughly 20% of f2_missing with NaN
X["f2_missing"] = X["f2_missing"].apply(lambda x: x if np.random.rand() < 0.8 else np.nan)
# Make f3_static a constant column
X["f3_static"] = 0
from probatus.feature_elimination import EarlyStoppingShapRFECV
model = lightgbm.LGBMClassifier(n_estimators=200, max_depth=3)
# Run feature elimination
shap_elimination = EarlyStoppingShapRFECV(
    model=model,
    step=0.2,
    cv=10,
    scoring="roc_auc",
    eval_metric="auc",
    early_stopping_rounds=5,
    n_jobs=3,
    verbose=2,
)
report = shap_elimination.fit_compute(X, y)
(This is the exact example dataset and call to EarlyStoppingShapRFECV from here, with only verbose=2 added and the correct model actually passed in (model=model).)
Error traceback
There is no error, but with verbose=2 you can see in the training output that the eval_metric is not being used.
Expected behavior
I would expect the eval_metric parameter to be used during the evaluations for early stopping.
Potential solution
I think the issue comes from the fact that, for LGBMClassifier, the eval_metric is being set here, but this does not work because the class does not have this parameter. What should be done (I assume) is to add eval_metric=self.eval_metric to this, as LGBMClassifier's fit method does have an eval_metric parameter.
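To illustrate the difference between setting the metric on the estimator and passing it to fit, here is a minimal sketch that runs without LightGBM installed. StandInClassifier and fit_accepts are hypothetical names made up for this example. The stand-in only models the behavior described above (a setter that accepts an unknown keyword silently, and a fit method with a real eval_metric argument), so this is an assumption about the mechanism, not probatus or LightGBM code.

```python
import inspect


class StandInClassifier:
    """Hypothetical stand-in: unknown keywords are stored but never read,
    while fit() has a real eval_metric argument."""

    def __init__(self, n_estimators=100, **kwargs):
        self.n_estimators = n_estimators
        self.extra_params = dict(kwargs)  # unknown keywords end up here
        self.used_eval_metric = None

    def set_params(self, **params):
        for name, value in params.items():
            if name == "n_estimators":
                self.n_estimators = value
            else:
                # Accepted without an error, but fit() never looks at it.
                self.extra_params[name] = value
        return self

    def fit(self, X, y, eval_set=None, eval_metric=None):
        # Only the fit-time argument decides which metric is evaluated.
        self.used_eval_metric = eval_metric or "default_metric"
        return self


def fit_accepts(model, name):
    """Return True if model.fit has an explicit parameter called `name`."""
    return name in inspect.signature(model.fit).parameters


X, y = [[0.0], [1.0]], [0, 1]

# Setting the metric on the estimator: no error, but it has no effect.
via_set_params = StandInClassifier().set_params(eval_metric="auc")
via_set_params.fit(X, y, eval_set=[(X, y)])

# Passing the metric to fit(): it is actually used.
model = StandInClassifier()
fit_kwargs = {"eval_set": [(X, y)]}
if fit_accepts(model, "eval_metric"):
    fit_kwargs["eval_metric"] = "auc"
via_fit = model.fit(X, y, **fit_kwargs)

print(via_set_params.used_eval_metric)  # default_metric
print(via_fit.used_eval_metric)         # auc
```

If the proposed fix is right, the analogous change in probatus would be to forward the metric in the fit call for LGBMClassifier instead of setting it on the model.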