ing-bank / probatus

Validation (like Recursive Feature Elimination for SHAP) of (multiclass) classifiers & regressors and data used to develop them.
https://ing-bank.github.io/probatus
MIT License

eval_metric in EarlyStoppingShapRFECV not used for LGBMClassifier #259

Closed PaulZhutovsky closed 4 months ago

PaulZhutovsky commented 6 months ago

Describe the bug

It seems that the eval_metric you can specify when initializing the EarlyStoppingShapRFECV class is not used for an LGBMClassifier (I did not test the other tree-based models).

Environment (please complete the following information):

To Reproduce

import lightgbm
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV

from probatus.feature_elimination import ShapRFECV

feature_names = [
    "f1",
    "f2_missing",
    "f3_static",
    "f4",
    "f5",
    "f6",
    "f7",
    "f8",
    "f9",
    "f10",
    "f11",
    "f12",
    "f13",
    "f14",
    "f15",
    "f16",
    "f17",
    "f18",
    "f19",
    "f20",
]

# Prepare a synthetic binary classification dataset
X, y = make_classification(
    n_samples=1000,
    class_sep=0.05,
    n_informative=6,
    n_features=20,
    random_state=0,
    n_redundant=10,
    n_clusters_per_class=1,
)
X = pd.DataFrame(X, columns=feature_names)

# Seed the RNG so the injected missing values are reproducible
np.random.seed(42)
X["f2_missing"] = X["f2_missing"].apply(lambda x: x if np.random.rand() < 0.8 else np.nan)
X["f3_static"] = 0

from probatus.feature_elimination import EarlyStoppingShapRFECV

model = lightgbm.LGBMClassifier(n_estimators=200, max_depth=3)

# Run feature elimination
shap_elimination = EarlyStoppingShapRFECV(
    model=model,
    step=0.2,
    cv=10,
    scoring="roc_auc",
    eval_metric="auc",
    early_stopping_rounds=5,
    n_jobs=3,
    verbose=2,
)
report = shap_elimination.fit_compute(X, y)

(This is just the example data set and call to EarlyStoppingShapRFECV from here, with verbose=2 added and the correct model actually passed in (model=model).)

Error traceback

There is no error, but with verbose=2 the following warning appears:

[LightGBM] [Warning] Unknown parameter: eval_metric

Expected behavior

I would expect the eval_metric parameter to be used when evaluating the validation folds for early stopping.

Potential solution

I think the issue comes from the fact that, for LGBMClassifier, eval_metric is being set here, but this does not work because the class does not have this parameter. What should be done (I assume) is this:

  1. Remove this part from the code.
  2. Add another line with eval_metric=self.eval_metric to this, since LGBMClassifier's fit method does accept an eval_metric parameter.
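
A minimal sketch of the routing suggested in step 2, not the actual probatus patch. To keep it runnable without LightGBM installed, it uses a hypothetical stand-in class (StandInLGBMClassifier) that mirrors only the relevant shape of LGBMClassifier: eval_metric is accepted by fit(), not by the constructor. The helper build_fit_kwargs is likewise an assumption made for illustration.

```python
import inspect


class StandInLGBMClassifier:
    """Hypothetical stand-in for LGBMClassifier (assumption, not the real class).

    It mirrors one relevant detail: eval_metric is an argument of fit(),
    not a constructor parameter.
    """

    def __init__(self, n_estimators=100, max_depth=-1):
        self.n_estimators = n_estimators
        self.max_depth = max_depth

    def fit(self, X, y, eval_set=None, eval_metric=None):
        # Record what fit() received so the routing can be inspected.
        self.fit_eval_set_ = eval_set
        self.fit_eval_metric_ = eval_metric
        return self


def build_fit_kwargs(model, eval_set, eval_metric):
    """Hypothetical helper: pass eval_metric through fit() when fit() accepts it."""
    fit_params = inspect.signature(model.fit).parameters
    kwargs = {"eval_set": eval_set}
    if "eval_metric" in fit_params:
        kwargs["eval_metric"] = eval_metric
    return kwargs


model = StandInLGBMClassifier(n_estimators=200, max_depth=3)

# Tiny placeholder data; only the argument routing matters here.
X_train, y_train = [[0.0], [1.0]], [0, 1]
X_val, y_val = [[0.5]], [1]

fit_kwargs = build_fit_kwargs(model, eval_set=[(X_val, y_val)], eval_metric="auc")
model.fit(X_train, y_train, **fit_kwargs)

print(model.fit_eval_metric_)  # prints: auc
```

With this routing the metric reaches fit() as a keyword argument instead of being set on the estimator, which is where the "Unknown parameter" warning comes from.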

ReinierKoops commented 4 months ago

Fixed, thanks @PaulZhutovsky