scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License

Make it possible to specify `monotonic_cst` with feature names in all tree-based estimators #28850

Closed alxhslm closed 5 months ago

alxhslm commented 5 months ago

Describe the workflow you want to enable

Instead of passing an array of monotonicity constraints (-1 for a decreasing constraint, +1 for an increasing constraint, or 0 for no constraint) indexed by feature position in the training set, it would be more convenient to pass a dict that specifies constraints only for the relevant features, keyed by feature name. For instance:

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)

reg = RandomForestRegressor(
    monotonic_cst={"bmi": +1, "s3": -1}
)
reg.fit(X, y)

Note that here X has column names because it is a pd.DataFrame.

Note that this is already supported for HistGradientBoostingRegressor. Ideally it would be supported across all tree-based models for consistency.

Describe your proposed solution

Use the _check_monotonic_cst function to validate the monotonic_cst argument in all estimators.
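For illustration, the dict-to-array translation this implies can be sketched in plain Python. This is a rough sketch under assumptions: `monotonic_cst_to_array` is a hypothetical helper written for this example, not scikit-learn's private `_check_monotonic_cst`, and the column list is hard-coded to the diabetes dataset's feature names rather than read from `X.columns`.

```python
from typing import List, Mapping, Sequence


def monotonic_cst_to_array(
    feature_names: Sequence[str], monotonic_cst: Mapping[str, int]
) -> List[int]:
    """Hypothetical helper: expand {feature name: constraint} into a positional list."""
    # Reject names that do not match any column, so typos fail loudly.
    unknown = sorted(set(monotonic_cst) - set(feature_names))
    if unknown:
        raise ValueError(f"Unknown feature names in monotonic_cst: {unknown}")
    # Only -1 (decreasing), 0 (no constraint) and +1 (increasing) are valid.
    invalid = {k: v for k, v in monotonic_cst.items() if v not in (-1, 0, 1)}
    if invalid:
        raise ValueError(f"Constraints must be -1, 0 or 1, got: {invalid}")
    # Features that are not mentioned get 0, i.e. no constraint.
    return [monotonic_cst.get(name, 0) for name in feature_names]


# Feature names of the diabetes dataset (in practice: list(X.columns)).
columns = ["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"]
cst = monotonic_cst_to_array(columns, {"bmi": +1, "s3": -1})
print(cst)  # [0, 0, 1, 0, 0, 0, -1, 0, 0, 0]
```

The resulting list could then be passed as `monotonic_cst=cst` to an estimator that only accepts the positional form.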

Describe alternatives you've considered, if relevant

This has already been implemented for HistGradientBoostingRegressor in #24855.

Additional context

See #24855 for the implementation of this for HistGradientBoostingRegressor.

adrinjalali commented 5 months ago

I don't think we'd be adding this for all tree-based models, but we'll most probably be moving HistGradientBoosting under the general GradientBoosting (https://github.com/scikit-learn/scikit-learn/issues/27873).

cc @adam2392 @lorentzenchr @glemaitre