scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License
59.93k stars 25.37k forks source link

Moving a variation of _validate_estimators from fit into the stacking constructor #18066

Open a-wozniakowski opened 4 years ago

a-wozniakowski commented 4 years ago

In stacking, the estimators parameter is a list of (str, estimator), although this behavior is not enforced until the fit method:

from sklearn.datasets import load_boston
from sklearn.linear_model import Ridge
from sklearn.ensemble import StackingRegressor

X, y = load_boston(return_X_y=True)
reg = StackingRegressor([Ridge(), Ridge()])

With this example, one may want to run reg.get_params(), which will result in TypeError: cannot convert dictionary update sequence element #0 to a sequence. This seems complicated for a user to debug, so I propose moving a variation of https://github.com/scikit-learn/scikit-learn/blob/973ec69d50410ccee0382ea9c203802b099436f1/sklearn/ensemble/_stacking.py#L138-L139

into the constructor, similar to

https://github.com/scikit-learn/scikit-learn/blob/fd237278e895b42abe8d8d09105cbb82dc2cbba7/sklearn/pipeline.py#L830

in the Pipeline constructor.

thomasjpfan commented 4 years ago

Pipeline validating in __init__ is not consistent with what is recommended in our developer guide.

There should be no logic, not even input validation, and the parameters should not be changed. The corresponding logic should be put where the parameters are used, typically in fit.

StackingRegressor was developed following this guideline.

As for this issue with stacking and get_params, maybe we can raise a better error message.

a-wozniakowski commented 4 years ago

@thomasjpfan thanks for the feedback. With this constraint, set_params faces a similar issue.

For the error messages, how about overriding get_params and set_params from _BaseHeterogeneousEnsemble in _BaseStacking? This could be done with try and except, where both cases raise a TypeError. Should I start a pull request?

jnothman commented 4 years ago

I agree that it could be friendlier to users if we validated and failed earlier, but that is indeed not our current convention.

In some ways, adopting enthought's traits (which validate when set, IIRC) would be friendlier for our users and perhaps give us type annotations automatically, if we could infer parameter types from trait types, @thomasjpfan :)