Open a-wozniakowski opened 4 years ago
Pipeline
validating in __init__
is not consistent with what is recommended in our developer guide.
There should be no logic, not even input validation, and the parameters should not be changed. The corresponding logic should be put where the parameters are used, typically in fit.
StackingRegressor
was developed following this guideline.
As for this issue with stacking and get_params
, maybe we can raise a better error message.
@thomasjpfan thanks for the feedback. With this constraint, set_params
faces a similar issue.
For the error messages, how about overriding get_params
and set_params
from _BaseHeterogeneousEnsemble
in _BaseStacking
? This could be done with try and except, where both cases raise a TypeError
. Should I start a pull request?
I agree that it could be friendlier to users if we validated and failed earlier, but that is indeed not our current convention.
In some ways, adopting enthought's traits (which validate when set, IIRC) would be friendlier for our users and perhaps give us type annotations automatically, if we could infer parameter types from trait types, @thomasjpfan :)
In stacking, the
estimators
parameter is a list of(str, estimator)
, although this behavior is not enforced until thefit
method:With this example, one may want to run
reg.get_params()
, which will result inTypeError: cannot convert dictionary update sequence element #0 to a sequence
. This seems complicated for a user to debug, so I propose moving a variation of https://github.com/scikit-learn/scikit-learn/blob/973ec69d50410ccee0382ea9c203802b099436f1/sklearn/ensemble/_stacking.py#L138-L139into the constructor, similar to
https://github.com/scikit-learn/scikit-learn/blob/fd237278e895b42abe8d8d09105cbb82dc2cbba7/sklearn/pipeline.py#L830
in the Pipeline constructor.