Scikit-learn best practice states that there should be no input validation or modification of parameters in the constructor. Violation of this policy has previously led to bug #10971.
Modifying a parameter in the constructor can lead to undefined behavior when cloning the estimator. Performing input validation in the constructor leads to an issue where sklearn.model_selection.GridSearchCV and other similar estimators may pass invalid options to the class, since they set parameter values using the set_params method instead of the constructor. The recommendation is to validate inputs at the first place they are used in the estimator.
Describe your proposed implementation
My suggestion is to move all modification of parameters to the fit method and the creation of a separate _check_params method similar to _check_Xy called from fit. This solves the above problems since set_params is called after the constructor and before fit.
Describe possible alternatives
In some of the estimators input validation is sufficiently complex to warrant the creation of a separate method, whilst in others it is reasonable to add it directly to the fit method. Another alternative would be to perform all the logic currently in the constructor in a separate method, including parameter modification as needed.
Additional context
I'm currently working on a PR implementing this suggestion.
Describe the new feature or enhancement
Scikit-learn best practice states that there should be no input validation or modification of parameters in the constructor. Violation of this policy has previously led to bug #10971.
Modifying a parameter in the constructor can lead to undefined behavior when cloning the estimator. Performing input validation in the constructor leads to an issue where
sklearn.model_selection.GridSearchCV
and other similar estimators may pass invalid options to the class, since they set parameter values using theset_params
method instead of the constructor. The recommendation is to validate inputs at the first place they are used in the estimator.Describe your proposed implementation
My suggestion is to move all modification of parameters to the
fit
method and the creation of a separate_check_params
method similar to_check_Xy
called fromfit
. This solves the above problems sinceset_params
is called after the constructor and beforefit
.Describe possible alternatives
In some of the estimators input validation is sufficiently complex to warrant the creation of a separate method, whilst in others it is reasonable to add it directly to the
fit
method. Another alternative would be to perform all the logic currently in the constructor in a separate method, including parameter modification as needed.Additional context
I'm currently working on a PR implementing this suggestion.