Open gorj-tessella opened 7 months ago
Alternatively, this functionality could be supported on just feature selection and pipelines. This would require a PreselectedFeatures class that all feature selection classes could convert into.
Did you have a look at sklearn-onnx when the idea is only to have the inference part?
cc @GaelVaroquaux since you were mentioning exactly this the other day.
@adrinjalali @GaelVaroquaux curious what your thoughts were. My intuition is to use onnx or ... completely change the scikit-learn API and have the fitted model be a different class than the fitting algorithm ;)
> @adrinjalali @GaelVaroquaux curious what your thoughts were.
Facilitate restoring predictors from storage, including across versions. Consider for instance linear models for regression. The prediction function is something really simple that is easy to keep stable across time. On the other hand, for the fitting algorithm, it's much harder to promise that options won't change, or that a fitting procedure will give the same result on the same data.
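To make the linear-model point concrete, here is a minimal sketch of a predict-only object that stores nothing but the fitted parameters and round-trips through JSON. `LinearPredictor`, `to_json` and `from_json` are hypothetical names used for illustration; none of this is scikit-learn API.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class LinearPredictor:
    """Hypothetical predict-only object: just the fitted parameters."""

    coef: List[float]
    intercept: float

    def predict(self, X: List[List[float]]) -> List[float]:
        # y = X @ coef + intercept, written out in plain Python
        return [
            sum(c * x for c, x in zip(self.coef, row)) + self.intercept
            for row in X
        ]

    def to_json(self) -> str:
        # Only plain numbers are stored, so the payload does not depend
        # on any fitting options or solver internals
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "LinearPredictor":
        return cls(**json.loads(payload))


predictor = LinearPredictor(coef=[2.0, -1.0], intercept=0.5)
restored = LinearPredictor.from_json(predictor.to_json())
```

Because the stored state is only `coef` and `intercept`, a format like this could plausibly stay stable even if the fitting code changes between versions.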
> My intuition is to use onnx
I would say we should consider it optionally, but one of the factors of success of scikit-learn historically has been that it requires very little that is not installed on every data scientist's computer.
> or ... completely change the scikit-learn API and have the fitted model be a different class than the fitting algorithm ;)
I think that we want to go this way. Another factor of success of scikit-learn is that it exposes a very simple surface to users, with little to learn or understand.
This could also be a separate package (which would let us iterate faster).
We could have a scikit-learn-predictor kind of thing, where we get predictors from our classes. Testing is also not that hard: we test that the output of the predictor is the same as that of the sklearn native class. However, this does seem A LOT like ONNX, with the benefit of being in Python and lightweight.
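The testing idea above (the predictor's output must match the native class) could look roughly like this. Everything here is a toy stand-in: `MeanRegressor`, `ConstantPredictor`, `to_predictor` and `same_predictions` are hypothetical names, not part of scikit-learn or of any existing package.

```python
import math


class MeanRegressor:
    """Toy stand-in for a native estimator: predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class ConstantPredictor:
    """Hypothetical predict-only counterpart holding a single number."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]


def to_predictor(est):
    # Hypothetical conversion step: copy only what predict() needs
    return ConstantPredictor(est.mean_)


def same_predictions(native, predictor, X, tol=1e-12):
    # The check described above: both objects must agree on the same inputs
    a, b = native.predict(X), predictor.predict(X)
    return len(a) == len(b) and all(
        math.isclose(u, v, rel_tol=tol, abs_tol=tol) for u, v in zip(a, b)
    )


X_train, y_train = [[0.0], [1.0], [2.0]], [1.0, 2.0, 6.0]
native = MeanRegressor().fit(X_train, y_train)
ok = same_predictions(native, to_predictor(native), [[5.0], [7.0]])
```

A real test suite would presumably run a check like `same_predictions` over every supported estimator and a range of inputs.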
Having a "predictor only" solution to the problem of "I trained a model in vX and now want to use it on vX+1" would be cool, especially because there is currently no good answer to the problem and it crops up semi-regularly on the issue tracker.
Describe the workflow you want to enable
Allow a trained estimator to be converted into a form suitable only for predict/transform type operations and not fitting. In many cases, the estimator could be made more compact or performant as part of this transformation.
For instance, feature selection steps in a pipeline may rely on complex models during training, but at inference they simply drop unused features. When deploying the model, conversion of the feature selection step to a simpler form could save memory and model load time.
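As a rough illustration of that conversion, the sketch below uses a toy variance-based selector and a lightweight replacement that remembers only the kept column indices. `VarianceSelector` and `to_preselected` are hypothetical stand-ins written in plain Python, and `PreselectedFeatures` here is only one guess at what a class of that name might look like.

```python
class VarianceSelector:
    """Toy stand-in for a feature selector whose fit computes statistics."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X):
        n = len(X)
        self.variances_ = []
        for j in range(len(X[0])):
            col = [row[j] for row in X]
            mean = sum(col) / n
            self.variances_.append(sum((v - mean) ** 2 for v in col) / n)
        return self

    def _kept(self):
        # Columns whose variance exceeds the threshold
        return [j for j, v in enumerate(self.variances_) if v > self.threshold]

    def transform(self, X):
        keep = self._kept()
        return [[row[j] for j in keep] for row in X]


class PreselectedFeatures:
    """Inference-only form: remembers just the kept column indices."""

    def __init__(self, indices):
        self.indices = list(indices)

    def transform(self, X):
        return [[row[j] for j in self.indices] for row in X]


def to_preselected(selector):
    # Hypothetical conversion: drop the fitted statistics, keep the indices
    return PreselectedFeatures(selector._kept())


X = [[1.0, 0.0, 3.0], [2.0, 0.0, 5.0], [3.0, 0.0, 4.0]]
selector = VarianceSelector().fit(X)
light = to_preselected(selector)
```

The converted object carries a short list of integers instead of per-feature statistics (or, for model-based selectors, an entire fitted model), which is where the memory and load-time savings would come from.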
Describe your proposed solution
Add a new method to BaseEstimator, prep_for_inference(self), which returns a model that retains all predict/transform methods but does not necessarily support fitting. By default it would return self. Estimators and transformers could override this as necessary. Pipeline would convert each step.
Describe alternatives you've considered, if relevant
No response
Additional context
No response
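For reference, the proposed solution above could be sketched as follows. This is a minimal illustration with toy classes, assuming the semantics described in the proposal (default returns self, subclasses may override, the pipeline converts each step). `ToyBaseEstimator`, `ToyPipeline`, `ColumnKeeper`, `NonZeroSelector` and `Scale` are hypothetical names and are not scikit-learn classes.

```python
class ToyBaseEstimator:
    """Toy stand-in, not scikit-learn's BaseEstimator."""

    def prep_for_inference(self):
        # Default from the proposal: the object is its own inference form
        return self


class ColumnKeeper(ToyBaseEstimator):
    """Inference-only transformer that keeps fixed column indices."""

    def __init__(self, indices):
        self.indices = list(indices)

    def transform(self, X):
        return [[row[j] for j in self.indices] for row in X]


class NonZeroSelector(ToyBaseEstimator):
    """Toy selector: keeps columns that are not all zero in training data."""

    def fit(self, X):
        self.X_seen_ = X  # deliberately heavy fitted state
        self.keep_ = [
            j for j in range(len(X[0])) if any(row[j] != 0 for row in X)
        ]
        return self

    def transform(self, X):
        return [[row[j] for j in self.keep_] for row in X]

    def prep_for_inference(self):
        # Override: return a lighter object without the training data
        return ColumnKeeper(self.keep_)


class Scale(ToyBaseEstimator):
    """Stateless step that relies on the default prep_for_inference."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, X):
        return [[v * self.factor for v in row] for row in X]


class ToyPipeline(ToyBaseEstimator):
    def __init__(self, steps):
        self.steps = list(steps)

    def transform(self, X):
        for _, step in self.steps:
            X = step.transform(X)
        return X

    def prep_for_inference(self):
        # Convert each step, as the proposal suggests
        return ToyPipeline(
            [(name, step.prep_for_inference()) for name, step in self.steps]
        )


X = [[1.0, 0.0], [2.0, 0.0]]
pipe = ToyPipeline([("select", NonZeroSelector().fit(X)), ("scale", Scale(10.0))])
slim = pipe.prep_for_inference()
```

In this sketch `slim` produces the same output as `pipe`, but its first step no longer holds the training data or a `fit` method, while the `Scale` step is reused unchanged.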