Closed verajosemanuel closed 1 year ago
@tvdboom Can you remind me what our conclusion was on this? With PyCaret 3.0, do we need pycaret installed in the inference environment, or would sklearn alone work? I think our original assumption was that the pickle format is self-contained, so pycaret would not be needed in the target inference environment, but I can't remember whether that was our final conclusion.
Unfortunately, it's not possible to use pycaret's transformation pipeline without installing the library. The reason is that sklearn didn't offer all the transformation steps we wanted for pycaret (nor the pipeline flexibility; think of allowing transformers that drop rows), so we created custom ones. Pickle is not self-contained. You need the library installed to be able to use the unpickled object correctly.
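To illustrate why a pickle file is not self-contained, here is a minimal standard-library sketch. The module name fake_lib and the class Scaler are hypothetical stand-ins for a third-party library such as pycaret. The pickle stores only a reference to the class (module name plus class name) and the instance state, so loading fails once the module can no longer be imported.

```python
import pickle
import sys
import types

# Build a throwaway module that plays the role of a third-party library.
# "fake_lib" and "Scaler" are hypothetical names used only for this demo.
fake_lib = types.ModuleType("fake_lib")
exec(
    "class Scaler:\n"
    "    def __init__(self, factor):\n"
    "        self.factor = factor\n",
    fake_lib.__dict__,
)
sys.modules["fake_lib"] = fake_lib

# Pickling records the module and class name, not the class definition.
payload = pickle.dumps(fake_lib.Scaler(2.0))
references_module = b"fake_lib" in payload

# Simulate an environment where the library is not installed.
del sys.modules["fake_lib"]

try:
    pickle.loads(payload)
    load_error = None
except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
    load_error = exc
```

The same mechanism applies to a pipeline saved from pycaret: any custom class referenced in the file must be importable where the file is loaded.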
If you are sure that you are only using sklearn transformers in the pipeline, you could do the following:
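- Get the pipeline (pipeline attribute).
- Each step in the pipeline is a TransformerWrapper. The attribute transformer of this class contains the underlying estimator.
- Check which library each estimator comes from (__module__ attribute).

This could work, but we are making no assurances.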
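The approach described above could be sketched roughly as follows. This is an untested illustration, not an official pycaret recipe. The helper name extract_sklearn_steps is hypothetical, and the sketch assumes only what the comment states: the pipeline exposes a steps list of (name, step) pairs, and wrapped steps carry the underlying estimator in a transformer attribute.

```python
from typing import Any, List, Tuple


def extract_sklearn_steps(pipeline: Any) -> List[Tuple[str, Any]]:
    """Unwrap each step and keep it only if it is defined in sklearn.

    Hypothetical helper: raises ValueError as soon as a step turns out
    to come from another library (for example a custom pycaret transformer).
    """
    steps = []
    for name, step in pipeline.steps:
        # Wrapped steps hold the real estimator in `transformer`;
        # fall back to the step itself if it is not wrapped.
        estimator = getattr(step, "transformer", step)
        module = type(estimator).__module__
        if module != "sklearn" and not module.startswith("sklearn."):
            raise ValueError(
                f"step {name!r} comes from {module!r}, not sklearn"
            )
        steps.append((name, estimator))
    return steps


# Possible usage (assumes `fitted_pipeline` is the pipeline taken from the
# pycaret experiment, and that sklearn is installed):
#
#   from sklearn.pipeline import Pipeline
#   plain_pipeline = Pipeline(extract_sklearn_steps(fitted_pipeline))
```

One caveat: if the wrapper does more than hold the estimator (for instance, restricting it to a subset of columns), that behaviour is lost when unwrapping, so the output of the rebuilt pipeline should be compared against the original on sample data before relying on it.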
Location of the documentation
https://pycaret.gitbook.io/docs/get-started/functions/deploy#save_model
Documentation problem
I am in a situation where a colleague who only uses sklearn (and is not permitted to install pycaret on the server) needs the preprocessing pipeline used for training the XGBoost model. To share the transformations applied to the data, I have been reading the documentation to find out how to export the pipeline in a form sklearn can use, but to no avail.
I've found that I can save a model using the save_model function, but that file is meant to be loaded later by pycaret. I would like more clarification on exporting steps and objects so they can be consumed outside pycaret when the package is not available for whatever reason.
My ideal process would be to train models using pycaret, choose the best one, and then export the preprocessing steps applied to the input data so that my colleague could take that file and use it in sklearn to transform data, see if it fits their server workflow, and test different modeling approaches just in case.
Regards
Suggested fix for documentation
A better explanation of how to export the preprocessing steps or transformers (even models) for use outside pycaret when the destination environment only has sklearn available.