Open aguschin opened 1 year ago
OK, here are instructions on how to make it work and find these requirements. First, let's look at a script that reproduces the problem:
# run.py
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from func import f
from mlem.api import save
pipe = make_pipeline(
    FunctionTransformer(
        f,
    ),
)
save(pipe, "pipeline", sample_data=0)
This imports f from func:
# func.py
def f(x):
    return x
Now, if you create both files and run python run.py, you'll see that pipeline.mlem doesn't list any dependencies other than sklearn:
# pipeline.mlem
...
requirements:
- module: sklearn
  package_name: scikit-learn
  version: 1.1.1
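To quickly check which modules ended up in the metafile, something like the following could work. It's only a sketch: recorded_modules is a hypothetical helper (not part of MLEM), and it scans the text for module: keys with the standard library instead of using a real YAML parser.

```python
import re
from typing import List


def recorded_modules(mlem_text: str) -> List[str]:
    """Return the value of every `module:` key found in the metafile text."""
    pattern = r"^\s*(?:-\s*)?module:\s*(\S+)\s*$"
    return re.findall(pattern, mlem_text, flags=re.MULTILINE)


# The requirements section from pipeline.mlem shown above.
sample = """\
requirements:
- module: sklearn
  package_name: scikit-learn
  version: 1.1.1
"""

modules = recorded_modules(sample)
print(modules)
print("func" in modules)
```

For a real check you'd read the text with open("pipeline.mlem").read() instead of using the inline sample.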
There's now a way to find the missing requirements, using the DEEP_INSPECTION setting in the MLEM config.
For example, if you run MLEM_DEEP_INSPECTION=true python run.py, they're found:
# pipeline.mlem
...
requirements:
- module: numpy
  version: 1.23.5
- module: sklearn
  package_name: scikit-learn
  version: 1.1.1
- is_package: false
  module: func
  name: func
  source64zip: eJxLSU1TSNOo0LTiUgCCotSS0qI8hQouAE2bBoU=
  type: custom
- module: scipy
  version: 1.9.3
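The source64zip value for func looks like base64-encoded, zlib-compressed source (the eJx prefix is what a zlib header typically looks like in base64), so you can inspect what was captured using only the standard library. This is a sketch based on that assumption about the encoding, not on documented MLEM behavior, and decode_source64zip is a hypothetical helper name.

```python
import base64
import zlib


def decode_source64zip(value: str) -> str:
    """Base64-decode the value, zlib-decompress it, and decode it as UTF-8."""
    return zlib.decompress(base64.b64decode(value)).decode("utf-8")


# Value copied from the func entry in pipeline.mlem above.
encoded = "eJxLSU1TSNOo0LTiUgCCotSS0qI8hQouAE2bBoU="

source = decode_source64zip(encoded)
print(source)
```

If the assumption holds, the printed text should be the contents of func.py, which confirms the custom module was packed into the metafile.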
You can also set DEEP_INSPECTION in the MLEM project config, so MLEM picks it up automatically and you don't need to set it via shell variables as in the example above:
$ mlem init
$ mlem config set core.DEEP_INSPECTION true
This will generate a .mlem.yaml config file with this setting.
When using sklearn's Pipeline with a FunctionTransformer step, some requirements that are used in that step aren't found. For example, the function passed to FunctionTransformer uses some library, and MLEM can't find that library.
I have an example from a customer, so now I need to reproduce and investigate. Off the top of my head, 2 solutions:
mlem.api.save
(list of) libs to be added as requirements mlem.api.save