iterative / mlem

🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day🤞
https://mlem.ai
Apache License 2.0

Requirements aren't found when using `FunctionTransformer` #666

Open aguschin opened 1 year ago

aguschin commented 1 year ago

When using sklearn's Pipeline with a FunctionTransformer step, some requirements used in that step aren't found.

I have an example from a customer, so now I need to reproduce and investigate. Off the top of my head, a few possible solutions:

  1. Allow passing a (list of) libs to mlem.api.save to be added as requirements
  2. Enable a "deep inspection mode" with an option for mlem.api.save
  3. Fix both cases, keeping the same mechanics in place

aguschin commented 1 year ago

OK, here are instructions for making this work so that these requirements are found. First, a script that reproduces the problem:

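# run.py
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

from func import f  # custom function defined in func.py
from mlem.api import save

# Wrap the custom function in a single-step pipeline.
pipe = make_pipeline(FunctionTransformer(f))

# Save the pipeline; the requirements end up in pipeline.mlem.
save(pipe, "pipeline", sample_data=0)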

It imports f from func:

# func.py
def f(x):
    return x

Now, if you create both files and run python run.py, you'll see that pipeline.mlem lists no dependencies other than sklearn:

# pipeline.mlem
...
requirements:
- module: sklearn
  package_name: scikit-learn
  version: 1.1.1

There is now a way to find the missing ones, using the DEEP_INSPECTION setting in the MLEM config.

For example, if you run MLEM_DEEP_INSPECTION=true python run.py, they're found:

# pipeline.mlem
...
requirements:
- module: numpy
  version: 1.23.5
- module: sklearn
  package_name: scikit-learn
  version: 1.1.1
- is_package: false
  module: func
  name: func
  source64zip: eJxLSU1TSNOo0LTiUgCCotSS0qI8hQouAE2bBoU=
  type: custom
- module: scipy
  version: 1.9.3
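
The source64zip value for the custom func module appears to be the module's source, zlib-compressed and then base64-encoded. That is an assumption based on the field name and the `eJx` prefix typical of zlib streams, not on MLEM's documentation. A minimal sketch to check what was captured, using only the standard library (the helper name decode_source64zip is hypothetical, not part of MLEM):

```python
import base64
import zlib

# Value copied from the pipeline.mlem output above.
SOURCE64ZIP = "eJxLSU1TSNOo0LTiUgCCotSS0qI8hQouAE2bBoU="


def decode_source64zip(value: str) -> str:
    """Decode a base64 string holding zlib-compressed UTF-8 text.

    Assumption: this mirrors how the field is encoded; it is not MLEM's own API.
    """
    return zlib.decompress(base64.b64decode(value)).decode("utf-8")


decoded = decode_source64zip(SOURCE64ZIP)
print(decoded)
```

For this value the decoded text matches the content of func.py, which suggests that deep inspection stores the custom module's source in the saved metadata.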

You can also set this in the MLEM project config, so that MLEM picks it up automatically and you don't need to pass it via a shell variable as in the example above:

$ mlem init
$ mlem config set core.DEEP_INSPECTION true

This generates a .mlem.yaml config file containing this setting.
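
To enable the setting for a single run from Python rather than from the shell, the variable can be passed to a child process. This is only a sketch: the helper names are hypothetical, and it assumes that run.py is in the current directory and that MLEM reads MLEM_DEEP_INSPECTION from the process environment, as in the shell example.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional


def deep_inspection_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with MLEM_DEEP_INSPECTION enabled."""
    env = dict(os.environ if base is None else base)
    env["MLEM_DEEP_INSPECTION"] = "true"
    return env


def run_with_deep_inspection(script: str = "run.py") -> int:
    """Run a script in a child process with deep inspection enabled."""
    result = subprocess.run([sys.executable, script], env=deep_inspection_env())
    return result.returncode


# Usage (equivalent to: MLEM_DEEP_INSPECTION=true python run.py):
# run_with_deep_inspection("run.py")
```

Copying the environment instead of mutating os.environ keeps the setting scoped to the child process.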