nxexox / pymlup

MLup is a framework for running ML in production easily and quickly
https://mlup.org
MIT License
22 stars, 2 forks

[BUG] `binarization_type` not defaulting to "auto" #21

Open neverfox opened 1 month ago

neverfox commented 1 month ago

Describe the bug

Using a config for a LightGBM txt model file without explicitly setting binarization_type to "auto" fails to load the model, because it attempts to use the PickleBinarizer rather than the LightGBMBinarizer.

To Reproduce

Steps to reproduce the behavior:

  1. Run with a config set to use lightgbm-binary_cls_model.txt and do not set any config for binarization_type
  2. See the error mlup.errors.ModelBinarizationError: Error with deserialize model: could not find MARK. The traceback shows that it is using ml/binarization/pickle.py, not ml/binarization/lightgbm.py
  3. Try again with an explicit config of binarization_type: auto and it will work
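
For context on the error text in step 2, here is a minimal sketch of how the standard pickle module reacts to plain-text input. It does not use mlup at all. The byte string is an assumption standing in for the first lines of a LightGBM text model (such files usually begin with a "tree" header), and try_unpickle is a hypothetical helper written only for this illustration.

```python
import pickle

# Assumption: stand-in for the first lines of a LightGBM text model file.
fake_lightgbm_text = b"tree\nversion=v3\nnum_class=1\n"


def try_unpickle(raw: bytes):
    """Hypothetical helper: return the exception raised by pickle.loads, or None."""
    try:
        pickle.loads(raw)
    except Exception as exc:  # broad on purpose, we only want to inspect it
        return exc
    return None


err = try_unpickle(fake_lightgbm_text)
# On CPython this is typically an UnpicklingError mentioning a missing MARK,
# because the leading "t" byte is read as a pickle opcode.
print(type(err).__name__, err)

# A real pickle payload loads without an error.
print(try_unpickle(pickle.dumps({"a": 1})))
```

If the same kind of error appears when loading the model file, that suggests the file was handed to a pickle-based loader rather than a LightGBM one, which matches the traceback described above.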

Expected behavior

It should auto-detect that it is lightgbm and choose the LightGBMBinarizer.
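
To make the expectation concrete, here is a rough sketch of one way content-based detection could work. This is not mlup's actual implementation: guess_binarizer and its return values are hypothetical, and the "tree" header check is an assumption about how LightGBM text models usually start.

```python
import pickle


def guess_binarizer(raw: bytes) -> str:
    """Hypothetical helper: guess a loader name from the first bytes of a model file."""
    # Assumption: LightGBM text models start with a plain-text "tree" header.
    if raw[:64].lstrip().startswith(b"tree"):
        return "lightgbm"
    # Pickle protocol 2 and newer start with the PROTO opcode, byte 0x80.
    if raw[:1] == b"\x80":
        return "pickle"
    return "unknown"


print(guess_binarizer(b"tree\nversion=v3\n"))
print(guess_binarizer(pickle.dumps([1, 2, 3])))
print(guess_binarizer(b""))
```

A check like this only looks at the file content, so it would not depend on whether binarization_type is set explicitly in the config.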

Environment (please complete the following information):

nxexox commented 1 month ago

Thanks for reporting this issue. I'll look into this bug soon 👀

nxexox commented 3 weeks ago

@neverfox I couldn't reproduce your error scenario. I think I'm doing something wrong :smile:

Could you clarify how I can reproduce it?

I've tried two ways:

Bash

To test this scenario, I used the following configuration file:

version: '1'
ml:
  auto_detect_predict_params: true
  storage_kwargs:
    files_mask: '(\w.-_)*.txt'
    path_to_files: models/lightgbm-binary_cls_model.txt
  storage_type: mlup.ml.storage.local_disk.DiskStorage

And also, with the addition of binarization_type: auto:

version: '1'
ml:
  auto_detect_predict_params: true
  binarization_type: auto
  storage_kwargs:
    files_mask: '(\w.-_)*.txt'
    path_to_files: models/lightgbm-binary_cls_model.txt
  storage_type: mlup.ml.storage.local_disk.DiskStorage

I used the following command: mlup run -c ./bug-lightgbm-conf.yaml.

The binarizer search worked as expected: the LightGBM binarizer was selected.

Attaching logs:

image

Python code

To check, I used the following script:

from mlup import up
from mlup.constants import StorageType

_up = up.UP(
    conf=up.Config(
        storage_type=StorageType.disk,
        storage_kwargs={
            'path_to_files': "models/lightgbm-binary_cls_model.txt",
            'files_mask': r"(\w.-_)*.txt"
        },
    )
)
_up.ml.load()

print("There are no errors")

_up_from_conf = up.UP.load_from_yaml(
    conf_path="bug-lightgbm-conf.yaml",
    load_model=True
)

print("There are no errors after load from yaml")

From the Python interpreter, the binarizer search also worked as expected.

Attaching logs:

image

Environment