mlflow / recipes-classification-template

Template repo for kickstarting recipes for classification use case
Apache License 2.0

Custom transform is not working #25

Open leonardo-moraes-inbev opened 1 year ago

leonardo-moraes-inbev commented 1 year ago

Hello, I tried to implement a simple custom transform using OneHotEncoder, but the transform step fails.

I tried it both ways, returning an instance and returning the class:

from sklearn.preprocessing import OneHotEncoder

def transformer_fn():
    return OneHotEncoder()

and

from sklearn.preprocessing import OneHotEncoder

def transformer_fn():
    return OneHotEncoder

Error

2023/04/11 15:02:04 INFO mlflow.recipes.utils.execution: ingest, split: No changes. Skipping.
Run MLFlow Recipe step: transform
2023/04/11 15:02:05 INFO mlflow.recipes.step: Running step transform...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/leonardo-moraes/Git/mlflow-recipes-titanic/.venv/lib/python3.10/site-packages/mlflow/recipes/step.py", line 139, in run
    self.step_card = self._run(output_directory=output_directory)
  File "/home/leonardo-moraes/Git/mlflow-recipes-titanic/.venv/lib/python3.10/site-packages/mlflow/recipes/steps/transform.py", line 148, in _run
    train_transformed = transform_dataset(train_df)
  File "/home/leonardo-moraes/Git/mlflow-recipes-titanic/.venv/lib/python3.10/site-packages/mlflow/recipes/steps/transform.py", line 144, in transform_dataset
    transformed_features = pd.DataFrame(transformed_features, columns=columns)
  File "/home/leonardo-moraes/Git/mlflow-recipes-titanic/.venv/lib/python3.10/site-packages/pandas/core/frame.py", line 797, in __init__
    mgr = ndarray_to_mgr(
  File "/home/leonardo-moraes/Git/mlflow-recipes-titanic/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 337, in ndarray_to_mgr
    _check_values_indices_shape_match(values, index, columns)
  File "/home/leonardo-moraes/Git/mlflow-recipes-titanic/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 408, in _check_values_indices_shape_match
    raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (712, 1), indices imply (712, 329)
make: *** [Makefile:31: steps/transform/outputs/transformer.pkl] Error 1

leonardo-moraes-inbev commented 1 year ago

I was able to solve the issue by downgrading the package to mlflow==2.2.1 and returning a ColumnTransformer (or a Pipeline) instead of returning the transformer directly.

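from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

def transformer_fn():
    # Placeholder names: replace with the categorical columns of your dataset.
    categorical_features = ["feat1", "feat2", ..., "featn"]
    return ColumnTransformer(
        transformers=[
            ("onehot", OneHotEncoder(categories="auto"), categorical_features),
        ]
    )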