mlflow / mlflow

Open source platform for the machine learning lifecycle
https://mlflow.org
Apache License 2.0

[FR] Support pandas categorical during schema enforcement #12910

Open timvink opened 3 months ago

timvink commented 3 months ago

Willingness to contribute

No. I cannot contribute this feature at this time.

Proposal Summary

I would like MLflow model signatures to support pandas categorical data (the `category` dtype).

Using the category dtype is common when using boosting algorithms. Here's an example from the scikit-learn docs: Categorical Feature Support in Gradient Boosting.

Here's the current error message when trying to use the categorical dtype with mlflow:

```python
import pandas as pd
from mlflow.models import infer_signature
from mlflow.pyfunc import _enforce_schema

df = pd.DataFrame({
    "col_1": ["1", "2", "3"],
    "col_2": ["1", "2", "3"],
})
df["col_2"] = df["col_2"].astype("category")

signature = infer_signature(df)
_enforce_schema(df, signature.inputs)
#> MlflowException: Incompatible input types for column col_2. Can not safely convert category to <U0.
```
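Until categorical columns are supported, one possible workaround (a sketch under assumptions, not an MLflow feature) is to convert `category` columns to the dtype of their categories before calling `infer_signature` and before scoring, and to re-create the categorical dtype inside the model, for example in a custom pyfunc `predict`. The helpers `to_plain_dtypes` and `restore_categories` are hypothetical names used only for illustration. The sketch assumes integer-valued categories contain no missing values, since those cannot be cast back to a plain integer dtype.

```python
from __future__ import annotations

import pandas as pd


def to_plain_dtypes(df: pd.DataFrame) -> tuple[pd.DataFrame, dict[str, pd.CategoricalDtype]]:
    """Hypothetical helper: replace each category column with plain values.

    Returns the converted copy plus the original CategoricalDtype of every
    converted column so that it can be restored later.
    """
    out = df.copy()
    saved: dict[str, pd.CategoricalDtype] = {}
    for name, dtype in df.dtypes.items():
        if isinstance(dtype, pd.CategoricalDtype):
            saved[name] = dtype
            # Cast to the dtype of the categories themselves (e.g. object/str for
            # string labels); assumes no missing values in integer categories.
            out[name] = df[name].astype(dtype.categories.dtype)
    return out, saved


def restore_categories(df: pd.DataFrame, saved: dict[str, pd.CategoricalDtype]) -> pd.DataFrame:
    """Hypothetical helper: re-apply the saved categorical dtypes."""
    out = df.copy()
    for name, dtype in saved.items():
        out[name] = out[name].astype(dtype)
    return out


# Usage with the DataFrame from the report above.
df = pd.DataFrame({"col_1": ["1", "2", "3"], "col_2": ["1", "2", "3"]})
df["col_2"] = df["col_2"].astype("category")

plain, saved = to_plain_dtypes(df)           # col_2 no longer has the category dtype
restored = restore_categories(plain, saved)  # col_2 is categorical again
```

Saving the full `CategoricalDtype` rather than just the labels keeps both the category order and the `ordered` flag. In a served model, these dtypes would have to be captured at training time and stored alongside the model (for example as an artifact); that part is not shown here.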

Motivation

What is the use case for this feature?

Using boosting algorithms with categorical input data.

Why is this use case valuable to support for MLflow users in general?

Using the category dtype is common when using boosting algorithms. Here's an example from the scikit-learn docs: Categorical Feature Support in Gradient Boosting.

Why is this use case valuable to support for your project(s) or organization?

We use boosting algorithms with categorical input data.

Why is it currently difficult to achieve this use case?

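It's not supported: schema enforcement rejects columns with the `category` dtype, so a model whose signature was inferred from categorical input cannot be scored on that same input without converting it first.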

Details

No response


timvink commented 3 months ago

See also related https://github.com/mlflow/mlflow/issues/3849

github-actions[bot] commented 2 months ago

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.

timvink commented 2 months ago

This might also be classified as a bug, since the error message (`Can not safely convert category to <U0.`) is not very helpful: it does not say that the `category` dtype itself is unsupported, or how to work around it.
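One way the enforcement step could handle this, sketched under assumptions: rather than casting a categorical column to the NumPy type of the schema column, inspect the dtype of its categories, accept the column unchanged when they match, and otherwise raise an error that names the `category` dtype explicitly. `check_categorical_column` and the `_COMPATIBLE` mapping are hypothetical and not part of MLflow. The keys reuse MLflow's schema type names, and the values are labels returned by `pandas.api.types.infer_dtype`.

```python
import pandas as pd

# Hypothetical mapping: schema type name -> infer_dtype labels accepted for
# the categories of a categorical column.
_COMPATIBLE = {
    "string": {"string"},
    "integer": {"integer"},
    "long": {"integer"},
    "double": {"floating"},
    "boolean": {"boolean"},
}


def check_categorical_column(name: str, values: pd.Series, expected: str) -> pd.Series:
    """Hypothetical check: accept a categorical column whose categories match
    the expected schema type, otherwise raise a descriptive error."""
    dtype = values.dtype
    if not isinstance(dtype, pd.CategoricalDtype):
        raise TypeError(f"Column {name!r} does not have the pandas 'category' dtype.")
    inferred = pd.api.types.infer_dtype(dtype.categories, skipna=True)
    if inferred not in _COMPATIBLE.get(expected, set()):
        raise TypeError(
            f"Column {name!r} has the pandas 'category' dtype, but its categories "
            f"are {inferred!r} values, which do not match the schema type {expected!r}. "
            "Convert the column with .astype(...) before scoring, or log the model "
            "with a signature that matches the categories."
        )
    # Compatible categories: return the column unchanged so that models relying
    # on the categorical dtype still receive it.
    return values


labels = pd.Series(["1", "2", "3"]).astype("category")
checked = check_categorical_column("col_2", labels, "string")  # accepted as-is
```

Returning the column untouched, instead of converting it, is the design choice that matters for boosting libraries that detect categorical features from the dtype.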