Open sharan21 opened 3 months ago
Hi @sharan21. Your understanding is correct. We acknowledge this conversion is confusing and, given the repeated feedback, we are planning to remove it in the next major version. For some context, this conversion was added back when the primary model input was tabular data, for the sake of efficient data casting. That assumption no longer holds, and the conversion now adds unnecessary overhead and complexity.
For the time being, you need to add a check in your predict method to convert the DataFrame back to a list of scalars:

```python
from typing import Any, Dict, Optional

import pandas as pd


def predict(self, context, model_input, params: Optional[Dict[str, Any]] = None):
    if isinstance(model_input, pd.DataFrame):
        inputs = model_input.to_dict(orient="records")
```
cc: @serena-ruan
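To make the suggested workaround concrete, here is a minimal sketch of a pyfunc-style wrapper applying the check. The `EchoModel` class is illustrative, not mlflow API; note that for a single-column DataFrame built from a list of scalars, taking the first column back out is one way to recover the raw scalars (whereas `to_dict(orient="records")` yields a list of one-entry dicts).

```python
from typing import Any, Dict, Optional

import pandas as pd


class EchoModel:
    """Toy stand-in for an mlflow.pyfunc.PythonModel wrapper (illustrative only)."""

    def predict(self, context, model_input, params: Optional[Dict[str, Any]] = None):
        # mlflow's schema enforcement may hand us a DataFrame even when the
        # model was logged with a list-of-scalars input_example; one way to
        # recover the raw scalars is to take the first (only) column back out.
        if isinstance(model_input, pd.DataFrame):
            model_input = model_input.iloc[:, 0].tolist()
        return [s.upper() for s in model_input]


model = EchoModel()
df = pd.DataFrame(["hey", "there"])  # what predict actually receives today
print(model.predict(None, df))  # ['HEY', 'THERE'] -- same as passing the raw list
```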
I see. So in the future we will simply stop enforcing the schema via _enforce_schema and remove this function completely? My understanding is that the behaviour will be something like this: if I call model.predict(["hey"]), there will be no schema enforcement or conversion of any kind; mlflow will just CHECK that the given input matches the input schema (correct me if I am wrong). And if I call model.predict({"k": "v"}), mlflow will try to validate the schema and throw an exception in this case?

Assuming this is true, I think it makes sense to stop the enforcing/converting from one dtype to another and instead validate that the input matches the schema. Also, I would be able to contribute to this and would certainly like to do so, as I plan to contribute and engage frequently in the future.
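A validate-only approach like the one described might look roughly like this pure-Python sketch. Both `validate_input` and the schema layout are hypothetical (the schema just mirrors the MLmodel signature quoted later in this issue), not mlflow's actual API:

```python
from typing import Any, List


def validate_input(model_input: Any, schema: List[dict]) -> None:
    """Validate instead of convert: raise if the input doesn't match the schema.

    The schema format loosely mirrors an MLmodel signature such as
    [{"type": "string", "required": True}]; this function is hypothetical.
    """
    type_map = {"string": str, "long": int, "double": float, "boolean": bool}
    if isinstance(model_input, list):
        expected = type_map[schema[0]["type"]]
        for item in model_input:
            if not isinstance(item, expected):
                raise TypeError(
                    f"expected {schema[0]['type']}, got {type(item).__name__}"
                )
    else:
        # e.g. the dict case above: this schema has no named columns, so
        # anything other than a list of scalars is rejected, not converted.
        raise TypeError(
            f"schema expects a list of scalars, got {type(model_input).__name__}"
        )


validate_input(["hey"], [{"type": "string", "required": True}])  # passes silently
```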
Also, just to continue and validate my understanding: the only input conversion that will still happen is in a function like parse_tf_serving_input here: https://github.com/mlflow/mlflow/blob/master/mlflow/utils/proto_json_utils.py#L536. During inference, the JSON string is converted into the model input according to the input schema with the help of this function, and then passed to model.predict, after which no further conversion will occur inside mlflow.pyfunc functions.
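For illustration, a TF-serving-style request body wraps the data under an "instances" or "inputs" key; a toy version of just the unwrapping step might look like the sketch below. This is not mlflow's code — the real parse_tf_serving_input also casts values according to the model's input schema, which this sketch skips.

```python
import json


def toy_parse_serving_input(json_str: str):
    """Toy unwrapping of a TF-serving-style request body (illustration only)."""
    payload = json.loads(json_str)
    if "instances" in payload:
        return payload["instances"]
    if "inputs" in payload:
        return payload["inputs"]
    raise ValueError("expected an 'instances' or 'inputs' key")


print(toy_parse_serving_input('{"inputs": ["hey"]}'))  # ['hey']
```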
@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
Issues Policy acknowledgement
Where did you encounter this bug?
Local machine
Willingness to contribute
Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
MLflow version
System information
Describe the problem
Tl;dr: MLflow does not respect the dtype of input_example during logging and does not enforce the output of _enforce_schema to match this dtype.

For example: I pass an input_example to pyfunc.log_model, and the input signature in the MLmodel file is inputs: '[{"type": "string", "required": true}]', as expected. I then load the model with pyfunc.load_model and run the model.predict(["hey"]) method, and it fails because the input to the wrapper's predict function has been converted to a DataFrame, as shown in the following stack trace.

Root Cause
The conversion happens in _enforce_schema, which seems to suggest this is the expected behaviour from mlflow: https://github.com/mlflow/mlflow/blob/master/mlflow/models/utils.py#L1122

Expected behavior from the user's end
My understanding is that if a user passes an input_example to log_model, it means that their pyfunc wrapper is designed to use an input example of this dtype. However, mlflow's _enforce_schema prevents this behaviour and seems to prefer DataFrames, which forces the user to always use a DataFrame (even though List[Scalar] is in the list of accepted input dtypes). Is my understanding incorrect, or is this the expected behaviour of mlflow? If it IS the expected behaviour, why is this so? Does this make it impossible to use a list of scalars during inference?
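Since the logging/loading steps are only described in prose above, here is a minimal sketch (without mlflow) of the conversion being complained about: the list of scalars the user passes versus the one-column DataFrame the wrapper's predict actually receives after schema enforcement.

```python
import pandas as pd

raw_input = ["hey"]  # what the user passes to model.predict

# Per the description above, _enforce_schema hands the wrapper a
# one-column DataFrame rather than the original list:
converted = pd.DataFrame(raw_input)

print(type(converted).__name__)  # DataFrame
print(converted.iloc[:, 0].tolist())  # ['hey'] -- the data survives, the dtype doesn't
```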
Tracking information
Code to reproduce issue
Stack trace
Other info / logs
What component(s) does this bug affect?
area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

What language(s) does this bug affect?
language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

What integration(s) does this bug affect?
integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations