mlflow / mlflow

Open source platform for the machine learning lifecycle
https://mlflow.org
Apache License 2.0

[BUG] Requiring the length of values for each input to be the same seems too strict #5469

Closed jingnanxue closed 2 years ago

jingnanxue commented 2 years ago

Describe the problem

I think this sanity check is too strict: https://github.com/mlflow/mlflow/blob/5784e7e833385e59cd194fd63e4ae5e456abd779/mlflow/utils/proto_json_utils.py#L347

Based on the official documentation of the TensorFlow RESTful API, https://www.tensorflow.org/tfx/serving/api_rest#request_format_2, the value for the inputs key can be either a single input tensor or a map of input names to tensors (listed in their natural nested form). Each input can have arbitrary shape and need not share the same 0-th dimension.

For example, suppose our inputs include two tensors: one with shape (-1, 5) (a batch of N product sequences, each of length 5) and one with shape (-1) (this is a multi-class problem, but instead of computing probabilities for all classes, we randomly pick K classes and compute probabilities only for those K classes across all sequences in the batch). K and N are not necessarily equal on each call.

This works well in TensorFlow Serving but fails in mlflow serving due to the restriction above.
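For illustration, here is a minimal sketch (not the actual mlflow implementation) of a columnar-format parser that accepts the payload above: each named input is kept as its own tensor (nested lists here), with no requirement that the 0-th dimensions match.

```python
def parse_columnar_inputs(inputs):
    # Sketch of parsing the TF-serving "inputs" (columnar) format.
    # Each named input stays a separate tensor; their lengths may differ,
    # matching the TF Serving REST docs quoted above.
    if isinstance(inputs, dict):
        return {name: values for name, values in inputs.items()}
    return inputs  # a single unnamed tensor


payload = {
    "input_seq": [[101, 275, 323, 444, 512], [289, 303, 156, 223, 357]],
    "input_candidates": [100, 101, 102, 104, 107, 119, 124],
}
parsed = parse_columnar_inputs(payload)
# input_seq has 2 rows (N sequences), input_candidates has 7 entries (K classes)
print(len(parsed["input_seq"]), len(parsed["input_candidates"]))  # 2 7
```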

Code to reproduce issue

curl -H 'Content-Type: application/json' 'localhost:5000/invocations' -d '{
"inputs":{
  "input_seq":[[101,275,323,444,512],[289,303,156,223,357]],
  "input_candidates":[100,101,102,104,107,119,124]
  }
}'
{
  "error_code": "MALFORMED_REQUEST",
  "message": "Failed to parse data as TF serving input. The length of values for each input/column name are not the same",
  "stack_trace": "Traceback (most recent call last):
    File \"/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mlflow/pyfunc/scoring_server/__init__.py\", line 90, in infer_and_parse_json_input
      return parse_tf_serving_input(decoded_input, schema=schema)
    File \"/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mlflow/utils/proto_json_utils.py\", line 244, in parse_tf_serving_input
      \"Failed to parse data as TF serving input. The length of values for\"
  mlflow.exceptions.MlflowException: Failed to parse data as TF serving input. The length of values for each input/column name are not the same"
}


dbczumar commented 2 years ago

@jingnanxue Thank you for raising this. I agree. @tomasatdatabricks can you weigh in here? How difficult would this be to fix?

tomasatdatabricks commented 2 years ago

@dbczumar I think we can just remove the check, cc @arjundc-db who may have more context. I think we should do it to match the tf-serving behavior. Thanks for bringing this up @jingnanxue!

dbczumar commented 2 years ago

@arjundc-db Can you weigh in here?

arjundc-db commented 2 years ago

Yes, that makes sense. I will open a PR to change this behavior. I think the intention was to ensure that if a user is passing instances, every instance contains all the named tensors. However, this check was incorrectly applied to both instances and inputs. @tomasatdatabricks does it make sense to keep/modify this behavior only when the JSON contains instances?

After the PR, the following:

{
    "instances": [
        {"a": "s1", "b": 1},
        {"a": "s2", "b": 2, "c": [4, 5, 6]}
    ]
}

would fail validation, since the first instance does not contain the named tensor c.
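The row-format check described here could be sketched as follows (a hedged illustration, not the actual PR; the function name is hypothetical): every instance must provide the same set of named tensors, so validation compares key sets.

```python
def validate_instances(instances):
    # Row ("instances") format: every instance must carry the same named tensors.
    if not instances or not isinstance(instances[0], dict):
        return  # a single unnamed tensor per instance; nothing to check
    expected = set(instances[0])
    for i, inst in enumerate(instances[1:], start=1):
        if set(inst) != expected:
            raise ValueError(
                f"instance {i} has keys {sorted(inst)}, expected {sorted(expected)}"
            )


validate_instances([{"a": "s1", "b": 1}, {"a": "s2", "b": 2}])  # passes
try:
    validate_instances([{"a": "s1", "b": 1}, {"a": "s2", "b": 2, "c": [4, 5, 6]}])
except ValueError as e:
    print(e)  # instance 1 has keys ['a', 'b', 'c'], expected ['a', 'b']
```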

tomasatdatabricks commented 2 years ago

@arjundc-db Makes sense to me, but I think we want to keep parity with tf-serving in this case because we use the tf-serving format. Can you check what the behavior of tf-serving is in this case?

jrbarclay37 commented 2 years ago

Hi, I am also facing the same issue with a model using PyTorch Geometric, which takes two tensors as inputs. One tensor represents the nodes in the graph and the other represents the edges. This works for batch inference, but not for online serving. My format is the same as in the code-to-reproduce above:

{
    "inputs": {
        "nodes": tensor(num_nodes, num_features),
        "edge_index": tensor(num_edges, 2)
    }
}

It would be extremely helpful if MLflow would accept different lengths for tensor inputs.

jrbarclay37 commented 2 years ago

@tomasatdatabricks @arjundc-db I checked the tf-serving documentation, and it states that the row format (instances) requires all named inputs to share the same 0-th dimension. However, for the columnar format (inputs), it explicitly states that "Each input can have arbitrary shape and need not share the same 0-th dimension (aka batch size) as required by the row format described above."

https://www.tensorflow.org/tfx/serving/api_rest#request_format_2