pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

TorchServe with Kserve_wrapper v2 throws 'message': 'number of batch response mismatched' #2158

Open gavrissh opened 1 year ago

gavrissh commented 1 year ago

🐛 Describe the bug

TorchServe supports batching of multiple requests, and the batch_size value is provided while registering the model.

The request envelope receives the input as a list of multiple request bodies, but the KServe V2 request envelope picks only the first item in the list of inputs: https://github.com/pytorch/serve/blob/master/ts/torch_handler/request_envelope/kservev2.py#L104

As a result, a single output is sent back as the response, causing the mismatch.
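For context, a minimal sketch of the batching contract that produces the error below; the handler and the run_inference stub are hypothetical, not the actual kservev2.py code. With batch_size > 1, TorchServe hands the handler one entry per queued request and expects a response list of the same length, so an envelope that only reads the first input yields one output for N requests.

def run_inference(body):
    # Placeholder model call; stands in for preprocess/inference/postprocess.
    return {"prediction": len(str(body))}

def handle(requests, context=None):
    # requests holds one dict per batched request, e.g. [{"body": ...}, ...].
    # TorchServe expects the returned list to have the same length and order as
    # requests; an envelope that reads only requests[0] returns a single output,
    # hence "expect: 5, got: 1".
    return [run_inference(req.get("body")) for req in requests]

if __name__ == "__main__":
    batch = [{"body": {"data": [1, 2, 3]}} for _ in range(5)]
    print(len(handle(batch)))  # 5 outputs for 5 requests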

Error logs

TorchServe Error stdout MODEL_LOG - model: resnet50-3, number of batch response mismatched, expect: 5, got: 1.

Installation instructions

Followed instructions provided here - https://github.com/pytorch/serve/blob/master/kubernetes/kserve/kserve_wrapper/README.md

Model Packaging

Created a resnet50.mar using default parameters and handler

config.properties

inference_address=http://0.0.0.0:8085/
management_address=http://0.0.0.0:8085/
metrics_address=http://0.0.0.0:8082/
grpc_inference_port=7075
grpc_management_port=7076
enable_envvars_config=true
install_py_dep_per_model=true
enable_metrics_api=true
metrics_format=prometheus
NUM_WORKERS=1
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model_store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"resnet50": {"1.0": {"defaultVersion": true,"marName": "resnet50.mar","minWorkers": 6,"maxWorkers": 6,"batchSize": 16,"maxBatchDelay": 200,"responseTimeout": 2000}}}}

Versions

Name: kserve Version: 0.10.0

Name: torch Version: 1.13.1+cu117

Name: torchserve Version: 0.7.1

Repro instructions

Followed instructions provided here - https://github.com/pytorch/serve/blob/master/kubernetes/kserve/kserve_wrapper/README.md

Run the kserve_wrapper main.py and send multiple curl inference requests using the v2 protocol.

Command used - seq 1 10 | xargs -n1 -P 5 curl -H "Content-Type: application/json" --data @input_bytes.json http://0.0.0.0:8080/v2/models/resnet50/infer

Possible Solution

Changes are required to handle TorchServe batched inputs and generate an output for every request initiated by TorchServe.

Changes are needed in the parse_input() and format_output() methods in kservev2.py, roughly along the lines sketched below.
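This is illustrative only, not the actual ts/torch_handler/request_envelope/kservev2.py code; the class and attribute names (BatchAwareKServeV2Envelope, _input_counts) are made up. parse_input flattens the "inputs" of every batched request and remembers how many came from each, so format_output can regroup the model outputs into one V2 response per request.

class BatchAwareKServeV2Envelope:
    def parse_input(self, data):
        # data holds one row per TorchServe-batched request.
        self._input_counts = []
        flat_inputs = []
        for row in data:
            body = row.get("body") or row.get("data") or {}
            inputs = body.get("inputs", [])
            self._input_counts.append(len(inputs))
            flat_inputs.extend(item.get("data") for item in inputs)
        return flat_inputs  # handed on to preprocess/inference

    def format_output(self, outputs):
        # Regroup the flat output list into one V2 response per original request.
        responses, cursor = [], 0
        for count in self._input_counts:
            chunk = outputs[cursor:cursor + count]
            cursor += count
            responses.append({
                "model_name": "resnet50",
                "outputs": [
                    {"name": "predict", "shape": [], "datatype": "BYTES", "data": [o]}
                    for o in chunk
                ],
            })
        return responses  # same length and order as the incoming batch

env = BatchAwareKServeV2Envelope()
flat = env.parse_input([{"body": {"inputs": [{"data": [1, 2]}, {"data": [3]}]}},
                        {"body": {"inputs": [{"data": [4]}]}}])
print(env.format_output([sum(x) for x in flat]))  # two responses, matching the two requests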

jagadeeshi2i commented 1 year ago

@gavrishp Both envelopes need fixes for KServe 0.10, and the v2 protocol has two examples: one with bytes input and another with tensor input. Let me know what you are working on; I can take up the remaining.

gavrissh commented 1 year ago

@jagadeeshi2i I can take the v2 protocol changes.

There's one use case I need inputs for, since KServe supports batching within a single request. Request example:

{
  "inputs": [
    {
      "name": "input-0",
      "shape": [37],
      "datatype": "INT64",
      "data": [66, 108, 111, 111, 109]
    },
    {
      "name": "input-0",
      "shape": [37],
      "datatype": "INT64",
      "data": [66, 108, 111, 111, 109]
    }
  ]
}

Response for the above example is

{
  "model_name":"resnet50",
  "model_version":"3.0",
  "id":"c0229ab0-f157-4917-974a-93646a51a57d",
  "parameters":null,
  "outputs":[
    {
      "name":"predict",
      "shape":[],
      "datatype":"BYTES",
      "parameters":null,
      "data":[2]
    },
    {
      "name":"predict",
      "shape":[],
      "datatype":"BYTES",
      "parameters":null,
      "data":[2]
    }
  ]
}

But with TorchServe batching of multiple requests, the handler's postprocess would return a list of outputs. We might also need to hold some additional state to keep track of which input came from which request_id, right?
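One way to hold that state, sketched with made-up names (group_by_request, input_counts), is to record how many inputs each request contributed while parsing, then slice the postprocess output back into per-request groups:

def group_by_request(request_ids, input_counts, outputs):
    # request_ids: one id per batched request (as TorchServe tracks them);
    # input_counts: how many inputs each request contributed during parsing;
    # outputs: the flat list returned by the handler's postprocess.
    grouped, cursor = {}, 0
    for req_id, count in zip(request_ids, input_counts):
        grouped[req_id] = outputs[cursor:cursor + count]
        cursor += count
    return grouped

print(group_by_request(["req-a", "req-b"], [2, 1], ["cat", "dog", "car"]))
# {'req-a': ['cat', 'dog'], 'req-b': ['car']}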

jagadeeshi2i commented 1 year ago

In the above example, a single HTTP request has multiple inputs in it, so the response will have outputs in the same order for that request id. You are referring to TorchServe dynamic batching, which is not supported in the KServe integration.

gavrissh commented 1 year ago

This issue concerns TorchServe dynamic batching with the KServe integration. Is there any particular reason for it not being supported? Is it planned to be supported in the future?

If that is the case, the TS model config batch_size should not be allowed to be set to more than 1 right now; I suppose it is causing this particular issue.

My understanding is that this batching helps with better GPU utilization and higher throughput. My testing results support this.

jagadeeshi2i commented 1 year ago

TorchServe with KServe has batching support; the inputs are statically batched. TorchServe on its own does dynamic batching, where it waits for batch_delay time for batch_size to be filled.

KServe v2 requires sending all inputs in a single request. Setting batch_size to more than 1 here will only make TorchServe wait for the batch_delay.

Regarding GPU utilization, both static and dynamic batching start processing only after all the input is received, so this will not affect GPU utilization.

gavrissh commented 1 year ago

Thanks for clarifying!

What would you suggest is the correct fix here for this issue?

jagadeeshi2i commented 1 year ago

Set batch_size to 1.
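For reference, that corresponds to changing batchSize in the model_snapshot of the config.properties above (only the relevant line shown):

model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"resnet50": {"1.0": {"defaultVersion": true,"marName": "resnet50.mar","minWorkers": 6,"maxWorkers": 6,"batchSize": 1,"maxBatchDelay": 200,"responseTimeout": 2000}}}}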

jagadeeshi2i commented 1 year ago

@gavrishp Is the issue resolved now?

gavrissh commented 1 year ago

@jagadeeshi2i I had a query: is it by design that we select only the first element in the batch in the KServe envelopes?

https://github.com/pytorch/serve/blob/master/ts/torch_handler/request_envelope/kserve.py#L27 https://github.com/pytorch/serve/blob/master/ts/torch_handler/request_envelope/kservev2.py#L102,L111

This still breaks any use case with batch_size > 1.

matej14086 commented 3 months ago

The main feature of TorchServe is dynamic batching, especially if you have requests from multiple sources. It's a bummer that KServe doesn't support that.