pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

'503 Service Unavailable' for url 'http://0.0.0.0:8085/v1/models/mnist:predict' #2968

Open lswjkllc opened 7 months ago

lswjkllc commented 7 months ago

🐛 Describe the bug

This documented example is not working: https://kserve.github.io/website/0.11/modelserving/v1beta1/torchserve/#deploy-pytorch-model-with-v2-rest-protocol. The isvc object becomes ready when deployed using the v2 protocol with the new schema:

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve-mnist-v2"
  labels:
    networking.kserve.io/visibility: cluster-local
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      protocolVersion: v2
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v2
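
Readiness of the deployed object can be confirmed with, for example:

kubectl get inferenceservice torchserve-mnist-v2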

However, the response always returns an error when an HTTP inference request is sent:

{"error":"HTTPStatusError : {'code': 503, 'type': 'InternalServerException', 'message': 'Prediction failed'}, '503 Service Unavailable' for url 'http://0.0.0.0:8085/v1/models/mnist:predict'"}

The request:

curl -v -H 'Content-Type: application/json' http://127.0.0.1:8080/v2/models/mnist/infer -d @./mnist_v2_bytes.json

Before sending the request, I used `kubectl port-forward` to expose the service:

kubectl port-forward svc/torchserve-mnist-v2-predictor 8080:80 --address=0.0.0.0

mnist_v2_bytes.json:

{
    "id": "d3b15cad-50a2-4eaf-80ce-8b0a428bd298",
    "inputs": [
        {
            "data": ["iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAA10lEQVR4nGNgGFhgy6xVdrCszBaLFN/mr28+/QOCr69DMCSnA8WvHti0acu/fx/10OS0X/975CDDw8DA1PDn/1pBVEmLf3+zocy2X/+8USXt/82Ds+/+m4sqeehfOpw97d9VFDmlO++t4JwQNMm6f6sZcEpee2+DR/I4A05J7tt4JJP+IUsu+ncRp6TxO9RAQJY0XvrvMAuypNNHuCTz8n+PzVEcy3DtqgiY1ptx6t8/ewY0yX9ntoDA63//Xs3hQpMMPPsPAv68qmDAAFKXwHIzMzCl6AoAxXp0QujtP+8AAAAASUVORK5CYII="],
            "datatype": "BYTES",
            "name": "312a4eb0-0ca7-4803-a101-a6d2c18486fe",
            "shape": [-1]
        }
    ]
}
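
The same request can also be sent from Python; here is a minimal sketch, assuming the port-forward above is active and mnist_v2_bytes.json is in the current directory (requests is an assumed dependency; any HTTP client works):

import json

import requests

# Load the v2 inference payload shown above.
with open("mnist_v2_bytes.json") as f:
    payload = json.load(f)

# POST to the port-forwarded predictor, mirroring the curl command above.
resp = requests.post(
    "http://127.0.0.1:8080/v2/models/mnist/infer",
    json=payload,
    timeout=30,
)
print(resp.status_code)
print(resp.text)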

Error logs

2024-02-27 14:45:39.492 37650 root INFO [timing():48] kserve.io.kserve.protocol.rest.v1_endpoints.predict 18.291365146636963, ['http_status:500', 'http_method:POST', 'time:wall']
2024-02-27 14:45:39.493 37650 root INFO [timing():48] kserve.io.kserve.protocol.rest.v1_endpoints.predict 0.35211900000000007, ['http_status:500', 'http_method:POST', 'time:cpu']
2024-02-27 14:45:39.494 37650 uvicorn.error ERROR [run_asgi():376] Exception in ASGI application
Traceback (most recent call last):
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 373, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
    return await self.app(scope, receive, send)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/fastapi/applications.py", line 270, in __call__
    await super().__call__(scope, receive, send)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/timing_asgi/middleware.py", line 70, in __call__
    await self.app(scope, receive, send_wrapper)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/starlette/routing.py", line 706, in __call__
    await route.handle(scope, receive, send)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/fastapi/routing.py", line 235, in app
    raw_response = await run_endpoint_function(
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/fastapi/routing.py", line 161, in run_endpoint_function
    return await dependant.call(**values)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/kserve/protocol/rest/v1_endpoints.py", line 69, in predict
    response, response_headers = await self.dataplane.infer(model_name=model_name, body=body, headers=headers)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/kserve/protocol/dataplane.py", line 276, in infer
    response = await model(body, headers=headers)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/kserve/model.py", line 116, in __call__
    response = (await self.predict(payload, headers)) if inspect.iscoroutinefunction(self.predict) \
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/kserve/model.py", line 319, in predict
    return await self._http_predict(payload, headers)
  File "/Users/kust/Workspace/projects/bocloud/torchserve/.py38/lib/python3.8/site-packages/kserve/model.py", line 286, in _http_predict
    raise HTTPStatusError(message, request=response.request, response=response)
httpx.HTTPStatusError: {'code': 503, 'type': 'InternalServerException', 'message': 'Prediction failed'}, '503 Service Unavailable' for url 'http://0.0.0.0:8085/v1/models/mnist:predict'

Installation instructions

KServe Version: 0.11
Kubernetes version: 1.23.0
OS (e.g. from /etc/os-release): CentOS 7.9

Model Packaging

gs://kfserving-examples/models/torchserve/image_classifier/v2

config.properties

No response

Versions

unknown

Repro instructions

unknown

Possible Solution

Expected Output:

{"id": "d3b15cad-50a2-4eaf-80ce-8b0a428bd298", "model_name": "mnist", "model_version": "1.0", "outputs": [{"name": "predict", "shape": [], "datatype": "INT64", "data": [1]}]}

agunapal commented 7 months ago

Hi @lswjkllc, please try this example:

https://github.com/pytorch/serve/blob/master/kubernetes/kserve/examples/mnist/MNIST.md

sgaist commented 7 months ago

Hi @lswjkllc,

While it might not be directly related to your 503 error, since you mention KServe 0.11: is it 0.11.0 or 0.11.1? If the former, you should add:

      env:
        - name: PROTOCOL_VERSION
          value: v2

to your predictor definition to ensure the v2 protocol is used to serve your model, as in the sketch below.
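
Based on the manifest at the top of this issue, the full definition would then look like:

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve-mnist-v2"
  labels:
    networking.kserve.io/visibility: cluster-local
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      protocolVersion: v2
      env:
        - name: PROTOCOL_VERSION
          value: v2
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v2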