Called inference v2 infer api, Got httpx.ConnectError: All connection attempts failed

What steps did you take and what happened: [A clear and concise description of what the bug is.]

'./build_image.sh' build a torchserve-kfs docker image with Torchserve for kserve, Dockerfile


# syntax = docker/dockerfile:experimental
#
# Following comments have been shamelessly copied from https://github.com/pytorch/pytorch/blob/master/Dockerfile
# 
# NOTE: To build this you will need a docker version > 18.06 with
#       experimental enabled and DOCKER_BUILDKIT=1
#
#       If you do not use buildkit you are not going to have a good time
#
#       For reference: 
#           https://docs.docker.com/develop/develop-images/build_enhancements

ARG BASE_IMAGE=pytorch/torchserve:latest FROM ${BASE_IMAGE}

USER root RUN apt-get update -y && apt-get install -y curl wget iputils-ping vim RUN pip install --upgrade pip

COPY requirements.txt requirements.txt

RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

RUN pip install -r requirements.txt

COPY dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh COPY kserve_wrapper kserve_wrapper COPY config.properties config.properties

USER model-server

ENTRYPOINT ["/usr/local/bin/dockerd-entrypoint.sh"]

build docker image:

./build_image.sh -t swr.cn-north-4.myhuaweicloud.com/${MY_DOCKER_REPOSITORY}/torchserve-kfs:dev


3.  Deployed a pytorch mnist model with [TorchServe InferenceService](https://kserve.github.io/website/0.10/modelserving/v1beta1/torchserve/ ) 
4. running inference, startup log kubectl logs torchserve-predictor-default-00001-deployment-64b87f58b5-6kwzf

Defaulted container "kserve-container" out of: kserve-container, queue-proxy, agent, storage-initializer (init) 2023-05-29 00:46:07.064 13 root INFO [parse_config():68] Wrapper : Model names ['mnist'], inference address http//0.0.0.0:8085, management address http://0.0.0.0:8085, model store /mnt/models/model-store 2023-05-29 00:46:07.064 13 root INFO [init():48] Predict URL set to 0.0.0.0:8085 2023-05-29 00:46:07.065 13 root INFO [init():50] Explain URL set to 0.0.0.0:8085 2023-05-29 00:46:07.065 13 root INFO [download():63] Copying contents of /mnt/models/model-store to local 2023-05-29 00:46:07.065 13 root INFO [init():27] TSModelRepo is initialized 2023-05-29 00:46:07.066 13 root INFO [register_model():187] Registering model: mnist 2023-05-29 00:46:07.066 13 root INFO [start():129] Setting max asyncio worker threads as 16 2023-05-29 00:46:07.066 13 root INFO [serve():139] Starting uvicorn with 1 workers 2023-05-29 00:46:07.090 13 uvicorn.error INFO [serve():84] Started server process [13] 2023-05-29 00:46:07.090 13 uvicorn.error INFO [startup():45] Waiting for application startup. 2023-05-29 00:46:07.091 13 root INFO [start():62] Starting gRPC server on [::]:8081 2023-05-29 00:46:07 DEBUG [timing_asgi.middleware:40] ASGI scope of type lifespan is not supported yet 2023-05-29 00:46:07.092 13 uvicorn.error INFO [startup():59] Application startup complete.

kubectl get inferenceservice torchserve

NAME URL READY PREV LATEST PREVROLLEDOUTREVISION LATESTREADYREVISION AGE torchserve http://torchserve.kserve-test.example.com True 100 torchserve-predictor-default-00001 38m

5.  call inference api /v2/models/mnist/infer

export SERVICE_HOSTNAME=$(kubectl get inferenceservice torchserve -o jsonpath='{.status.url}' | cut -d "/" -f 3) export INGRESS_HOST=$(kubectl get po -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].status.hostIP}') export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')

curl -v -H "Host: ${SERVICE_HOSTNAME}" \ -H 'Content-Type: application/json' \ http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/mnist/infer \ -d @./mnist_v2_bytes.json

curl output:

Trying 192.168.160.3...
TCP_NODELAY set
Connected to 192.168.160.3 (192.168.160.3) port 31436 (#0)

POST /v2/models/mnist/infer HTTP/1.1 Host: torchserve.kserve-test.example.com User-Agent: curl/7.58.0 Accept: / Content-Type: application/json Content-Length: 611
upload completely sent off: 611 out of 611 bytes < HTTP/1.1 500 Internal Server Error < content-length: 57 < content-type: application/json < date: Mon, 29 May 2023 01:09:09 GMT < server: istio-envoy < x-envoy-upstream-service-time: 18 <
Connection #0 to host 192.168.160.3 left intact {"error":"ConnectError : All connection attempts failed"}

kubectl logs torchserve-predictor-default-00001-deployment-64b87f58b5-6kwzf

2023-05-29 00:46:17.906 13 root INFO [timing():48] kserve.io.kserve.protocol.rest.v2_endpoints.infer 0.05085945129394531, ['http_status:500', 'http_method:POST', 'time:wall']
2023-05-29 00:46:17.906 13 root INFO [timing():48] kserve.io.kserve.protocol.rest.v2_endpoints.infer 0.04771100000000006, ['http_status:500', 'http_method:POST', 'time:cpu']
2023-05-29 00:46:17.906 13 uvicorn.error ERROR [run_asgi():376] Exception in ASGI application
Traceback (most recent call last):
  File "/home/venv/lib/python3.9/site-packages/anyio/_core/_sockets.py", line 164, in try_connect
    stream = await asynclib.connect_tcp(remote_host, remote_port, local_address)
  File "/home/venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 1691, in connect_tcp
    await get_running_loop().create_connection(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 1065, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.9/asyncio/base_events.py", line 1050, in create_connection
    sock = await self._connect_sock(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 961, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 500, in sock_connect
    return await fut
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 535, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('0.0.0.0', 8085)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/venv/lib/python3.9/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/home/venv/lib/python3.9/site-packages/httpcore/backends/asyncio.py", line 111, in connect_tcp
    stream: anyio.abc.ByteStream = await anyio.connect_tcp(
  File "/home/venv/lib/python3.9/site-packages/anyio/_core/_sockets.py", line 222, in connect_tcp
    raise OSError("All connection attempts failed") from cause
OSError: All connection attempts failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/venv/lib/python3.9/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
    yield
  File "/home/venv/lib/python3.9/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/home/venv/lib/python3.9/site-packages/httpcore/_async/connection_pool.py", line 253, in handle_async_request
    raise exc
  File "/home/venv/lib/python3.9/site-packages/httpcore/_async/connection_pool.py", line 237, in handle_async_request
    response = await connection.handle_async_request(request)
  File "/home/venv/lib/python3.9/site-packages/httpcore/_async/connection.py", line 86, in handle_async_request
    raise exc
  File "/home/venv/lib/python3.9/site-packages/httpcore/_async/connection.py", line 63, in handle_async_request
    stream = await self._connect(request)
  File "/home/venv/lib/python3.9/site-packages/httpcore/_async/connection.py", line 111, in _connect
    stream = await self._network_backend.connect_tcp(**kwargs)
  File "/home/venv/lib/python3.9/site-packages/httpcore/backends/auto.py", line 29, in connect_tcp
    return await self._backend.connect_tcp(
  File "/home/venv/lib/python3.9/site-packages/httpcore/backends/asyncio.py", line 111, in connect_tcp
    stream: anyio.abc.ByteStream = await anyio.connect_tcp(
  File "/usr/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/venv/lib/python3.9/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc)
httpcore.ConnectError: All connection attempts failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/venv/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 373, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/home/venv/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
    return await self.app(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/fastapi/applications.py", line 270, in __call__
    await super().__call__(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/venv/lib/python3.9/site-packages/timing_asgi/middleware.py", line 68, in __call__
    await self.app(scope, receive, send_wrapper)
  File "/home/venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/venv/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/home/venv/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/starlette/routing.py", line 706, in __call__
    await route.handle(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/home/venv/lib/python3.9/site-packages/fastapi/routing.py", line 235, in app
    raw_response = await run_endpoint_function(
  File "/home/venv/lib/python3.9/site-packages/fastapi/routing.py", line 161, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/venv/lib/python3.9/site-packages/kserve/protocol/rest/v2_endpoints.py", line 130, in infer
    response, response_headers = await self.dataplane.infer(
  File "/home/venv/lib/python3.9/site-packages/kserve/protocol/dataplane.py", line 276, in infer
    response = await model(body, headers=headers)
  File "/home/venv/lib/python3.9/site-packages/kserve/model.py", line 116, in __call__
    response = (await self.predict(payload, headers)) if inspect.iscoroutinefunction(self.predict) \
  File "/home/venv/lib/python3.9/site-packages/kserve/model.py", line 319, in predict
    return await self._http_predict(payload, headers)
  File "/home/venv/lib/python3.9/site-packages/kserve/model.py", line 270, in _http_predict
    response = await self._http_client.post(
  File "/home/venv/lib/python3.9/site-packages/httpx/_client.py", line 1845, in post
    return await self.request(
  File "/home/venv/lib/python3.9/site-packages/httpx/_client.py", line 1530, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/home/venv/lib/python3.9/site-packages/httpx/_client.py", line 1617, in send
    response = await self._send_handling_auth(
  File "/home/venv/lib/python3.9/site-packages/httpx/_client.py", line 1645, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "/home/venv/lib/python3.9/site-packages/httpx/_client.py", line 1682, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "/home/venv/lib/python3.9/site-packages/httpx/_client.py", line 1719, in _send_single_request
    response = await transport.handle_async_request(request)
  File "/home/venv/lib/python3.9/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/usr/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/venv/lib/python3.9/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectError: All connection attempts failed

What did you expect to happen: Get predict result, eg:

{"id": "d3b15cad-50a2-4eaf-80ce-8b0a428bd298", "model_name": "mnist", "model_version": "1.0", "outputs": [{"name": "predict", "shape": [], "datatype": "INT64", "data": [1]}]}

What's the InferenceService yaml: [To help us debug please run kubectl get isvc $name -n $namespace -oyaml and paste the output]

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    autoscaling.knative.dev/target: "10"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"serving.kserve.io/v1beta1","kind":"InferenceService","metadata":{"annotations":{"autoscaling.knative.dev/target":"10"},"name":"torchserve","namespace":"kserve-test"},"spec":{"predictor":{"imagePullSecrets":[{"name":"huaweicloud"}],"logger":{"mode":"all","url":"http://message-dumper.default/"},"model":{"modelFormat":{"name":"pytorch"},"protocolVersion":"v2","runtime":"torchserve-runtime","storageUri":"s3://mlflow/train_valid/output/6024661e-6d33-4ea7-af5a-d1e30ffa1479"},"serviceAccountName":"sa"}}}
  creationTimestamp: "2023-05-29T00:45:34Z"
  finalizers:
  - inferenceservice.finalizers
  generation: 1
  name: torchserve
  namespace: kserve-test
  resourceVersion: "12068534"
  uid: 150cf9b1-baf3-43a7-86a5-36995fe409d3
spec:
  predictor:
    imagePullSecrets:
    - name: huaweicloud
    logger:
      mode: all
      url: http://message-dumper.default/
    model:
      modelFormat:
        name: pytorch
      name: ""
      protocolVersion: v2
      resources: {}
      runtime: torchserve-runtime
      storageUri: s3://mlflow/train_valid/output/6024661e-6d33-4ea7-af5a-d1e30ffa1479
    serviceAccountName: sa
status:
  address:
    url: http://torchserve.kserve-test.svc.cluster.local/v2/models/torchserve/infer
  components:
    predictor:
      address:
        url: http://torchserve-predictor-default.kserve-test.svc.cluster.local
      latestCreatedRevision: torchserve-predictor-default-00001
      latestReadyRevision: torchserve-predictor-default-00001
      latestRolledoutRevision: torchserve-predictor-default-00001
      traffic:
      - latestRevision: true
        percent: 100
        revisionName: torchserve-predictor-default-00001
      url: http://torchserve-predictor-default.kserve-test.example.com
  conditions:
  - lastTransitionTime: "2023-05-29T00:46:07Z"
    status: "True"
    type: IngressReady
  - lastTransitionTime: "2023-05-29T00:46:07Z"
    severity: Info
    status: "True"
    type: PredictorConfigurationReady
  - lastTransitionTime: "2023-05-29T00:46:07Z"
    status: "True"
    type: PredictorReady
  - lastTransitionTime: "2023-05-29T00:46:07Z"
    severity: Info
    status: "True"
    type: PredictorRouteReady
  - lastTransitionTime: "2023-05-29T00:46:07Z"
    status: "True"
    type: Ready
  modelStatus:
    copies:
      failedCopies: 0
      totalCopies: 2
    states:
      activeModelState: Loaded
      targetModelState: Loaded
    transitionStatus: UpToDate
  observedGeneration: 1
  url: http://torchserve.kserve-test.example.com

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

config.properties

inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
enable_envvars_config=true
install_py_dep_per_model=true
enable_metrics_api=false
service_envelope=kservev2
metrics_mode=prometheus
NUM_WORKERS=1
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}}}}

ServingRuntime yaml:

Name:         torchserve-runtime
Namespace:    kserve-test
Labels:       <none>
Annotations:  <none>
API Version:  serving.kserve.io/v1alpha1
Kind:         ServingRuntime
Metadata:
Creation Timestamp:  2023-05-29T00:45:33Z
Generation:          1
Managed Fields:
API Version:  serving.kserve.io/v1alpha1
Fields Type:  FieldsV1
fieldsV1:
  f:metadata:
    f:annotations:
      .:
      f:kubectl.kubernetes.io/last-applied-configuration:
  f:spec:
    .:
    f:containers:
    f:protocolVersions:
    f:supportedModelFormats:
Manager:         kubectl-client-side-apply
Operation:       Update
Time:            2023-05-29T00:45:33Z
Resource Version:  12068140
UID:               3393d161-0564-4fe3-85cf-067fbe02c9db
Spec:
Containers:
Image:  swr.cn-north-4.myhuaweicloud.com/${MY_DOCKER_REPOSITORY}/torchserve-kfs:dev
Name:   kserve-container
Protocol Versions:
v2
Supported Model Formats:
Auto Select:  true
Name:         pytorch
Version:      1
Events:           <none>

mnist_v2_bytes.json

{
"inputs": [
    {
        "data": ["iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAA10lEQVR4nGNgGFhgy6xVdrCszBaLFN/mr28+/QOCr69DMCSnA8WvHti0acu/fx/10OS0X/975CDDw8DA1PDn/1pBVEmLf3+zocy2X/+8USXt/82Ds+/+m4sqeehfOpw97d9VFDmlO++t4JwQNMm6f6sZcEpee2+DR/I4A05J7tt4JJP+IUsu+ncRp6TxO9RAQJY0XvrvMAuypNNHuCTz8n+PzVEcy3DtqgiY1ptx6t8/ewY0yX9ntoDA63//Xs3hQpMMPPsPAv68qmDAAFKXwHIzMzCl6AoAxXp0QujtP+8AAAAASUVORK5CYII="],
        "datatype": "BYTES",
        "name": "e8d5afed-0a56-4deb-ac9c-352663f51b93",
        "shape": [-1]
    }
]
}

Environment:

Istio Version: 1.15.0
Knative Version: knative-v1.7.0
KServe Version: v0.10.1
Kubeflow version:
Cloud Environment:[k8s_istio/istio_dex/gcp_basic_auth/gcp_iap/aws/aws_cognito/ibm]
Minikube/Kind version: kind v0.18.0 go1.20.2 linux/amd64
Kubernetes version: (use kubectl version): v1.25.0
OS (e.g. from /etc/os-release): Ubuntu 18.04.6 LTS (Bionic Beaver)

kserve / kserve

Called inference v2 infer api, Got httpx.ConnectError: All connection attempts failed #2951