kserve / kserve

Standardized Serverless ML Inference Platform on Kubernetes
https://kserve.github.io/website/
Apache License 2.0

transformer with v2 + grpc does not work #3790

Closed todtjs92 closed 1 week ago

todtjs92 commented 1 month ago

/kind bug

What steps did you take and what happened:

I am trying to follow the transformer example here: https://kserve.github.io/website/latest/modelserving/v1beta1/transformer/torchserve_image_transformer/#create-inferenceservice

I only changed the model from PyTorch to LightGBM, and it works when I use the v1 + REST protocol. But when I tried the "Deploy the InferenceService calling Predictor with gRPC protocol" example, an error occurred. Here are my files.

InferenceService

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: iris-lgbm
  namespace: datascience
spec:
  predictor:
    ports:
    - containerPort: 9000
      name: h2c
      protocol: TCP
    serviceAccountName: sa
    protocolVersion: v2
    model:
      modelFormat:
        name: lightgbm
      runtime: kserve-lgbserver
      storageUri: xxxxxxxx
      resources:
        limits:
          cpu: 1
          memory: 2Gi
        requests:
          cpu: 1
          memory: 2Gi
  transformer:
    containers:
      - image:  xxxxxxx
        name: kserve-container
        command:
          - "python"
          - "-m"
          - "transformer"
        args:
          - --model_name
          - iris-lgbm
          - --protocol
          - grpc-v2

transformer.py

import argparse
import polars
import numpy as np
from typing import Dict , Union

from kserve import (
    Model,
    ModelServer,
    model_server,
    InferInput,
    InferRequest,
    InferResponse,
    logging,
)
from kserve.model import PredictorProtocol, PredictorConfig

class Transformer(Model):

    def __init__(self, name: str, predictor_host: str, protocol: str, headers: Dict[str, str] = None):
        super().__init__(name)
        self.predictor_host = predictor_host
        self.protocol = protocol
        self.ready = True

    def preprocess(self, request: InferRequest, headers: Dict[str, str] = None) -> InferRequest:
        """
          v2
                {
          "inputs": [
            {
              "data": [[1.2, 2.3, 3.5, 4.1], [4.2, 0.3, 2.5, 5.1]],
              "shape": [-1, 4],
              "name": "test",
              "datatype": "FP32"
            }
          ]
        }
        """
        print("protocl is =",self.protocol)
        input_tensors = request.inputs[0].data
        input_tensors = [[x+1 for x in t] for t in input_tensors]
        input_tensors = np.asarray(input_tensors, dtype=np.float32)

        infer_inputs = [
                InferInput(
                    name="INPUT__0",
                    datatype="FP32",
                    shape=list(input_tensors.shape),
                    data=input_tensors,
                )
            ]
        infer_request = InferRequest(model_name=self.name, infer_inputs=infer_inputs)

        return infer_request

parser = argparse.ArgumentParser(parents=[model_server.parser])
args, _ = parser.parse_known_args()

if __name__ == "__main__":
    model = Transformer(args.model_name, predictor_host=args.predictor_host,
                             protocol=args.protocol)
    ModelServer(workers=1).start([model])
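For reference, the preprocess step can be exercised locally without starting the server, using the same kserve classes imported above (just a quick sketch; the input values are taken from the docstring example):

import numpy as np
from kserve import InferInput, InferRequest

# Build a v2-style request like the one in the preprocess docstring
# and run preprocess on it directly.
raw = InferRequest(
    model_name="iris-lgbm",
    infer_inputs=[
        InferInput(
            name="test",
            datatype="FP32",
            shape=[2, 4],
            data=[[1.2, 2.3, 3.5, 4.1], [4.2, 0.3, 2.5, 5.1]],
        )
    ],
)

t = Transformer("iris-lgbm", predictor_host="localhost", protocol="grpc-v2")
out = t.preprocess(raw)
print(out.inputs[0].shape)  # expected: [2, 4]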

Dockerfile

FROM python:3.11-slim

COPY . .

RUN python -m pip install -r requirements.txt

RUN useradd kserve -m -u 1000 -d /home/kserve

USER 1000

ENTRYPOINT [ "python","-m","transformer","--protocol","grpc-v2" ]

Error

The error is quite long; the key part is here.

details = "Received http2 header with status: 404"  debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-07-13T13:15:35.058449156+00:00", grpc_status:12, grpc_message:"Received http2 header with status: 404"}">protocl is = grpc-v2

And the weird thing is that I set the port to 9000, but the log says it starts the gRPC server on port 8081.
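Checking the defaults of the same model_server parser my transformer uses seems to confirm 8081 (a quick sketch, assuming --grpc_port is one of the model_server flags):

import argparse
from kserve import model_server

# Parse with no CLI args to see what port the gRPC server binds to by default.
parser = argparse.ArgumentParser(parents=[model_server.parser])
args, _ = parser.parse_known_args([])
print(args.grpc_port)  # prints 8081 here, matching the startup log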

All log

2024-07-13 13:14:56.014 1 kserve INFO [model_server.py:register_model():384] Registering model: iris-lgbm
2024-07-13 13:14:56.015 1 kserve INFO [model_server.py:start():254] Setting max asyncio worker threads as 5
2024-07-13 13:14:56.015 1 kserve INFO [model_server.py:serve():260] Starting uvicorn with 1 workers
2024-07-13 13:14:56.060 uvicorn.error INFO:     Started server process [1]
2024-07-13 13:14:56.060 uvicorn.error INFO:     Waiting for application startup.
2024-07-13 13:14:56.063 1 kserve INFO [server.py:start():63] Starting gRPC server on [::]:8081
2024-07-13 13:14:56.063 uvicorn.error INFO:     Application startup complete.
2024-07-13 13:14:56.063 uvicorn.error INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
2024-07-13 13:15:35.058 kserve.trace kserve.io.kserve.protocol.rest.v2_endpoints.infer: 0.012240409851074219
2024-07-13 13:15:35.058 kserve.trace kserve.io.kserve.protocol.rest.v2_endpoints.infer: 0.003465999999999525
2024-07-13 13:15:35.058 1 kserve ERROR [errors.py:generic_exception_handler():111] Exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/timing_asgi/middleware.py", line 70, in __call__
    await self.app(scope, receive, send_wrapper)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)

 File "/usr/local/lib/python3.11/site-packages/kserve/protocol/rest/v2_endpoints.py", line 169, in infer    response, response_headers = await self.dataplane.infer(                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^  File "/usr/local/lib/python3.11/site-packages/kserve/protocol/dataplane.py", line 343, in infer    response = await model(request, headers=headers)               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  File "/usr/local/lib/python3.11/site-packages/kserve/model.py", line 201, in __call__    (await self.predict(payload, headers))     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  File "/usr/local/lib/python3.11/site-packages/kserve/model.py", line 419, in predict    res = await self._grpc_predict(payload, headers)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  File "/usr/local/lib/python3.11/site-packages/kserve/model.py", line 387, in _grpc_predict    async_result = await self._grpc_client.ModelInfer(                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  File "/usr/local/lib/python3.11/site-packages/grpc/aio/_call.py", line 318, in __await__    raise _create_rpc_error(grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:    status = StatusCode.UNIMPLEMENTED   details = "Received http2 header with status: 404"  debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-07-13T13:15:35.058449156+00:00", grpc_status:12, grpc_message:"Received http2 header with status: 404"}">protocl is = grpc-v2

What did you expect to happen: The request should succeed.

What's the InferenceService yaml: [To help us debug please run kubectl get isvc $name -n $namespace -oyaml and paste the output] Above!

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.] Another weird thing is

Environment:

todtjs92 commented 1 month ago

Is it not possible to use grpc-v2 in kserve-lgbserver?

There are only v1 and v2 on this page: https://github.com/kserve/kserve/blob/master/config/runtimes/kserve-lgbserver.yaml

sivanantha321 commented 1 month ago

@todtjs92 You should pass the argument --grpc_port=9000 to the model server to change the port; otherwise the gRPC server will listen on 8081.
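For example, something like this in the predictor spec (a minimal sketch; I'm assuming the ports entry belongs under model as in the docs' gRPC example, and that lgbserver accepts the standard model server flags):

spec:
  predictor:
    model:
      modelFormat:
        name: lightgbm
      runtime: kserve-lgbserver
      storageUri: xxxxxxxx
      args:
        - --grpc_port=9000
      ports:
        - containerPort: 9000
          name: h2c
          protocol: TCP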

LOADBC commented 1 month ago

@todtjs92 Ensure that your service is routing traffic to the right endpoints; the 404 error indicates that the endpoint may not be found or is not correctly configured. Also make sure the gRPC server is configured to run on the port you expect; by default, it seems to start on port 8081. Update the command in your Dockerfile and the InferenceService YAML so that the server listens on the correct port.
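A quick way to confirm that something is actually serving HTTP/2 on the port you expect (a sketch using grpc.aio, which your transformer already depends on; the host below is a placeholder):

import asyncio
import grpc

async def check(host: str) -> None:
    # Open a plaintext (h2c) channel to the predictor and wait until the
    # channel reports READY, confirming an HTTP/2 server is listening there.
    async with grpc.aio.insecure_channel(host) as channel:
        await asyncio.wait_for(channel.channel_ready(), timeout=5)
        print("gRPC channel ready:", host)

asyncio.run(check("<predictor-service-host>:9000"))  # placeholder host:port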

todtjs92 commented 2 weeks ago

@sivanantha321 @LOADBC Thanks all! It seems like I set the wrong port for gRPC. I'll try to fix it. Thank you so much!