Open Inference Protocol with nightly build not working

harshita-meena commented 4 months ago

🐛 Describe the bug

While trying to run load tests with latest merged changes on v2 Open inference protocol, I noticed that the example for mnist does not work in preprocessing step. https://github.com/pytorch/serve/pull/2609/files

Error logs

The server side showed error

Screen Shot 2024-02-20 at 4 46 38 PM

Installation instructions

ARG VERSION=latest-cpu
ARG IMAGE_NAME=pytorch/torchserve-nightly

from $IMAGE_NAME:$VERSION

USER root

RUN apt-get -y update
RUN apt-get install -y curl vim

# Installation steps to download model from GCP
# Downloading gcloud package
RUN curl https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz > /tmp/google-cloud-sdk.tar.gz

# Installing the package
RUN mkdir -p /usr/local/gcloud \
  && tar -C /usr/local/gcloud -xvf /tmp/google-cloud-sdk.tar.gz \
  && /usr/local/gcloud/google-cloud-sdk/install.sh

# Adding the package path to local
ENV PATH $PATH:/usr/local/gcloud/google-cloud-sdk/bin

ENV TS_OPEN_INFERENCE_PROTOCOL oip

RUN pip install protobuf googleapis-common-protos grpcio loguru

COPY config.properties /home/model-server/config.properties
COPY mnist.mar /home/model-server/model-store/

copied model from gs://kfserving-examples/models/torchserve/image_classifier/v2/model-store/mnist.mar Built the docker file using docker build -f Dockerfile -t metadata . and brought it up locally Ran ghz load test tool with

ghz  --proto serve/frontend/server/src/main/resources/proto/open_inference_grpc.proto   --call org.pytorch.serve.grpc.openinference.GRPCInferenceService/ModelInfer --duration 300s --rps 1 --insecure localhost:79 -D ./serve/kubernetes/kserve/kf_request_json/v2/mnist/mnist_v2_tensor_grpc.json

Model Packaing

Used an existing packaged model mnist.mar at gs://kfserving-examples/models/torchserve/image_classifier/v2

config.properties

inference_address=http://0.0.0.0:8080 management_address=http://0.0.0.0:8081 metrics_address=http://0.0.0.0:8082 enable_metrics_api=true model_metrics_auto_detect=true metrics_mode=prometheus number_of_netty_threads=32 job_queue_size=1000 enable_envvars_config=true model_store=/home/model-server/model-store load_models=mnist.mar workflow_store=/home/model-server/wf-store

Versions

Environment headers

Torchserve branch:

**Warning: torchserve not installed .. torch-model-archiver==0.9.0

Python version: 3.7 (64-bit runtime) Python executable: /Users/hmeena/development/ml-platform-control-planes/venv/bin/python

Versions of relevant python libraries: numpy==1.21.6 requests==2.31.0 requests-oauthlib==1.3.1 torch-model-archiver==0.9.0 wheel==0.41.0 Warning: torch not present .. Warning: torchtext not present .. Warning: torchvision not present .. Warning: torchaudio not present ..

Java Version:

OS: Mac OSX 11.7.8 (x86_64) GCC version: N/A Clang version: 12.0.0 (clang-1200.0.32.29) CMake version: version 3.23.2

Versions of npm installed packages: **Warning: newman, newman-reporter-html markdown-link-check not installed...

Repro instructions

same as installation instruction

Possible Solution

I am unsure of how well the OIP is working with Torchserve at the moment. I tried a small ranker example and it fails in the post processing step where the worker crashes completely, it is not able to send response as ModelInferResponse.

agunapal commented 4 months ago

Hi @harshita-meena Thanks for reporting the issue.

Do you mind trying this script https://github.com/pytorch/serve/blob/master/kubernetes/kserve/tests/scripts/test_mnist.sh

We are running this nightly. https://github.com/pytorch/serve/actions/workflows/kserve_cpu_tests.yml

harshita-meena commented 4 months ago

Currently I am trying to setup the tests but they will probably not fail because the images used in oip kserve yamls are custom ones (http and grpc) and both do not refer to nightly ones though are part of the test_mnist.sh at lines 189 and 215.

harshita-meena commented 4 months ago

Still struggling to get the tests running, if you can approve the workflow for this PR. The primary reason I am trying to get this working is because I wanted to use Open Inference Protocol for a non-kserve deployment. Everything works till the worker dies after the post processing step. I was heavily relying on this because OIP provides a great generic metadata/inference API. If this doesn't work I will use the inference.proto instead.

agunapal commented 4 months ago

Hi @harshita-meena Thanks for the details. Checking with kserve regarding this. Will update

harshita-meena commented 4 months ago

You can reproduce the worker died issue, if you build the dockerfile part of this issue with config properties and create a new mnist.mar with a slightly modified handler for OIP specific requests. (attached a zip)

torch-model-archiver --model-name mnist --version 1.0 --serialized-file mnist_cnn.pt --model-file mnist.py --handler mnist_handler.py -r requirements.txt

mnist.zip

ghz --proto serve/frontend/server/src/main/resources/proto/open_inference_grpc.proto --call org.pytorch.serve.grpc.openinference.GRPCInferenceService/ModelInfer --duration 300s --rps 1 --insecure localhost:79 -D ./serve/kubernetes/kserve/kf_request_json/v2/mnist/mnist_v2_tensor_grpc.json

agunapal commented 3 months ago

Hi @harshita-meena Thanks! This is a new feature and there might be bugs. Will update when I repro it