sayakpaul / ml-deployment-k8s-fastapi

This project shows how to serve an ONNX-optimized image classification model as a web service with FastAPI, Docker, and Kubernetes.
https://medium.com/google-developer-experts/load-testing-tensorflow-serving-and-fastapi-on-gke-411bc14d96b2
Apache License 2.0

Set up TF Serving-based deployment #33

Closed: deep-diver closed this issue 2 years ago

deep-diver commented 2 years ago

In this new feature, the following work is expected:

sayakpaul commented 2 years ago

@deep-diver I think we should create a separate notebook for TF-serving.

> Deploy the built Docker image on a GKE cluster

Would be great to automate it using GitHub Actions to follow the theme of this repository.

> Check the deployed model's performance with various scenarios (maybe the same ones applied to the ONNX + FastAPI scenarios)

100% agreed.

deep-diver commented 2 years ago

@sayakpaul

> I think we should create a separate notebook for TF-serving.

You think so? Let me create a new notebook, and let's see what's better after that :)

> Would be great to automate it using GitHub Actions to follow the theme of this repository.

Yeah, I totally agree. I will probably create a new GitHub Actions YAML for this one.

Also, the issue is updated according to our discussion 👍🏼

sayakpaul commented 2 years ago

Alright. Thank you.

deep-diver commented 2 years ago

Steps to build and run the TF Serving Docker image

  1. Untar the TF model

    $ wget https://github.com/sayakpaul/ml-deployment-k8s-fastapi/releases/download/v1.0.0/resnet50_w_preprocessing_tf.tar.gz
    $ tar -xvf resnet50_w_preprocessing_tf.tar.gz
    $ MODEL_NAME=resnet
    $ mkdir -p $MODEL_NAME/1
    $ mv resnet50_w_preprocessing_tf/* $MODEL_NAME/1
  2. Run the base TF Serving image

    $ docker run -d --name serving_base tensorflow/serving
  3. Copy the model into the running TF Serving container

    $ docker cp $MODEL_NAME serving_base:/models/$MODEL_NAME
  4. Commit the change to build a new Docker image

    $ PROJECT_ID=...
    $ NEW_IMAGE_NAME=tfs-$MODEL_NAME:latest
    $ NEW_IMAGE_TAG=gcr.io/$PROJECT_ID/$NEW_IMAGE_NAME
    $ docker commit --change "ENV MODEL_NAME $MODEL_NAME" serving_base $NEW_IMAGE_TAG
  5. Remove the base container

    $ docker kill serving_base
    $ docker rm serving_base
  6. To run this locally, run the new Docker image exposing both ports (8500 for gRPC, 8501 for the REST API); a quick model-status check is sketched after the snippets below

    $ docker run -p 8501:8501 -p 8500:8500 $NEW_IMAGE_TAG

or, in the k8s Deployment.yaml:

  containers:
  - image: ...
    ports:
    - containerPort: 8500
      name: grpc
    - containerPort: 8501
      name: rest-api
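
To confirm the model actually loaded once the container (or Pod) is up, TF Serving's model status REST endpoint on port 8501 can be polled. A minimal sketch in Python, assuming the server is reachable on localhost and the model name from step 1:

    import requests  # third-party HTTP client

    MODEL_NAME = "resnet"  # same value as in step 1
    BASE_URL = "http://localhost:8501"  # REST port from step 6; swap in the Service IP on GKE

    # TF Serving's model status endpoint lists each loaded version and its state.
    status = requests.get(f"{BASE_URL}/v1/models/{MODEL_NAME}", timeout=10)
    status.raise_for_status()
    print(status.json())  # expect a version entry with state "AVAILABLE"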

@sayakpaul I couldn't find a way to create a Dockerfile for this process, but it can be managed in a GitHub Actions workflow.

sayakpaul commented 2 years ago

Yup, that is what I would have done too.

> Run the new docker image by exposing two ports (gRPC and REST API)

Maybe change it to "Locally run the ..."?

deep-diver commented 2 years ago

@sayakpaul updated :)

deep-diver commented 2 years ago

Tested on the GKE cluster. The next step is to run a set of experiments with Locust.
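
For reference, a minimal locustfile sketch for those experiments, hitting the REST predict endpoint on port 8501; the input shape below is a placeholder, since the real payload has to match the exported serving signature:

    # locustfile.py -- run with: locust -f locustfile.py --host http://<ENDPOINT_IP>:8501
    import numpy as np
    from locust import HttpUser, between, task

    # Placeholder input: one 224x224x3 image of zeros. Check the actual signature
    # with `saved_model_cli show --dir <model_dir> --all` and adjust the payload.
    DUMMY_IMAGE = np.zeros((224, 224, 3), dtype=np.float32).tolist()

    class TFServingUser(HttpUser):
        wait_time = between(1, 2)

        @task
        def predict(self):
            # TF Serving's REST predict API takes a JSON body with an "instances" list.
            self.client.post("/v1/models/resnet:predict", json={"instances": [DUMMY_IMAGE]})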

One minor thing:

sayakpaul commented 2 years ago

Let's maybe hold off on the gRPC load test for now, since we already have a comprehensive set of experiments. What do you think?

deep-diver commented 2 years ago

@sayakpaul sure!

gRPC/protobuf is a much faster solution than the REST API, AFAIK (Ref).

If we see that the REST API setup on TF Serving performs about the same as what we got with the FastAPI server, I think it is worth trying out gRPC. Otherwise, I guess we don't need the gRPC experiments.
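
If we do go that route, a quick client sketch using the tensorflow-serving-api package against the gRPC port (8500); the input tensor name and shape are placeholders and need to match the real serving signature:

    import grpc
    import numpy as np
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    # gRPC port exposed in the Deployment (8500); swap in the Service IP on GKE.
    channel = grpc.insecure_channel("localhost:8500")
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = "resnet"
    request.model_spec.signature_name = "serving_default"

    # Placeholder input name/shape; look up the real ones with `saved_model_cli show`.
    dummy = np.zeros((1, 224, 224, 3), dtype=np.float32)
    request.inputs["input_1"].CopyFrom(tf.make_tensor_proto(dummy))

    response = stub.Predict(request, timeout=10.0)
    print(response.outputs)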

sayakpaul commented 2 years ago

Right. Then let's try it out. Let me also check back with my colleague regarding how they perform load tests with our gRPC microservices (at Carted) because we have a similar deployment workflow.

sayakpaul commented 2 years ago

It now has a standalone repository: https://github.com/deep-diver/ml-deployment-k8s-tfserving