sayakpaul / ml-deployment-k8s-fastapi

This project shows how to serve an ONNX-optimized image classification model as a web service with FastAPI, Docker, and Kubernetes.
https://medium.com/google-developer-experts/load-testing-tensorflow-serving-and-fastapi-on-gke-411bc14d96b2
Apache License 2.0

Set up TF Serving-based deployment #33

Closed: deep-diver closed this issue 2 years ago

deep-diver commented 2 years ago

In this new feature, the following work is expected:

sayakpaul commented 2 years ago

@deep-diver I think we should create a separate notebook for TF-serving.

> Deploy the built Docker image on a GKE cluster

Would be great to automate it using GitHub Actions to follow the theme of this repository.

> Check the deployed model's performance with various scenarios (maybe the same ones applied to the ONNX + FastAPI scenarios)

100% agreed.

deep-diver commented 2 years ago

@sayakpaul

> I think we should create a separate notebook for TF-serving.

You think so? Let me create a new notebook, and let's see what's better after that :)

> Would be great to automate it using GitHub Actions to follow the theme of this repository.

Yeah, I totally agree. I will probably create a new GitHub Actions YAML for this one.

Also, the issue is updated according to our discussion 👍🏼

sayakpaul commented 2 years ago

Alright. Thank you.

deep-diver commented 2 years ago

Steps to build and run the TF Serving Docker image

  1. Untar the TF model

    $ wget https://github.com/sayakpaul/ml-deployment-k8s-fastapi/releases/download/v1.0.0/resnet50_w_preprocessing_tf.tar.gz
    $ tar -xvf resnet50_w_preprocessing_tf.tar.gz
    $ MODEL_NAME=resnet
    $ mkdir -p $MODEL_NAME/1
    $ mv resnet50_w_preprocessing_tf/* $MODEL_NAME/1
  2. Run the base TF Serving image

    $ docker run -d --name serving_base tensorflow/serving
  3. Copy the model into the running TF Serving container

    $ docker cp $MODEL_NAME serving_base:/models/$MODEL_NAME
  4. Commit the change to build a new Docker image

    $ PROJECT_ID=...
    $ NEW_IMAGE_NAME=tfs-$MODEL_NAME:latest
    $ NEW_IMAGE_TAG=gcr.io/$PROJECT_ID/$NEW_IMAGE_NAME
    $ docker commit --change "ENV MODEL_NAME $MODEL_NAME" serving_base $NEW_IMAGE_TAG
  5. Remove the base container

    $ docker kill serving_base
    $ docker rm serving_base
  6. To run this locally, run the new Docker image exposing both ports (8500 for gRPC, 8501 for the REST API); a quick model-status check is sketched after the snippets below

    $ docker run -p 8501:8501 -p 8500:8500 $NEW_IMAGE_TAG

or, in the k8s Deployment.yaml:

  containers:
  - image: ...
    ports:
    - containerPort: 8500
      name: grpc
    - containerPort: 8501
      name: rest-api
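
To confirm the model actually loaded once the container (or Pod) is up, TF Serving's model status REST endpoint on port 8501 can be polled. A minimal sketch in Python, assuming the server is reachable on localhost and the model name from step 1:

    import requests  # third-party HTTP client

    MODEL_NAME = "resnet"  # same value as in step 1
    BASE_URL = "http://localhost:8501"  # REST port from step 6; swap in the Service IP on GKE

    # TF Serving's model status endpoint lists each loaded version and its state.
    status = requests.get(f"{BASE_URL}/v1/models/{MODEL_NAME}", timeout=10)
    status.raise_for_status()
    print(status.json())  # expect a version entry with state "AVAILABLE"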

@sayakpaul I couldn't find a way to create a Dockerfile for this process, but it can be managed in a GitHub Actions workflow.

sayakpaul commented 2 years ago

Yup, that is what I would have done too.

> Run the new docker image by exposing two ports (gRPC and REST API)

Maybe change it to "Locally run the ..."?

deep-diver commented 2 years ago

@sayakpaul updated :)

deep-diver commented 2 years ago

Tested on the GKE cluster. The next step is to run a set of experiments with Locust.
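
For reference, a minimal locustfile sketch for those experiments, hitting the REST predict endpoint on port 8501; the input shape below is a placeholder, since the real payload has to match the exported serving signature:

    # locustfile.py -- run with: locust -f locustfile.py --host http://<ENDPOINT_IP>:8501
    import numpy as np
    from locust import HttpUser, between, task

    # Placeholder input: one 224x224x3 image of zeros. Check the actual signature
    # with `saved_model_cli show --dir <model_dir> --all` and adjust the payload.
    DUMMY_IMAGE = np.zeros((224, 224, 3), dtype=np.float32).tolist()

    class TFServingUser(HttpUser):
        wait_time = between(1, 2)

        @task
        def predict(self):
            # TF Serving's REST predict API takes a JSON body with an "instances" list.
            self.client.post("/v1/models/resnet:predict", json={"instances": [DUMMY_IMAGE]})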

One minor thing:

sayakpaul commented 2 years ago

Let's maybe hold off on the gRPC load test for now, since we already have a comprehensive set of experiments. What do you think?

deep-diver commented 2 years ago

@sayakpaul sure!

gRPC/protobuf is a much faster solution than the REST API, AFAIK (Ref).

If we see that the REST API setup on TF Serving performs about the same as what we got with the FastAPI server, I think it is worth trying out gRPC. Otherwise, I guess we don't need the gRPC experiments.
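
If we do go that route, a quick client sketch using the tensorflow-serving-api package against the gRPC port (8500); the input tensor name and shape are placeholders and need to match the real serving signature:

    import grpc
    import numpy as np
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    # gRPC port exposed in the Deployment (8500); swap in the Service IP on GKE.
    channel = grpc.insecure_channel("localhost:8500")
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = "resnet"
    request.model_spec.signature_name = "serving_default"

    # Placeholder input name/shape; look up the real ones with `saved_model_cli show`.
    dummy = np.zeros((1, 224, 224, 3), dtype=np.float32)
    request.inputs["input_1"].CopyFrom(tf.make_tensor_proto(dummy))

    response = stub.Predict(request, timeout=10.0)
    print(response.outputs)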

sayakpaul commented 2 years ago

Right. Then let's try it out. Let me also check back with my colleague regarding how they perform load tests with our gRPC microservices (at Carted) because we have a similar deployment workflow.

sayakpaul commented 2 years ago

It now has a standalone repository: https://github.com/deep-diver/ml-deployment-k8s-tfserving