@deep-diver I think we should create a separate notebook for TF-serving.
Deploy the built docker image on GKE cluster
Would be great to automate it using GitHub Actions to follow the theme of this repository.
Check the deployed model's performance with various scenarios (maybe the same ones applied to the ONNX+FastAPI scenarios)
100% agreed.
@sayakpaul
I think we should create a separate notebook for TF-serving.
You think so? Let me create a new notebook, and let's see what's better after that then :)
Would be great to automate it using GitHub Actions to follow the theme of this repository.
Yeah, I totally agree. I will probably create a new GitHub Actions YAML for this one.
Also, the issue is updated according to our discussion 👍🏼
Alright. Thank you.
Steps to build and run the TF Serving docker image
Untar the TF model
$ wget https://github.com/sayakpaul/ml-deployment-k8s-fastapi/releases/download/v1.0.0/resnet50_w_preprocessing_tf.tar.gz
$ tar -xvf resnet50_w_preprocessing_tf.tar.gz
$ MODEL_NAME=resnet
$ mkdir -p $MODEL_NAME/1
$ mv resnet50_w_preprocessing_tf/* $MODEL_NAME/1
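TF Serving expects each model version to live in its own numbered subdirectory under the model name, so after the move the layout should look roughly like this (the exact variable shard filenames depend on how the SavedModel was exported):
$ tree $MODEL_NAME
resnet
└── 1
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index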
Run the base TF Serving image
$ docker run -d --name serving_base tensorflow/serving
Copy the model into the running TF Serving container
$ docker cp $MODEL_NAME serving_base:/models/$MODEL_NAME
Commit the change and build a new docker image
$ PROJECT_ID=...
$ NEW_IMAGE_NAME=tfs-$MODEL_NAME:latest
$ NEW_IMAGE_TAG=gcr.io/$PROJECT_ID/$NEW_IMAGE_NAME
$ docker commit --change "ENV MODEL_NAME $MODEL_NAME" serving_base $NEW_IMAGE_TAG
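Before GKE can pull the committed image, it presumably also has to be pushed to the registry named in the tag; a minimal sketch, assuming gcloud is installed and authenticated for $PROJECT_ID:
$ gcloud auth configure-docker        # one-time: lets docker push to gcr.io
$ docker push $NEW_IMAGE_TAG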
Remove the base container
$ docker kill serving_base
$ docker rm serving_base
Run the new docker image by exposing two ports (gRPC and REST API) if you want to run it locally
$ docker run -p 8501:8501 -p 8500:8500 $NEW_IMAGE_TAG
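Once the container is up, the REST port can be sanity-checked with TF Serving's model-status endpoint (response abbreviated; a real :predict call would additionally need a payload matching the model's input signature):
$ curl http://localhost:8501/v1/models/$MODEL_NAME
{
 "model_version_status": [
  {
   "version": "1",
   "state": "AVAILABLE",
   ...
  }
 ]
}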
or in a k8s Deployment.yaml
containers:
  - image: ...
    ports:
      - containerPort: 8500
        name: grpc
      - containerPort: 8501
        name: rest-api
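A rough sketch of getting that onto GKE with kubectl (deployment.yaml and the name tfs-resnet are just placeholders here, not files or resources defined in this repo):
$ kubectl apply -f deployment.yaml                 # Deployment using $NEW_IMAGE_TAG and the two ports above
$ kubectl expose deployment tfs-resnet --type=LoadBalancer --port=8501 --target-port=8501
$ kubectl get svc tfs-resnet                       # wait for an EXTERNAL-IP, then:
$ curl http://<EXTERNAL-IP>:8501/v1/models/resnet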
@sayakpaul I couldn't find a way to create a Dockerfile for this process, but it can be managed in a GitHub Action.
Yup, that is what I would have done too.
Run the new docker image by exposing two ports (gRPC and REST API)
Maybe change it to "Locally run the ..."?
@sayakpaul updated :)
Tested on a GKE cluster. The next step is to run a set of experiments with Locust.
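For reference, a headless Locust run against the deployed REST endpoint could look something like this (locustfile.py is a hypothetical test file that POSTs images to /v1/models/resnet:predict, and the user/spawn-rate numbers are placeholders, not the ones used in the actual experiments):
$ locust -f locustfile.py --headless \
    --host http://<EXTERNAL-IP>:8501 \
    -u 150 -r 10 --run-time 5m \
    --csv tfserving_rest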
One minor thing: the gRPC endpoint would have to be exposed through an Ingress, since gRPC is based on an HTTPS connection and a LoadBalancer doesn't support HTTPS as far as I know.
Let's maybe hold off from the gRPC load test for now 'cause we already have a comprehensive set of experiments. What do you think?
@sayakpaul sure!
gRPC/proto is a much faster solution than a REST API AFAIK (Ref).
If we see that the performance of the REST API setup on TF Serving is similar to what we got with the FastAPI server, I think it is worth trying out gRPC. Otherwise, I guess we don't need gRPC experiments.
Right. Then let's try it out. Let me also check back with my colleague regarding how they perform load tests with our gRPC microservices (at Carted) because we have a similar deployment workflow.
It now has a standalone repository: https://github.com/deep-diver/ml-deployment-k8s-tfserving
In this new feature, the following work is expected:
Create a new notebook with the TF Serving prototype based on both gRPC (Ref) and REST API (Ref).
Update the newly created notebook to check the %%timeit on the TF Serving server locally.
Build/commit a docker image based on the TF Serving base image using this method.
Deploy the built docker image on GKE cluster
Check the deployed model's performance with various scenarios (maybe the same ones applied to the ONNX+FastAPI scenarios)