By: Sayak Paul and Chansung Park
This project shows how to serve an ONNX-optimized image classification model as a RESTful web service with FastAPI, Docker, and Kubernetes (k8s). The idea is to first Dockerize the API and then deploy it on a k8s cluster running on Google Kubernetes Engine (GKE). We do this integration using GitHub Actions.
👋 Note: Even though this project uses an image classification model, its structure and techniques can be used to serve other models as well. We also worked on a TF Serving equivalent of this project. Check it out here.
Update (July 19, 2022): This project won the #TFCommunitySpotlight award.
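To make the rest of the walkthrough concrete, here is a minimal sketch of what a FastAPI + ONNX Runtime endpoint of this shape can look like. The model path, input resolution, and response shaping below are illustrative assumptions, not the project's exact api code:

# Illustrative sketch only -- not the project's exact api/ code.
import io

import numpy as np
import onnxruntime as ort
from fastapi import FastAPI, File, Form, UploadFile
from PIL import Image

app = FastAPI()
session = ort.InferenceSession("model.onnx")  # assumed model file name


@app.post("/predict/image")
async def predict_image(
    image_file: UploadFile = File(...),
    with_resize: bool = Form(False),
    with_post_process: bool = Form(False),
):
    # Read the uploaded file into a PIL image.
    image = Image.open(io.BytesIO(await image_file.read())).convert("RGB")
    if with_resize:
        image = image.resize((224, 224))  # placeholder input resolution
    batch = np.expand_dims(np.asarray(image, dtype=np.float32), axis=0)
    input_name = session.get_inputs()[0].name
    scores = session.run(None, {input_name: batch})[0][0]
    if with_post_process:
        top = int(np.argmax(scores))
        # A real implementation would map `top` to a class name (e.g. "tabby").
        return {"Label": str(top), "Score": f"{float(scores[top]):.3f}"}
    return {"scores": scores.tolist()}

The real endpoint, including the actual preprocessing and label mapping, lives in the api directory.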
The TensorFlow-to-ONNX conversion steps are available within the notebooks/TF_to_ONNX.ipynb notebook, and the API code itself lives in the api directory. To deploy the API, we define our deployment.yaml workflow file inside .github/workflows. Upon triggering, it builds the latest Docker image of the API and deploys the container on the k8s cluster running on GKE.
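If you want a feel for what the notebook's conversion step involves, tf2onnx's Keras entry point is one way to do it. The following is a rough sketch with a stand-in model, not the notebook's exact contents:

# Rough sketch of a TF-to-ONNX conversion with tf2onnx; the model and
# input signature are stand-ins, not the project's actual classifier.
import tensorflow as tf
import tf2onnx

model = tf.keras.applications.ResNet50(weights="imagenet")  # stand-in model
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)

# Serializes the converted graph to model.onnx.
onnx_model, _ = tf2onnx.convert.from_keras(
    model, input_signature=spec, output_path="model.onnx"
)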
Create a secret named GCP_CREDENTIALS on your GitHub repository and copy-paste the contents of the service account key file into the secret. Then configure the bucket storage related permissions for the service account:
$ export PROJECT_ID=<PROJECT_ID>
$ export ACCOUNT=<ACCOUNT>
$ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
--member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
--role roles/storage.admin
$ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
--member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
--role roles/storage.objectAdmin
$ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
--member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
--role roles/storage.objectCreator
If the workflow file is already present on the main branch, then upon a new push, the workflow defined in .github/workflows/deployment.yaml should automatically run. Here's how the final outputs should look (run link):

We conducted load-testing varying the number of workers, RAM, nodes, etc. From that experiment, we found that for our setup, 8 nodes each having 2 vCPUs and 4 GBs of RAM work the best in terms of throughput and latency. The figure below summarizes our results:
You can find the load-testing details under the locust directory.
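To give a rough idea of the shape of such a test, a minimal locustfile for this endpoint could look like the sketch below; the wait times and file name are assumptions, and the actual test scripts are what live in the locust directory:

# Minimal Locust sketch for load-testing the /predict/image endpoint.
from locust import HttpUser, between, task


class PredictUser(HttpUser):
    wait_time = between(1, 2)  # seconds to wait between simulated requests

    @task
    def predict(self):
        # Assumes cat.jpg is present in the working directory.
        with open("cat.jpg", "rb") as f:
            self.client.post(
                "/predict/image",
                files={"image_file": f},
                data={"with_resize": "True", "with_post_process": "True"},
            )

You would then point it at the service with something like locust -f locustfile.py --host http://{EXTERNAL-IP}.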
From the workflow outputs, you should see something like the following:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fastapi-server LoadBalancer xxxxxxxxxx xxxxxxxxxx 80:30768/TCP 23m
kubernetes ClusterIP xxxxxxxxxx <none> 443/TCP 160m
Note the EXTERNAL-IP corresponding to fastapi-server (if you have named your service like so). Then cURL it:
curl -X POST -F image_file=@cat.jpg -F with_resize=True -F with_post_process=True http://{EXTERNAL-IP}:80/predict/image
You should get the following output (if you're using the cat.jpg image present in the api directory):
"{\"Label\": \"tabby\", \"Score\": \"0.538\"}"
The request assumes that you have a file called cat.jpg present in your working directory.
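If you prefer Python over cURL, the equivalent call with the requests library looks roughly like this (substitute the EXTERNAL-IP you noted above):

# Python equivalent of the cURL command above; substitute your service's IP.
import requests

url = "http://{EXTERNAL-IP}/predict/image"  # replace {EXTERNAL-IP}

with open("cat.jpg", "rb") as f:
    response = requests.post(
        url,
        files={"image_file": f},
        data={"with_resize": "True", "with_post_process": "True"},
    )
print(response.json())  # e.g. "{\"Label\": \"tabby\", \"Score\": \"0.538\"}"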
Note that if you don't see any external IP address in your GitHub Actions console log, then after successful deployment, run the following:
# Authenticate to your GKE cluster.
$ gcloud container clusters get-credentials ${GKE_CLUSTER} --zone ${GKE_ZONE} --project ${GCP_PROJECT_ID}
$ kubectl get services -o wide
From there, note the external IP.
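Alternatively, you can read the same field programmatically. Here's a minimal sketch with the official Kubernetes Python client, assuming the service is named fastapi-server and sits in the default namespace:

# Sketch: read the LoadBalancer's external IP with the Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()  # uses the credentials fetched by gcloud above
v1 = client.CoreV1Api()

# Assumes the service name and namespace used in this project's manifests.
service = v1.read_namespaced_service(name="fastapi-server", namespace="default")
print(service.status.load_balancer.ingress[0].ip)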