
Deploying ML models with FastAPI, Docker, and Kubernetes

By: Sayak Paul and Chansung Park


Figure developed by Chansung Park

This project shows how to serve an ONNX-optimized image classification model as a RESTful web service with FastAPI, Docker, and Kubernetes (k8s). The idea is to first Dockerize the API and then deploy it on a k8s cluster running on Google Kubernetes Engine (GKE). The build and deployment steps are automated with GitHub Actions.
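To make the serving piece concrete, below is a minimal sketch of a FastAPI app wrapping an ONNX Runtime session. The model filename, input resolution, and response formatting are illustrative assumptions, not the repository's exact values; the actual implementation lives in the api directory. (Form parsing also requires the python-multipart package.)

import io

import numpy as np
import onnxruntime as ort
from fastapi import FastAPI, File, Form, UploadFile
from PIL import Image

app = FastAPI()
# Assumption: the ONNX-optimized classifier is stored next to the app.
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name

@app.post("/predict/image")
async def predict_image(
    image_file: UploadFile = File(...),
    with_resize: bool = Form(False),
):
    # Decode the uploaded bytes into an RGB image.
    image = Image.open(io.BytesIO(await image_file.read())).convert("RGB")
    if with_resize:
        # Resize to the resolution the model was exported with (assumed 224x224).
        image = image.resize((224, 224))
    batch = np.expand_dims(np.asarray(image, dtype=np.float32), axis=0)
    scores = session.run(None, {input_name: batch})[0].squeeze()
    # The real endpoint also accepts a with_post_process flag that maps the
    # argmax index to a human-readable label such as "tabby"; elided here.
    return {"Label": str(int(scores.argmax())), "Score": f"{float(scores.max()):.3f}"}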

👋 Note: Even though this project uses an image classification model, its structure and techniques can be used to serve other models as well. We also worked on a TF Serving equivalent of this project. Check it out here.

Update (July 19, 2022): This project won the #TFCommunitySpotlight award.

Deploying the model as a service with k8s

Configurations needed beforehand

Notes

Querying the API endpoint

From the workflow outputs, you should see something like this:

NAME             TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
fastapi-server   LoadBalancer   xxxxxxxxxx     xxxxxxxxxx      80:30768/TCP   23m
kubernetes       ClusterIP      xxxxxxxxxx     <none>          443/TCP        160m

Note the EXTERNAL-IP corresponding to fastapi-server (if you have named your service that way). Then cURL it:

curl -X POST -F image_file=@cat.jpg -F with_resize=True -F with_post_process=True http://{EXTERNAL-IP}:80/predict/image

You should get the following output (if you're using the cat.jpg image present in the api directory):

"{\"Label\": \"tabby\", \"Score\": \"0.538\"}"

The request assumes that a file called cat.jpg is present in your working directory; a copy is included in the api directory of this repository.
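If you'd rather query the endpoint from Python than from the shell, the same request can be issued with the requests library. The endpoint path and form fields below mirror the cURL call above; EXTERNAL_IP is a placeholder you need to fill in:

import requests

EXTERNAL_IP = "xxxxxxxxxx"  # placeholder: use the IP from `kubectl get services`
url = f"http://{EXTERNAL_IP}:80/predict/image"

with open("cat.jpg", "rb") as f:
    response = requests.post(
        url,
        files={"image_file": f},
        data={"with_resize": "True", "with_post_process": "True"},
    )

print(response.text)  # e.g. "{\"Label\": \"tabby\", \"Score\": \"0.538\"}"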

If you don't see an external IP address in your GitHub Actions console log, run the following after the deployment succeeds:

# Authenticate to your GKE cluster.
$ gcloud container clusters get-credentials ${GKE_CLUSTER} --zone ${GKE_ZONE} --project ${GCP_PROJECT_ID}
$ kubectl get services -o wide

From there, note the external IP.
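To fetch that IP programmatically (say, from a load-testing script), the official Kubernetes Python client can read it off the service status. A sketch, assuming the service is named fastapi-server and lives in the default namespace:

from kubernetes import client, config

# Uses the credentials fetched by the gcloud command above.
config.load_kube_config()
v1 = client.CoreV1Api()

svc = v1.read_namespaced_service(name="fastapi-server", namespace="default")
ingress = svc.status.load_balancer.ingress[0]
print(ingress.ip or ingress.hostname)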

Acknowledgements