This project provides a multi-stream, real-time inference pipeline based on cloud-native design patterns, as shown in the following architecture diagram:
Cloud-native technologies can be applied to Artificial Intelligence (AI) to build scalable applications in dynamic environments such as public, private, and hybrid clouds. But this requires a cloud-native design that decomposes the monolithic inference pipeline into several microservices:
Microservice | Role | Description
---|---|---
Transcoding Gateway | Data Source | Receives multiple input streams and performs transcoding
Frame Queue | Data Integration | Assigns each input stream to a specific work queue
Infer Engine | Data Analytics | Runs inference on frames and sends the results to the result broker
Dashboard | Data Visualization | Renders the results in the client's single-page application
It is extended for the following uses:

- End-to-End Macro Bench Framework for cloud-native pipelines, like DeathStarBench
- Trusted AI pipeline, to protect the input stream or model in a TEE VM/Container
- Sustainable AI computing, to reduce the carbon footprint of AI workloads

For details on how to use a Trusted Execution Environment (TEE) to enhance the security of AI models, please refer to How to Protect AI Models in Cloud-Native Environments.
The provided build script simplifies the process of building Docker images for our microservices. For instance, to build all Docker images, use the following command:
```bash
./tools/docker_image_manager.sh -a build -r <your-registry> -g <your-tag>
```
The `-a` argument specifies the action (either `build`, `publish`, `save`, or `all`), `-r` is the prefix string for your Docker registry, and the `-g` argument specifies the container image tag.
You can get more details on the options and arguments of `docker_image_manager.sh` via `./tools/docker_image_manager.sh -h`.
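For example, to run every action in one pass (assuming `all` performs build, save, and publish; the registry and tag below are placeholders):

```bash
# Build, save, and publish all images to an example registry
./tools/docker_image_manager.sh -a all -r registry.example.com/cnap -g v0.1.0
```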
The Dockerfiles are located under the per-service directories in `container`.
Before you deploy the helm charts, you need to set up the Kubernetes cluster and install the Helm tool. Please refer to the Kubernetes documentation and Helm documentation for more details. We also deliver a quick-start script to set up the Kubernetes cluster:
```bash
# This is a quick start script for Ubuntu
bash ./tools/prerequisites/k8s-setup.sh
```
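Once the script completes, you can verify that the cluster is ready with standard `kubectl` checks:

```bash
# Confirm the node(s) are Ready and the system pods are running
kubectl get nodes
kubectl get pods -A
```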
We deliver helm charts for deployment. After you finish building the images and uploading them to your registry, you need to update the helm chart values `image.repository` to your registry and `image.tag` to your build tag; both are defined in each helm chart.
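A minimal sketch of the override, using the `image.repository` and `image.tag` keys named above (the concrete values are placeholders):

```yaml
# In each chart's values.yaml (placeholder values shown)
image:
  repository: <your-registry>/inference   # e.g. for the inference chart
  tag: <your-tag>
```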
You need a simple HTTP model server for deployment. The following is an example of how to configure the regular AI pipeline; for the Trusted AI pipeline, please refer to How to Protect AI Models in Cloud-Native Environments.
Configure the environment variables of the inference service:
```yaml
- name: INFER_MODEL_INFO_URL
  value: "http://{model_server_url}/tensorflow/"
- name: INFER_MODEL_ID
  value: "c8b019e0-f4d8-4831-8936-f7f64ad99509"
```
An HTTP GET request to `http://{model_server_url}/tensorflow/c8b019e0-f4d8-4831-8936-f7f64ad99509` should return:
```json
{
    "id": "c8b019e0-f4d8-4831-8936-f7f64ad99509",
    "framework": "tensorflow",
    "target": "object-detection",
    "url": "http://{model_server_url}/tensorflow/ssdmobilenet_v10.pb",
    "name": "ssdmobilenet",
    "version": "1.0",
    "dtype": "int8",
    "encrypted": false
}
```
`http://{model_server_url}/tensorflow/ssdmobilenet_v10.pb` is where the model is stored.
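To sanity-check the model server before deploying, you can fetch the metadata directly (substitute your model server address; `jq` is only used for pretty-printing):

```bash
# Fetch the model metadata and pretty-print the JSON response
curl -s http://{model_server_url}/tensorflow/c8b019e0-f4d8-4831-8936-f7f64ad99509 | jq .
```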
For more details about Model Server, please refer to AI Model Server.
Note: A `ServiceMonitor` CR is included in the helm charts; you can install kube-prometheus to install the CRD.
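One way to get the `ServiceMonitor` CRD is the upstream kube-prometheus quick start, sketched below (pin a release branch in practice):

```bash
# Install the Prometheus Operator CRDs (including ServiceMonitor), then the stack
git clone https://github.com/prometheus-operator/kube-prometheus.git
kubectl apply --server-side -f kube-prometheus/manifests/setup
kubectl apply -f kube-prometheus/manifests/
```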
Then you can use either of the following options to install the helm charts:
1. Navigate to the project's root directory.
2. Execute the Helm manager script with the appropriate arguments. For instance, to install all Helm charts, use the following command:
```bash
./tools/helm_manager.sh -i -n <your-namespace>
# Specify the image registry and tag via the `-r` and `-g` arguments:
# ./tools/helm_manager.sh -i -n <your-namespace> -r <your-registry> -g <your-tag>
# To uninstall all charts:
# ./tools/helm_manager.sh -u -n <your-namespace>
```
The `-i` argument triggers the installation of Helm charts, the `-u` argument triggers their uninstallation, the `-n` argument specifies the namespace, the `-r` argument specifies the image registry, and the `-g` argument specifies the image tag.
You can also specify a specific chart to install or uninstall using the chart name as an argument. For instance:
```bash
./tools/helm_manager.sh -i <chart_name> -n <your-namespace>
./tools/helm_manager.sh -u <chart_name> -n <your-namespace>
```
Use `-l` to list all available charts and `-h` to display help information.
Please refer to the script's source code for more detailed information on how it works and the full range of available options.
Alternatively, navigate to the project's root directory and install each chart manually with `helm install`:
```bash
# helm install <customer-release-name> <helm-chart-directory> --namespace=<your-namespace>
# Redis service
helm install redis ./helm/redis --namespace=<your-namespace>
# Optional, if you want to see the redis dashboard in grafana:
# helm install redis-exporter ./helm/redis-exporter --namespace=<your-namespace>
# Inference service
helm install inference ./helm/inference --namespace=<your-namespace>
# SPA service
helm install pipelineapi ./helm/pipelineapi --namespace=<your-namespace>
helm install websocket ./helm/websocket --namespace=<your-namespace>
helm install ui ./helm/ui --namespace=<your-namespace>
# Stream service
helm install stream ./helm/stream --namespace=<your-namespace>
```
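After installation, it may help to confirm that all pods reach the `Running` state before opening the dashboard:

```bash
# Watch the CNAP pods come up (the release names above determine the pod names)
kubectl get pods -n <your-namespace> -w
```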
The dashboard of CNAP will be available at `http://<your-ip>:31002`; it is exposed as a NodePort service in Kubernetes.
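Any node address can serve as `<your-ip>`; a quick way to look one up:

```bash
# List node addresses; use a node's INTERNAL-IP (or EXTERNAL-IP, if set) with port 31002
kubectl get nodes -o wide
```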
Note: This is pre-release/prototype software and, as such, it may be substantially modified as updated versions are made available.
The Cloud Native AI Pipeline incorporates several key technologies to foster a robust, scalable, and insightful environment conducive for cloud-native deployments. Our integration encompasses monitoring, visualization, and event-driven autoscaling to ensure optimized performance and efficient resource utilization.
Our project is instrumented to expose essential metrics to Prometheus, a reliable monitoring solution that aggregates and stores metric data. This metric exposition forms the basis for informed autoscaling decisions, ensuring our system dynamically adapts to workload demands.
Note that when you want to deploy the workloads into another namespace, you must first patch the Prometheus RoleBinding to grant permission to access the workloads in that namespace:
```bash
kubectl apply -f ./k8s-manifests/prometheus/ClusterRole-All.yaml
```
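To confirm Prometheus is scraping the workloads, you can port-forward its service and check the targets page (the `prometheus-k8s` service name and `monitoring` namespace are the kube-prometheus defaults; adjust for your setup):

```bash
# Then browse http://localhost:9090/targets
kubectl port-forward svc/prometheus-k8s 9090:9090 -n monitoring
```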
Grafana is employed to provide visual insights into the system's performance and the efficacy of the autoscaling integration. Through intuitive dashboards, we can monitor and analyze the metrics collected by Prometheus, fostering a transparent and insightful monitoring framework.
The dashboards for this project are available at `./k8s-manifests/grafana/dashboards`; you can import them into your Grafana.
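One way to reach Grafana for the import, assuming the kube-prometheus defaults (service `grafana` in the `monitoring` namespace):

```bash
# Open http://localhost:3000 and import the JSON files via Dashboards -> Import
kubectl port-forward svc/grafana 3000:3000 -n monitoring
```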
Kubernetes Event-driven Autoscaling (KEDA) is integrated as an operator to orchestrate the dynamic scaling of our Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) based on the metrics collected by Prometheus. This synergy ensures that resources are efficiently allocated in real-time, aligning with the fluctuating workload demands, thus embodying the essence of cloud-native scalability.
As our project evolves, we envisage the integration of additional technologies to further enhance the cloud-native capabilities of our AI pipeline. For a deeper dive into the current integration and instructions on configuration and usage, refer to the Integration Documentation.
To integrate KEDA with Prometheus, you need to deploy the Service Monitor CR for KEDA:
```bash
kubectl apply -f ./k8s-manifests/keda/keda-service-monitor.yaml
```
An example KEDA ScaledObject is available at `./k8s-manifests/keda/infer_scale.yaml`; you can deploy it to your Kubernetes cluster to scale the workloads. A minimal sketch of its general shape is shown below.
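This sketch shows the form of a ScaledObject with a Prometheus trigger; the target name, Prometheus address, query, and threshold are illustrative assumptions, and `./k8s-manifests/keda/infer_scale.yaml` remains the authoritative definition:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: infer-scale
spec:
  scaleTargetRef:
    name: inference              # Deployment to scale (assumed release name)
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-k8s.monitoring.svc:9090   # kube-prometheus default
        query: avg(rate(infer_fps_total[1m]))                      # hypothetical metric
        threshold: "30"
```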
In our endeavor to not only optimize the performance but also minimize the environmental impact of our Cloud Native AI Pipeline, we have integrated Kepler, a Kubernetes-based Efficient Power Level Exporter. Kepler employs eBPF to probe system statistics and utilizes machine learning models to estimate the energy consumption of workloads based on these statistics. The energy consumption metrics are then exported to Prometheus, enriching our monitoring framework with vital data that reflects the energy efficiency of our deployments.
This integration aligns with our sustainability objectives by providing a clear insight into the energy footprint of our workloads. By understanding and analyzing the energy metrics provided by Kepler, we can make informed decisions to optimize the energy efficiency of our pipeline, thus contributing to a more sustainable and eco-friendly cloud-native environment.
Furthermore, the integration of Kepler augments our existing monitoring setup with Prometheus and visualization through Grafana, by extending the metrics collection to include energy consumption metrics. This not only enhances our monitoring and visualization framework but also fosters a more holistic understanding of our system's performance and its environmental impact.
For more details on configuring and utilizing Kepler for energy efficiency monitoring, refer to the Kepler Documentation.
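As a sketch, Kepler can be installed via its Helm chart (the repository URL and chart name are taken from the Kepler project's documentation; adjust the namespace to your environment):

```bash
# Add the Kepler chart repo and install it so its metrics are exported to Prometheus
helm repo add kepler https://sustainable-computing-io.github.io/kepler-helm-chart
helm repo update
helm install kepler kepler/kepler --namespace kepler --create-namespace
```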
- Le Yao
- Longyin Hu
- Lu Ken
- Xiaocheng Dong
- Yanbo Xu
- Wang, Hongbo
- Jialei Feng
- Robert Dower