Cloud Native AI Pipeline

1. Overview

This project provides a multi-stream, real-time inference pipeline built on cloud-native design patterns, as shown in the following architecture diagram:

Cloud-native technologies can be applied to Artificial Intelligence (AI) to build scalable applications in dynamic environments such as public, private, and hybrid clouds. Doing so requires a cloud-native design that decomposes the monolithic inference pipeline into several microservices:

| Microservice | Role | Description |
| --- | --- | --- |
| Transcoding Gateway | Data Source | Receives multiple streams and performs transcoding |
| Frame Queue | Data Integration | Assigns each input stream to a specific work queue |
| Infer Engine | Data Analytics | Runs inference on each frame and sends the result to the result broker |
| Dashboard | Data Visualization | Renders the results in the client's single-page application |
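
To make the division of responsibilities concrete, the following is a minimal in-process sketch of the four roles. It assumes Python's standard `queue.Queue` as a stand-in for the Frame Queue and the result broker. The function names (`gateway`, `assign_queue`, `infer`, `dashboard`), the checksum-based queue assignment, and the placeholder result format are all illustrative assumptions, not the project's actual implementation.

```python
import queue
import zlib

NUM_QUEUES = 2  # hypothetical number of work queues

# One work queue per group of inference workers, plus a single result broker.
work_queues = [queue.Queue() for _ in range(NUM_QUEUES)]
result_broker = queue.Queue()


def assign_queue(stream_id: str) -> int:
    """Frame Queue role: map a stream to a stable work-queue index."""
    return zlib.crc32(stream_id.encode()) % NUM_QUEUES


def gateway(stream_id: str, frames: list) -> None:
    """Transcoding Gateway role: accept frames and enqueue them."""
    target = work_queues[assign_queue(stream_id)]
    for frame in frames:
        target.put((stream_id, frame))


def infer(work_queue: queue.Queue) -> None:
    """Infer Engine role: drain a queue and publish placeholder results."""
    while not work_queue.empty():
        stream_id, frame = work_queue.get()
        # A real engine would run a model here; this only tags the frame.
        result_broker.put({"stream": stream_id, "frame": frame, "objects": []})


def dashboard() -> list:
    """Dashboard role: collect the results that would be rendered."""
    results = []
    while not result_broker.empty():
        results.append(result_broker.get())
    return results


gateway("camera-1", ["f0", "f1"])
gateway("camera-2", ["f0"])
for work_queue in work_queues:
    infer(work_queue)
results = dashboard()
print(len(results), "results collected")
```

In a real deployment each role runs as its own microservice and the queues live in an external service, so every stage can be scaled independently.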

2. Uses

The pipeline has been extended for the following uses:

End-to-end macro-benchmark framework for cloud-native pipelines, similar to DeathStarBench
Trusted AI pipeline to protect the input stream or model in a TEE VM/container
Sustainable AI computing to reduce the carbon footprint of AI workloads

For details on how to use a Trusted Execution Environment (TEE) to enhance the security of AI models, please refer to How to Protect AI Models in Cloud-Native Environments.

3. Building

The provided build script simplifies the process of building Docker images for our microservices. For instance, to build all Docker images, use the following command:

./tools/docker_image_manager.sh -a build -r <your-registry> -g <your-tag>

The -a argument specifies the action (build, publish, save, or all), the -r argument specifies the prefix string for your Docker registry, and the -g argument specifies the container image tag.

For the full list of options and arguments for docker_image_manager.sh, run ./tools/docker_image_manager.sh -h

The Dockerfiles are located in the subdirectories of the container directory.
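
If the build needs to be driven from automation, the invocation can be assembled programmatically. The sketch below is an illustration only: the helper `build_image_command` and the example registry and tag are hypothetical, and only the `-a`, `-r`, and `-g` flags and the four action names come from the description above.

```python
import shlex

# Actions accepted by the -a flag, as listed above.
VALID_ACTIONS = {"build", "publish", "save", "all"}


def build_image_command(action: str, registry: str, tag: str) -> list:
    """Return the argv for docker_image_manager.sh, rejecting unknown actions."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return [
        "./tools/docker_image_manager.sh",
        "-a", action,
        "-r", registry,
        "-g", tag,
    ]


# Hypothetical registry prefix and tag, for illustration only.
cmd = build_image_command("build", "registry.example.com/cnap", "v0.1")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the project's root directory.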

4. Deployment

Before you deploy the Helm charts, you need to set up a Kubernetes cluster and install the Helm tool. Please refer to the Kubernetes documentation and the Helm documentation for more details. We also provide a quick start script to set up the Kubernetes cluster:

# This is a quick start script for Ubuntu
bash ./tools/prerequisites/k8s-setup.sh

We provide Helm charts for deployment. After you have built the images and uploaded them to your registry, update the image.repository value to your registry and the image.tag value to your build tag. Both values are defined in each Helm chart.
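
As an alternative to editing the values files, Helm's `--set` flag can override chart values at install time. The following sketch assembles such a command. The helper `helm_install_command`, the namespace, and the repository and tag shown are hypothetical examples; only the `image.repository` and `image.tag` value names and the `./helm/inference` chart directory come from this project.

```python
import shlex


def helm_install_command(release, chart_dir, namespace, repository, tag):
    """Build a `helm install` argv that overrides image.repository and image.tag."""
    return [
        "helm", "install", release, chart_dir,
        f"--namespace={namespace}",
        "--set", f"image.repository={repository}",
        "--set", f"image.tag={tag}",
    ]


cmd = helm_install_command(
    "inference", "./helm/inference", "cnap",
    "registry.example.com/cnap-inference", "v0.1",  # hypothetical values
)
print(shlex.join(cmd))
```

Overrides given with `--set` take precedence over the defaults in the chart's values file, so the charts themselves can stay unmodified.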

You need a simple HTTP model server for deployment. The following example shows how to configure it for the regular AI pipeline. For the Trusted AI pipeline, please refer to How to Protect AI Models in Cloud-Native Environments.

Configure the environment variables of the inference service:

  - name: INFER_MODEL_INFO_URL
    value: "http://{model_server_url}/tensorflow/"
  - name: INFER_MODEL_ID
    value: "c8b019e0-f4d8-4831-8936-f7f64ad99509"

An HTTP GET request to http://{model_server_url}/tensorflow/c8b019e0-f4d8-4831-8936-f7f64ad99509 should return:

{
    "id": "c8b019e0-f4d8-4831-8936-f7f64ad99509",
    "framework": "tensorflow",
    "target": "object-detection",
    "url": "http://{model_server_url}/tensorflow/ssdmobilenet_v10.pb",
    "name": "ssdmobilenet",
    "version": "1.0",
    "dtype": "int8",
    "encrypted": false
}

http://{model_server_url}/tensorflow/ssdmobilenet_v10.pb is where the model is stored. For more details about the model server, please refer to AI Model Server.
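
The relationship between the two environment variables and the expected response can be sketched as follows. This is a minimal illustration, not the inference service's actual code: the helpers `model_info_url` and `validate_model_info` are hypothetical, the host `model-server.example` is a placeholder, and treating every field of the example response as required is an assumption.

```python
import json

# Assumption: every field shown in the example response is required.
REQUIRED_FIELDS = {
    "id", "framework", "target", "url", "name", "version", "dtype", "encrypted",
}


def model_info_url(base_url: str, model_id: str) -> str:
    """Join INFER_MODEL_INFO_URL and INFER_MODEL_ID into the GET target."""
    return base_url.rstrip("/") + "/" + model_id


def validate_model_info(payload: str, model_id: str) -> dict:
    """Parse a model info response and check it describes the requested model."""
    info = json.loads(payload)  # strict JSON: trailing commas are rejected
    missing = REQUIRED_FIELDS - info.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if info["id"] != model_id:
        raise ValueError("model id mismatch")
    return info


MODEL_ID = "c8b019e0-f4d8-4831-8936-f7f64ad99509"
url = model_info_url("http://model-server.example/tensorflow/", MODEL_ID)

sample_response = json.dumps({
    "id": MODEL_ID,
    "framework": "tensorflow",
    "target": "object-detection",
    "url": "http://model-server.example/tensorflow/ssdmobilenet_v10.pb",
    "name": "ssdmobilenet",
    "version": "1.0",
    "dtype": "int8",
    "encrypted": False,
})
info = validate_model_info(sample_response, MODEL_ID)
print(url)
print(info["url"])
```

A check like this can be run against a model server before installing the charts, to confirm that the configured model ID resolves to a downloadable model.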

Note: A ServiceMonitor CR is included in the Helm charts. If the ServiceMonitor CRD is not present in your cluster, you can install kube-prometheus, which provides it.

Then, from the project's root directory, use one of the following options to install the Helm charts:

Deploy with the helm manager

Navigate to the project's root directory.
Execute the Helm manager script with the appropriate arguments. For instance, to install all Helm charts, use the following command:
```
./tools/helm_manager.sh -i -n <your-namespace>
# Specify the image registry and tag via `-r` and `-g` arguments
# ./tools/helm_manager.sh -i -n <your-namespace> -r <your-registry> -g <your-tag>
# To uninstall all charts
# ./tools/helm_manager.sh -u -n <your-namespace>
```
The -i argument installs the Helm charts and the -u argument uninstalls them. The -n argument specifies the namespace, the -r argument specifies the image registry, and the -g argument specifies the image tag.

You can also specify a specific chart to install or uninstall using the chart name as an argument. For instance:
```
./tools/helm_manager.sh -i <chart_name> -n <your-namespace>
./tools/helm_manager.sh -u <chart_name> -n <your-namespace>
```
Use -l to list all available charts and -h to display help information.

Please refer to the script's source code for more detailed information on how it works and the full range of available options.

Deploy with the helm command

Navigate to the project's root directory.

Install each chart with the helm install command. For instance, to install all Helm charts, use the following commands:

# helm install <customer-release-name> <helm-chart-directory> --namespace=<your-namespace>

# Redis service
helm install redis ./helm/redis --namespace=<your-namespace>
# Optional, if you want to see the Redis dashboard in Grafana:
# helm install redis-exporter ./helm/redis-exporter --namespace=<your-namespace>

# Inference service
helm install inference ./helm/inference --namespace=<your-namespace>

# SPA service
helm install pipelineapi ./helm/pipelineapi --namespace=<your-namespace>
helm install websocket ./helm/websocket --namespace=<your-namespace>
helm install ui ./helm/ui --namespace=<your-namespace>

# Stream service
helm install stream ./helm/stream --namespace=<your-namespace>

The CNAP dashboard will be available at http://<your-ip>:31002. It is exposed as a NodePort service in Kubernetes.

Note: This is pre-release/prototype software and, as such, it may be substantially modified as updated versions are made available.

5. Integration

The Cloud Native AI Pipeline incorporates several key technologies to provide a robust, scalable, and observable environment for cloud-native deployments. The integration covers monitoring, visualization, and event-driven autoscaling to ensure optimized performance and efficient resource utilization.

Monitoring with Prometheus

Our project is instrumented to expose essential metrics to Prometheus, a reliable monitoring solution that aggregates and stores metric data. This metric exposition forms the basis for informed autoscaling decisions, ensuring our system dynamically adapts to workload demands.
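
For readers unfamiliar with what a scrape target returns, the sketch below renders one metric family in the Prometheus text exposition format using only the standard library. The helper `render_metric` and the metric name `cnap_infer_fps` are hypothetical and chosen for illustration; the metrics actually exposed by the project's services may differ.

```python
def render_metric(name: str, help_text: str, metric_type: str, samples: dict) -> str:
    """Render one metric family in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the sample value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples.items():
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


text = render_metric(
    "cnap_infer_fps",  # hypothetical metric name
    "Frames inferred per second.",
    "gauge",
    {(("stream", "camera-1"),): 14.5},
)
print(text, end="")
```

In practice a client library usually produces this output and serves it over HTTP, and Prometheus scrapes that endpoint at a regular interval.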

Note: To deploy the workloads into a different namespace, first patch the Prometheus RoleBinding to grant Prometheus permission to access workloads in other namespaces:

kubectl apply -f ./k8s-manifests/prometheus/ClusterRole-All.yaml

Visualization with Grafana

Grafana is employed to provide visual insights into the system's performance and the efficacy of the autoscaling integration. Through intuitive dashboards, we can monitor and analyze the metrics collected by Prometheus, fostering a transparent and insightful monitoring framework.

The dashboards for this project are available at ./k8s-manifests/grafana/dashboards. You can import them into your Grafana instance.

Event-Driven Autoscaling with KEDA

Kubernetes Event-driven Autoscaling (KEDA) is integrated as an operator to orchestrate the dynamic scaling of our Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) based on the metrics collected by Prometheus. This synergy ensures that resources are efficiently allocated in real-time, aligning with the fluctuating workload demands, thus embodying the essence of cloud-native scalability.

As our project evolves, we envisage the integration of additional technologies to further enhance the cloud-native capabilities of our AI pipeline. For a deeper dive into the current integration and instructions on configuration and usage, refer to the Integration Documentation.

To integrate KEDA with Prometheus, you need to deploy the ServiceMonitor CR for KEDA:

kubectl apply -f ./k8s-manifests/keda/keda-service-monitor.yaml

An example KEDA ScaledObject is available at ./k8s-manifests/keda/infer_scale.yaml. You can deploy it to your Kubernetes cluster to scale the workloads.
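
To show the general shape of a ScaledObject with a Prometheus trigger, the sketch below builds one as a Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. It does not reproduce the contents of infer_scale.yaml: the helper `scaled_object`, the target name, the Prometheus address, the query, and the threshold are all hypothetical values chosen for illustration.

```python
import json


def scaled_object(name, target, server, query, threshold, min_replicas=1, max_replicas=4):
    """Build a KEDA ScaledObject manifest with a single Prometheus trigger."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"name": target},  # workload to scale
            "minReplicaCount": min_replicas,
            "maxReplicaCount": max_replicas,
            "triggers": [
                {
                    "type": "prometheus",
                    "metadata": {
                        "serverAddress": server,
                        "query": query,
                        # Trigger metadata values are strings.
                        "threshold": str(threshold),
                    },
                }
            ],
        },
    }


manifest = scaled_object(
    name="infer-scale-example",           # hypothetical
    target="inference",                   # hypothetical Deployment name
    server="http://prometheus.example:9090",
    query="sum(rate(cnap_frames_total[1m]))",  # hypothetical metric
    threshold=30,
)
print(json.dumps(manifest, indent=2))
```

KEDA creates and manages the underlying Horizontal Pod Autoscaler for the target, so the replica bounds and the threshold are the main values to tune.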

Sustainability with Kepler

In our endeavor to not only optimize the performance but also minimize the environmental impact of our Cloud Native AI Pipeline, we have integrated Kepler, a Kubernetes-based Efficient Power Level Exporter. Kepler employs eBPF to probe system statistics and utilizes machine learning models to estimate the energy consumption of workloads based on these statistics. The energy consumption metrics are then exported to Prometheus, enriching our monitoring framework with vital data that reflects the energy efficiency of our deployments.
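
Energy metrics of this kind are typically exported as cumulative counters in joules, so average power over an interval is the energy difference divided by the elapsed time. The sketch below shows that arithmetic. The helper `average_power_watts` and the sample readings are hypothetical, and the assumption that the counter is in joules should be checked against the Kepler version in use.

```python
def average_power_watts(joules_start: float, joules_end: float, seconds: float) -> float:
    """Average power (W) from two readings of a cumulative energy counter (J)."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    if joules_end < joules_start:
        # Counters restart from zero when the exporter restarts.
        raise ValueError("counter reset detected")
    return (joules_end - joules_start) / seconds


# Hypothetical readings taken 60 seconds apart.
power = average_power_watts(1000.0, 1600.0, 60.0)
print(f"{power:.1f} W")
```

The PromQL `rate()` function performs the same calculation on a counter series and additionally handles counter resets.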

This integration aligns with our sustainability objectives by providing a clear insight into the energy footprint of our workloads. By understanding and analyzing the energy metrics provided by Kepler, we can make informed decisions to optimize the energy efficiency of our pipeline, thus contributing to a more sustainable and eco-friendly cloud-native environment.

Furthermore, the integration of Kepler augments our existing monitoring setup with Prometheus and visualization through Grafana, by extending the metrics collection to include energy consumption metrics. This not only enhances our monitoring and visualization framework but also fosters a more holistic understanding of our system's performance and its environmental impact.

For more details on configuring and utilizing Kepler for energy efficiency monitoring, refer to the Kepler Documentation.