Jeremy Jordan
This repository provides an example setup for monitoring an ML system deployed on Kubernetes.
Blog post: https://www.jeremyjordan.me/ml-monitoring/
Components:
FastAPI
prometheus-fastapi-instrumentator
locust
Prometheus
Grafana
Ensure you can connect to a Kubernetes cluster and have kubectl and helm installed. To start a local cluster with minikube, run:
minikube start --driver=docker --memory 4g --nodes 2
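Before going further, it can help to confirm that the required tools are actually on your PATH. The following is a minimal sketch; check_tools is a hypothetical helper and not part of this repository.

```shell
# Report any required CLI tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: only query the cluster if both tools are present.
# check_tools kubectl helm && kubectl cluster-info
```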
Deploy Prometheus and Grafana onto the cluster using the community Helm chart.
kubectl create namespace monitoring
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring
Verify the resources were deployed successfully.
kubectl get all -n monitoring
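If you would rather block until the stack is up than poll the listing by hand, kubectl wait can do that. This is a sketch under assumptions: wait_for_pods is a hypothetical helper, the 300s timeout is an arbitrary choice, and the KUBECTL override exists only so the command can be previewed without a cluster.

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# wait_for_pods [NAMESPACE] [TIMEOUT]
# Blocks until every pod in the namespace reports Ready, or the timeout expires.
wait_for_pods() {
  "$KUBECTL" wait --for=condition=Ready pods --all \
    -n "${1:-monitoring}" --timeout="${2:-300s}"
}

# Usage:
# wait_for_pods monitoring 300s
```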
Connect to the Grafana dashboard.
kubectl port-forward svc/prometheus-stack-grafana 8000:80 -n monitoring
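Grafana is then available at http://127.0.0.1:8000. Log in with the default credentials (defined in the values.yaml file).
Import the model dashboard by pasting the JSON defined in dashboards/model.json in the text area of Grafana's import dialog.
This repository includes an example REST service which exposes an ML model trained on the UCI Wine Quality dataset.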
You can launch the service on Kubernetes by running:
kubectl apply -f kubernetes/models/
You can also build and run the Docker container locally.
docker build -t wine-quality-model -f model/Dockerfile model/
docker run -d -p 3000:80 -e ENABLE_METRICS=true wine-quality-model
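With the container running locally, one quick sanity check is to list the metric names the service exposes. The sketch below assumes the metrics are served at /metrics on the mapped port 3000; that is the default path for prometheus-fastapi-instrumentator, but it is not confirmed for this service. metric_names is a hypothetical helper, not part of this repository.

```shell
# Read Prometheus text-format metrics on stdin and print the unique metric names.
# Comment lines (# HELP / # TYPE) are dropped; labels and values are stripped.
metric_names() {
  grep -v '^#' | sed -E 's/[{ ].*//' | sort -u
}

# Usage (assumed URL, see the note above):
# curl -s http://localhost:3000/metrics | metric_names
```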
Note: In order for Prometheus to scrape metrics from this service, we need to define a ServiceMonitor resource. This resource must have the label release: prometheus-stack in order to be discovered. This is configured in the Prometheus resource spec via the serviceMonitorSelector attribute.
You can verify the label required by running:
kubectl get prometheuses.monitoring.coreos.com prometheus-stack-kube-prom-prometheus -n monitoring -o yaml
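To make the label requirement concrete, here is a sketch of what such a ServiceMonitor could look like. It is an illustration only and not the manifest shipped in this repository: the resource name, the app label on the Service, the port name and the metrics path are all assumptions. The only part taken from the note above is the release: prometheus-stack label.

```shell
# Write a hypothetical ServiceMonitor manifest to the current directory.
cat > servicemonitor-example.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: wine-quality-model-example   # assumed name
  labels:
    release: prometheus-stack        # must match serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: wine-quality-model        # assumed label on the model Service
  endpoints:
    - port: http                     # assumed name of the Service port
      path: /metrics                 # assumed metrics path
EOF

# To print only the selector instead of the full resource YAML:
# kubectl get prometheuses.monitoring.coreos.com prometheus-stack-kube-prom-prometheus \
#   -n monitoring -o jsonpath='{.spec.serviceMonitorSelector}'
```

A ServiceMonitor without a namespaceSelector only matches Services in its own namespace, so a manifest like this would need to be applied in the namespace where the model Service runs.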
We can simulate production traffic using a Python load testing tool called locust. This will make HTTP requests to our model server and provide us with data to view in the monitoring dashboard.
You can begin the load test by running:
kubectl apply -f kubernetes/load_tests/
By default, production traffic will be simulated for a duration of 5 minutes. This can be changed by updating the image arguments in the kubernetes/load_tests/locust_master.yaml manifest.
You can also modify the community Helm chart instead of using the manifests defined in this repo.
Building and uploading new images can eventually be automated with a GitHub Action, but remains manual for now.
echo "INSERT_TOKEN_HERE" > ~/.github/cr_token
cat ~/.github/cr_token | docker login ghcr.io -u jeremyjordan --password-stdin
MODEL_TAG=0.3
docker build -t wine-quality-model:$MODEL_TAG -f model/Dockerfile model/
docker tag wine-quality-model:$MODEL_TAG ghcr.io/jeremyjordan/wine-quality-model:$MODEL_TAG
LOAD_TAG=0.2
docker build -t locust-load-test:$LOAD_TAG -f load_test/Dockerfile load_test/
docker tag locust-load-test:$LOAD_TAG ghcr.io/jeremyjordan/locust-load-test:$LOAD_TAG
docker push ghcr.io/jeremyjordan/wine-quality-model:$MODEL_TAG
docker push ghcr.io/jeremyjordan/locust-load-test:$LOAD_TAG
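As a first step toward automating this, the build, tag and push commands above could be folded into one function. This is a sketch, not the author's tooling: publish, run, REGISTRY and DRY_RUN are hypothetical names, and the function assumes each image's Dockerfile lives at the root of its build context, as in the commands above.

```shell
# Registry prefix used in the commands above; override via the environment.
REGISTRY="${REGISTRY:-ghcr.io/jeremyjordan}"

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# publish NAME TAG CONTEXT_DIR
# Builds the image, tags it for the registry, and pushes it.
# Stops at the first step that fails.
publish() {
  name=$1 tag=$2 context=$3
  run docker build -t "$name:$tag" -f "$context/Dockerfile" "$context/" &&
  run docker tag "$name:$tag" "$REGISTRY/$name:$tag" &&
  run docker push "$REGISTRY/$name:$tag"
}

# Usage:
# publish wine-quality-model 0.3 model
# publish locust-load-test 0.2 load_test
```

Setting DRY_RUN=1 prints the docker commands without running them, which is a cheap way to check the tags before anything is pushed.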
To stop the model REST server, run:
kubectl delete -f kubernetes/models/
To stop the load tests, run:
kubectl delete -f kubernetes/load_tests/
To remove the Prometheus stack, run:
helm uninstall prometheus-stack -n monitoring
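Note that helm uninstall does not delete the CustomResourceDefinitions a chart installed, and the monitoring namespace created earlier also remains. The sketch below lists the leftover Prometheus Operator CRDs; operator_crds is a hypothetical helper, and the remaining cleanup commands are left as comments because they are destructive.

```shell
# Filter `kubectl get crd -o name` output down to Prometheus Operator CRDs,
# which all belong to the monitoring.coreos.com API group.
operator_crds() {
  grep 'monitoring\.coreos\.com$' || true
}

# Usage:
# kubectl get crd -o name | operator_crds   # list leftover CRDs
# kubectl delete namespace monitoring       # remove the namespace created earlier
# minikube delete                           # only if the cluster was started with minikube
```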