Health metric API to expose Redis connection pool metrics

lablup / backend.ai

Backend.AI is a streamlined, container-based computing cluster platform that hosts popular computing/ML frameworks and diverse programming languages, with pluggable heterogeneous accelerator support including CUDA GPU, ROCm GPU, TPU, IPU and other NPUs.

GNU Lesser General Public License v3.0


Let's add a Prometheus-compatible API endpoint to expose the health metrics of Redis connection pools used by each manager process.

https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md
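As a rough illustration of the text exposition format linked above, here is a minimal sketch that renders gauge samples into that format. The `Sample` type, the `render_gauges` helper, and the metric names are hypothetical and not part of Backend.AI. Help texts are assumed to contain no backslashes or newlines.

```python
from typing import Iterable, Mapping, NamedTuple


class Sample(NamedTuple):
    """One labeled gauge value (hypothetical helper type)."""
    name: str
    labels: Mapping[str, str]
    value: float


def _escape_label_value(value: str) -> str:
    # The text format escapes backslash, double quote, and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauges(samples: Iterable[Sample], help_texts: Mapping[str, str]) -> str:
    """Render samples as Prometheus text exposition lines, grouped by metric name."""
    lines = []
    seen = set()
    ordered = sorted(samples, key=lambda s: (s.name, sorted(s.labels.items())))
    for s in ordered:
        if s.name not in seen:
            # Emit the HELP/TYPE header once per metric name.
            seen.add(s.name)
            if s.name in help_texts:
                lines.append(f"# HELP {s.name} {help_texts[s.name]}")
            lines.append(f"# TYPE {s.name} gauge")
        label_str = ",".join(
            f'{k}="{_escape_label_value(v)}"' for k, v in sorted(s.labels.items())
        )
        suffix = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{s.name}{suffix} {s.value}")
    # The format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_gauges(
        [Sample("backendai_redis_pool_in_use", {"db": "live", "proc": "node1:0"}, 3)],
        {"backendai_redis_pool_in_use": "Connections currently checked out."},
    ), end="")
```

An API handler could return this string with the content type `text/plain; version=0.0.4`, which is what Prometheus scrapers expect for the text format.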

The API handler itself should be simple to implement. Because the Manager runs as multiple processes across multiple nodes, we should use external storage (Redis) to aggregate the metrics from the different manager processes, and adopt a separate Redis connection mechanism like #2041 so that metric collection does not interfere with the monitored connection pools.
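One possible shape for the aggregation step is sketched below, under these assumptions: each manager process periodically writes a timestamped JSON snapshot into a shared Redis hash keyed by a process identifier, and the API handler reads the whole hash and discards stale entries. The Redis I/O is omitted here. `raw` stands in for the field-to-value mapping read from that hash over the dedicated connection. The function names, the snapshot layout, and the 30-second staleness window are all hypothetical.

```python
import json
import time
from typing import Dict, Mapping, Optional


def encode_snapshot(
    pools: Mapping[str, Mapping[str, int]],
    now: Optional[float] = None,
) -> str:
    """Serialize one process's pool metrics with a timestamp for staleness checks."""
    ts = time.time() if now is None else now
    return json.dumps({"ts": ts, "pools": pools})


def merge_snapshots(
    raw: Mapping[str, str],
    now: float,
    max_age: float = 30.0,
) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Decode per-process snapshots and drop those older than max_age seconds."""
    merged = {}
    for proc_id, payload in raw.items():
        doc = json.loads(payload)
        if now - doc["ts"] > max_age:
            # The process has probably exited or stalled; skip its old data.
            continue
        merged[proc_id] = doc["pools"]
    return merged


if __name__ == "__main__":
    raw = {
        "node1:0": encode_snapshot({"live": {"size": 8, "in_use": 3}}, now=100.0),
        "node1:1": encode_snapshot({"live": {"size": 8, "in_use": 1}}, now=10.0),
    }
    print(merge_snapshots(raw, now=110.0))
```

Keeping a timestamp inside each snapshot lets the reader ignore entries from processes that have gone away, without depending on per-field expiry in the store.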

The metrics may be composed of:

- per-process connection pool information
  - per-pool metrics for each Redis db, grouped by usage
    - e.g., the pool size, the occupancy
- Redis-side metrics
  - e.g., the total number of client connections
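To make the two metric groups concrete, here is a small sketch. `PoolMetrics` is a hypothetical record for one pool, with occupancy defined (as an assumption) as the in-use connections divided by the pool size. `parse_redis_info` parses the plain-text reply of the Redis `INFO` command, whose `clients` section includes `connected_clients`. Client libraries such as redis-py usually return this reply already parsed, so the parser is only for illustration. The db names used are illustrative.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PoolMetrics:
    """Per-pool metrics for one Redis db (hypothetical field names)."""
    db: str
    size: int
    in_use: int

    @property
    def occupancy(self) -> float:
        # Fraction of the pool currently checked out; 0.0 for an empty pool.
        return self.in_use / self.size if self.size else 0.0


def parse_redis_info(text: str) -> Dict[str, str]:
    """Parse an INFO reply: 'key:value' lines, with '#' lines as section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


if __name__ == "__main__":
    print(PoolMetrics(db="live", size=8, in_use=2).occupancy)
    info = parse_redis_info("# Clients\r\nconnected_clients:12\r\nblocked_clients:0\r\n")
    print(int(info["connected_clients"]))
```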

Health metric API to expose Redis connection pool metrics #2522