numaproj / numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs
https://numaflow.numaproj.io/
Apache License 2.0

Numaflow Debuggability #2055

Open veds-g opened 2 months ago

veds-g commented 2 months ago

Summary

Currently, the Numaflow UI offers real-time status updates at the vertex, pod, and container levels, each with sufficient context. To enhance debuggability, we should extend this capability to include more comprehensive, detailed metrics with historical data. By analyzing metrics over time, users can detect trends, identify anomalies, and determine the root cause of issues.
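To make "analyzing metrics over time" concrete, here is a minimal sketch that assumes historical data comes back in the Prometheus HTTP API range-query format (`resultType: "matrix"`, with each sample as a `[timestamp, "value"]` pair). The anomaly rule is a simple z-score check, chosen only for illustration. It is not how Numaflow detects anomalies, and the helper names `parse_matrix` and `find_anomalies` are hypothetical.

```python
import json
import statistics


def parse_matrix(response_body: str) -> dict:
    """Convert a Prometheus range-query (matrix) response into {series_key: [(ts, value), ...]}."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error"))
    if payload["data"]["resultType"] != "matrix":
        raise ValueError("expected a matrix result")
    series = {}
    for item in payload["data"]["result"]:
        # Build a stable key from the label set, e.g. 'pipeline=p1,vertex=in'.
        key = ",".join(f"{k}={v}" for k, v in sorted(item["metric"].items()))
        # Prometheus encodes sample values as strings, so convert them to floats.
        series[key] = [(float(ts), float(v)) for ts, v in item["values"]]
    return series


def find_anomalies(samples, threshold=3.0):
    """Return timestamps whose value lies more than `threshold` std devs from the series mean.

    Hypothetical, illustrative rule: a plain z-score over the whole window.
    """
    values = [v for _, v in samples]
    if len(values) < 2:
        return []
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # A flat series has no outliers under this rule.
        return []
    return [ts for ts, v in samples if abs(v - mean) / stdev > threshold]
```

With short windows, a single outlier's population z-score is capped at sqrt(n - 1), so a 10-sample window can never exceed 3.0. A lower threshold, or a longer window, is needed in that case.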

Tasks

vigith commented 2 months ago

Why do we need a Prometheus server in Numaflow? Can't we assume/mandate that a metrics provider that supports PromQL is provided?

veds-g commented 2 months ago

> Why do we need a prometheus server in Numaflow? Can't we assume/mandate that a metrics provider that supports PromQL is provided?

We do not need one. A supported metrics provider will be mandatory. This issue is only about testing PromQL queries by running a Prometheus server. Maybe we can rename it?
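If the only requirement is a PromQL-capable provider, the query path can be written against the standard Prometheus HTTP API endpoint `/api/v1/query_range`. Testing then only needs a Prometheus server at some base URL, for example `http://localhost:9090` (Prometheus's default port). The sketch below assumes other providers expose the same API, possibly under a path prefix included in `base_url`. The metric name `numaflow_vertex_processed_total` and the helper names are hypothetical placeholders, not Numaflow's actual metric or code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_range_query_url(base_url, promql, start, end, step):
    """Build a Prometheus HTTP API range-query URL.

    base_url may carry a provider-specific path prefix (assumption: the
    provider is Prometheus-API compatible below that prefix).
    """
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return base_url.rstrip("/") + "/api/v1/query_range?" + params


def run_range_query(base_url, promql, start, end, step, timeout=10.0):
    """Execute the range query and return the decoded JSON response body."""
    with urlopen(build_range_query_url(base_url, promql, start, end, step), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical metric name, used only to show the query shape.
EXAMPLE_QUERY = "rate(numaflow_vertex_processed_total[5m])"
```

Because only `base_url` changes between a locally run Prometheus and a production provider, the same queries can be exercised in tests and in production without Numaflow bundling its own Prometheus server.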