ray-project / kuberay

A toolkit to run Ray applications on Kubernetes

[Feature] Multiple RayCluster CRs share the same Grafana #2502

Open kevin85421 opened 1 week ago

kevin85421 commented 1 week ago

Description

Copied from https://ray.slack.com/archives/C02GFQ82JPM/p1730847604068249?thread_ts=1730408240.682809&cid=C02GFQ82JPM

In my Ray cluster's metrics screen, I see metrics that are specific to my cluster, even though we have many Ray clusters in our k8s cluster. However, if I click "view in Grafana" in the top right corner, the corresponding Grafana dashboard combines metrics from all Ray clusters and sums them up. I want to add a filterable variable to the Grafana dashboard so that it shows metrics for one cluster at a time.

I can do this for some of the panels by setting a filter on the ray_io_cluster field. However, other metrics don't appear to have a ray_io_cluster field to filter on.

[Two screenshots attached: 2024-11-05 at 6:00:48 PM and 2024-11-05 at 6:01:38 PM]
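For illustration, the per-panel filter described above amounts to adding one label matcher to each PromQL selector. The sketch below is a hypothetical helper, not part of KubeRay or Ray: the variable name `RayCluster`, the function name, and the example metric are assumptions. It only rewrites selectors that already have a `{...}` block, and it would mis-handle label values that themselves contain braces.

```python
import re

CLUSTER_LABEL = "ray_io_cluster"
GRAFANA_VARIABLE = "RayCluster"  # hypothetical dashboard variable name


def with_cluster_filter(
    expr: str,
    label: str = CLUSTER_LABEL,
    variable: str = GRAFANA_VARIABLE,
) -> str:
    """Append `label=~"$variable"` to every `{...}` selector in a PromQL expression.

    Selectors that already match on `label` are left untouched.
    Bare metric names without braces are NOT rewritten by this sketch.
    """
    matcher = f'{label}=~"${variable}"'
    # Detects an existing matcher on `label` at the start or after a comma.
    already_filtered = re.compile(rf"(^|,)\s*{re.escape(label)}\s*(=|!)")

    def patch(match: re.Match) -> str:
        body = match.group(1).strip()
        if already_filtered.search(body):
            return match.group(0)
        if body:
            return "{" + body.rstrip(",") + "," + matcher + "}"
        return "{" + matcher + "}"

    return re.sub(r"\{([^{}]*)\}", patch, expr)


# Example metric and labels are illustrative only.
print(with_cluster_filter('sum(ray_node_cpu_utilization{instance=~"$Instance"})'))
```

A metric that does not carry the ray_io_cluster label would return no series once this matcher is added, which is why the label needs to be present on every metric before the filter can be applied everywhere.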

daturkel commented 1 week ago

For me, the best resolution would be twofold: make ray_io_cluster available on every metric in the Ray Core and Ray Data dashboards (it seems to be missing from only a few), and add a Grafana variable filter to each panel in the default Ray Grafana dashboards so that dashboard users can quickly toggle between ray_io_cluster values. As it stands, I can manually add a filter to every single panel, but even then a few panels will be missing the filter.
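As a rough sketch of what "a variable plus a filter on every panel" could look like when applied to an exported dashboard JSON, the code below adds one query variable and rewrites every target expression. It assumes the common Grafana dashboard JSON layout (`templating.list`, `panels`, nested `panels` inside rows, `targets[].expr`); the function names and the variable name `RayCluster` are hypothetical, the datasource field is omitted, and this is not how the default Ray dashboards are generated.

```python
import copy
from typing import Callable, Iterator


def cluster_variable(name: str = "RayCluster", label: str = "ray_io_cluster") -> dict:
    """Build a Grafana query variable listing all values of `label` (assumed layout)."""
    return {
        "name": name,
        "label": "Ray cluster",
        "type": "query",
        "query": f"label_values({label})",
        "includeAll": True,
        "multi": True,
        "refresh": 2,  # re-query the values when the time range changes
    }


def iter_panels(panels: list) -> Iterator[dict]:
    """Yield every panel, including panels nested inside row panels."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def patch_dashboard(dashboard: dict, rewrite: Callable[[str], str], variable: dict) -> dict:
    """Return a copy of `dashboard` with `variable` added and every `expr` rewritten."""
    patched = copy.deepcopy(dashboard)
    variables = patched.setdefault("templating", {}).setdefault("list", [])
    if all(v.get("name") != variable["name"] for v in variables):
        variables.append(variable)
    for panel in iter_panels(patched.get("panels", [])):
        for target in panel.get("targets", []):
            if "expr" in target:
                target["expr"] = rewrite(target["expr"])
    return patched


# Usage with a toy dashboard and a deliberately trivial rewrite function.
dashboard = {
    "panels": [
        {"targets": [{"expr": "up{}"}]},
        {"type": "row", "panels": [{"targets": [{"expr": "up{}"}]}]},
    ]
}
add_filter = lambda expr: expr.replace("{}", '{ray_io_cluster=~"$RayCluster"}')
patched = patch_dashboard(dashboard, add_filter, cluster_variable())
```

Passing the rewrite step in as a function keeps the traversal separate from the PromQL handling, so a more careful expression rewriter can be swapped in without touching the dashboard walk.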

kevin85421 commented 6 days ago

@daturkel does this PR solve your issue? https://github.com/ray-project/kuberay/pull/2524

daturkel commented 6 days ago

@kevin85421 thank you for the quick turnaround on that! That would allow me to manually add ray_io_cluster to each panel. Is it possible to have the ray_io_cluster variable added as a filter by default to the panels (the way session_name is here)? Otherwise, users have to add the filter one panel at a time (and some panels have multiple metrics to filter). If not, your PR will still allow me to fix this manually, so thank you!

kevin85421 commented 6 days ago

> Is it possible to have the ray_io_cluster variable added as a filter by default to the panels (the way session_name is here)?

The requests make sense to me. I will take a look.