[dashboard] Wonky GPU display

DmitriGekhtman commented 3 years ago

What is the problem?

Two bugs:

(1) The GPU field for an n-GPU node looks like this: '[0]: N/A [1]: N/A [2]: N/A ... [n-1]: N/A', which isn't very informative. Hovering the mouse over each index shows a tooltip with the GPU type.

(2) If you launch a multi-GPU head (e.g. g4dn.12xlarge) and a single-GPU worker (e.g. p2.xlarge), the info rows for the head and worker may swap with each other every few seconds, which makes it hard to read the dashboard.
I saw this when launching on AWS and K8s a few hours ago. The very last time I tried this a few minutes ago, this bug didn't appear.

Ray version and other system information (Python version, TensorFlow version, OS): cluster launcher 2.0.0dev, rayproject/ray:nightly-gpu docker image

Reproduction (REQUIRED)

Please provide a short code snippet (less than 50 lines if possible) that can be copy-pasted to reproduce the issue. The snippet should have no external library dependencies (i.e., use fake or mock data / environments):

If the code snippet cannot be run by itself, the issue will be closed with "needs-repro-script".

[ ] I have verified my script runs in a clean environment and reproduces the issue.
[x] I have verified the issue also occurs with the latest wheels.

scottsun94 commented 1 year ago

(1) Not sure if this still happens in the new dashboard. (2) This will be fixed by better default sorting, which prevents nodes from shifting around in the node table. cc: @alanwguo
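To illustrate the sorting idea, here is a minimal sketch in Python of a deterministic ordering for node rows. It is not the dashboard's actual code: the function name `sort_node_rows` and the row fields `is_head`, `ip`, and `node_id` are hypothetical, chosen only for this example. The point is that sorting on a fixed key gives the same row order on every refresh, no matter what order the backend returns the nodes in.

```python
from typing import Dict, List


def sort_node_rows(rows: List[Dict]) -> List[Dict]:
    """Return node rows in a deterministic order.

    Assumed (hypothetical) row fields: is_head (bool), ip (str), node_id (str).
    The head node comes first, then rows are ordered by IP string, with the
    node ID as a final tie-breaker so no two rows compare as equal.
    """
    return sorted(rows, key=lambda r: (not r["is_head"], r["ip"], r["node_id"]))


# Two refreshes that report the same nodes in a different order.
refresh_a = [
    {"is_head": False, "ip": "10.0.0.7", "node_id": "b2"},
    {"is_head": True, "ip": "10.0.0.9", "node_id": "a1"},
]
refresh_b = list(reversed(refresh_a))

# Both refreshes render in the same order, with the head node first.
assert sort_node_rows(refresh_a) == sort_node_rows(refresh_b)
```

The IP comparison here is a plain string comparison, which is enough to keep rows from swapping; a numeric IP sort would be a separate refinement.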

DmitriGekhtman commented 1 year ago

I'm guessing that this might have been resolved, @alanwguo can confirm.

scottsun94 commented 1 year ago

Took a quick look and it seems that:

The CSS style of the GPU stats is off; it is still using the one from the old dashboard.
The tooltip could be improved to show more info.
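As a rough sketch of what a more informative GPU field could look like, the Python snippet below puts the GPU type next to each index instead of hiding it in a tooltip, and falls back to "N/A" only when no utilization value is available. The function `format_gpu_field` and the fields `name` and `utilization_gpu` are assumptions made for this example, not the dashboard's real data model, and the sample values are made up.

```python
from typing import Dict, List


def format_gpu_field(gpus: List[Dict]) -> str:
    """Build one display line per GPU: '[index] name: utilization'.

    Assumed (hypothetical) fields per GPU: name (str) and
    utilization_gpu (int percentage, or None when unavailable).
    """
    lines = []
    for index, gpu in enumerate(gpus):
        util = gpu.get("utilization_gpu")
        util_text = "N/A" if util is None else f"{util}%"
        name = gpu.get("name", "unknown GPU")
        lines.append(f"[{index}] {name}: {util_text}")
    return "\n".join(lines)


# Made-up sample data for a two-GPU node.
sample = [
    {"name": "Tesla T4", "utilization_gpu": 37},
    {"name": "Tesla T4", "utilization_gpu": None},
]
print(format_gpu_field(sample))
```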

rkooo567 commented 1 year ago

cc @alanwguo @scottsun94 are we going to fix this as part of the frontend revamp?

scottsun94 commented 1 year ago

We could keep it as a p1 or p2 and fix it when we polish each page after Chao is on board.

scottsun94 commented 1 year ago

If we could show per-process GPU usage and GRAM (GPU memory) usage, that would be great!

scottsun94 commented 1 year ago

A potential use case for having this: https://github.com/ray-project/ray/issues/31998

rkooo567 commented 1 year ago

Let's bump up the priority. Flexible GPU usage is the main use case of Ray, so we should have the best observability possible.

sip-aravind-g commented 8 months ago

Does this issue still persist? I'm guessing it may have been resolved. Can you confirm?

ray-project / ray

[dashboard] Wonky GPU display #14664
