Open DmitriGekhtman opened 3 years ago
(1) Not sure if this still happens in the new dashboard (2) This will be fixed by better default sorting which prevents nodes from shifting a lot in the node table. cc: @alanwguo
I'm guessing that this might have been resolved; @alanwguo can confirm.
Took a quick look and it seems that:
cc @alanwguo @scottsun94 are we going to fix this as a part of frontend revamp?
We could keep it as a p1 or p2 and fix it when we polish each page after Chao is on board.
If we could show per-process GPU usage and GPU memory (GRAM) usage, that would be great!
A potential use case for having this: https://github.com/ray-project/ray/issues/31998
Let's bump up the priority. Flexible GPU usage is a core use case of Ray, so we should provide the best observability possible.
Is this issue still present, or has it been resolved? Can you confirm?
What is the problem?
Two bugs -
(1) The GPU field for an n-GPU node looks like this -- '[0]: N/A [1]: N/A [2]: N/A ... [n-1]: N/A' -- which isn't very informative. Hovering the mouse over each index shows a tooltip with the GPU's type.
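To illustrate bug (1), here's a minimal sketch contrasting the current per-index "N/A" rendering with a more compact summary. The data, field names, and `summarize` helper are hypothetical, not the dashboard's actual schema:

```python
# Hypothetical mock of per-GPU data as the dashboard might receive it;
# utilization is often unreported, hence None.
gpus = [{"index": i, "name": "Tesla T4", "utilization": None} for i in range(4)]

# Current-style rendering: one "[i]: N/A" entry per GPU, which scales poorly.
current = " ".join(
    f"[{g['index']}]: {g['utilization'] if g['utilization'] is not None else 'N/A'}"
    for g in gpus
)

def summarize(gpus):
    # A more informative compact summary: count and model up front,
    # average utilization only when at least one GPU reports a number.
    names = {g["name"] for g in gpus}
    label = f"{len(gpus)}x {names.pop()}" if len(names) == 1 else f"{len(gpus)} GPUs"
    readings = [g["utilization"] for g in gpus if g["utilization"] is not None]
    return f"{label} ({sum(readings)/len(readings):.0f}% avg)" if readings else label

print(current)          # [0]: N/A [1]: N/A [2]: N/A [3]: N/A
print(summarize(gpus))  # 4x Tesla T4
```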
(2) If you launch a multi-GPU head (e.g. g4dn.12xlarge) and a single-GPU worker (e.g. p2.xlarge), the info rows for the head and worker may swap with each other every few seconds, which makes it hard to read the dashboard.
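Bug (2) is consistent with re-sorting rows by a volatile metric on every refresh. A minimal sketch of the proposed fix (a stable default sort on an immutable key such as node ID; the row fields here are illustrative, not the dashboard's actual schema):

```python
# Illustrative node rows across two refreshes; "cpu" churns, "node_id" does not.
refresh_1 = [
    {"node_id": "head-abc", "cpu": 10},
    {"node_id": "worker-xyz", "cpu": 80},
]
refresh_2 = [
    {"node_id": "worker-xyz", "cpu": 5},   # metrics changed between refreshes
    {"node_id": "head-abc", "cpu": 90},
]

def stable_rows(rows):
    # Sorting on the immutable node_id keeps row order identical
    # across refreshes, regardless of metric churn.
    return sorted(rows, key=lambda r: r["node_id"])

order_1 = [r["node_id"] for r in stable_rows(refresh_1)]
order_2 = [r["node_id"] for r in stable_rows(refresh_2)]
assert order_1 == order_2  # rows no longer swap between refreshes
```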
I saw this when launching on AWS and on K8s a few hours ago. The last time I tried, a few minutes ago, the bug didn't appear.
Ray version and other system information (Python version, TensorFlow version, OS): cluster launcher 2.0.0dev, rayproject/ray:nightly-gpu docker image
Reproduction (REQUIRED)
Please provide a short code snippet (less than 50 lines if possible) that can be copy-pasted to reproduce the issue. The snippet should have no external library dependencies (i.e., use fake or mock data / environments):
If the code snippet cannot be run by itself, the issue will be closed with "needs-repro-script".