Open gilljon opened 4 months ago
We have had similar ideas. In addition to the actual GRAM usage at the actor/task level, do you think a column describing the logical resource usage of the GPU or XPU would be helpful? It could help explain why some GRAM sits idle, or help identify cases where the logical resource allocation deviates significantly from the actual usage. Looking forward to hearing your thoughts. @gilljon
Additionally, the GRAM info at actor/task level is actually available, but there's a small bug that's preventing it from being displayed. We will help fix it.
> We also have similar ideas. I'd like to know, in addition to the actual GRAM usage at the actor/task level, do you think adding a column to describe the logical resource usage of the GPU or XPU would be helpful?
I do think describing the logical resource is useful. We saturate our GPUs, so I always see about 90% utilization but have no way to tell what is actually consuming the resources.
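To make the "logical vs. actual" comparison concrete, here is a minimal sketch of the kind of per-actor view being discussed. It is not the Dashboard's implementation: `GpuUsage`, `usage_report`, and the sample numbers are hypothetical, and the measured byte counts are assumed to come from whatever per-process GPU memory reporting is available.

```python
from dataclasses import dataclass


@dataclass
class GpuUsage:
    """One actor/task row (hypothetical structure, for illustration only)."""
    name: str
    logical_fraction: float  # fraction of a GPU requested, e.g. num_gpus=0.5
    used_bytes: int          # measured GPU memory in bytes (assumed input)


def usage_report(rows, total_bytes):
    """Return (name, logical, actual, deviation) tuples for each row.

    actual is used_bytes as a fraction of the GPU's total memory;
    deviation is actual minus logical, so a negative value means the
    actor uses less memory than its logical share.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    report = []
    for row in rows:
        actual = row.used_bytes / total_bytes
        report.append((row.name, row.logical_fraction, actual,
                       actual - row.logical_fraction))
    return report


GIB = 1024 ** 3
rows = [
    GpuUsage("embedder", logical_fraction=0.5, used_bytes=6 * GIB),
    GpuUsage("ranker", logical_fraction=0.25, used_bytes=12 * GIB),
]
for name, logical, actual, deviation in usage_report(rows, total_bytes=24 * GIB):
    print(f"{name}: logical={logical:.2f} actual={actual:.2f} deviation={deviation:+.2f}")
```

A large negative deviation would point at idle GRAM, and a large positive one at an actor that uses more memory than its logical share suggests.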
Description
It would be very useful if the Ray Dashboard showed how much GRAM (GPU memory) a given Ray actor/task is actually consuming.
Use case
When allocating GPU resources (since resources are fractional), it would be beneficial to see how much GPU memory a given Ray actor/task actually consumes. Based on this, you can make a better-informed allocation decision.
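As a rough illustration of that decision, the sketch below turns a measured peak memory figure into a fractional GPU request. The helper `suggest_num_gpus`, the 20% headroom, and the 0.05 rounding step are all assumptions made for the example; the peak value would have to be measured separately, since the Dashboard does not currently show it.

```python
import math


def suggest_num_gpus(peak_bytes, total_bytes, headroom=0.2, step=0.05):
    """Suggest a fractional GPU request from measured peak GPU memory.

    Hypothetical helper: adds `headroom` on top of the measured peak,
    rounds up to the next multiple of `step`, and caps the result at 1.0.
    """
    if peak_bytes < 0 or total_bytes <= 0:
        raise ValueError("peak_bytes must be >= 0 and total_bytes > 0")
    needed = peak_bytes * (1 + headroom) / total_bytes
    # Round the ratio first so float noise (e.g. 6.000000000000001)
    # does not push the result up by an extra step.
    steps = math.ceil(round(needed / step, 6))
    return min(1.0, round(steps * step, 2))


GIB = 1024 ** 3
fraction = suggest_num_gpus(peak_bytes=6 * GIB, total_bytes=24 * GIB)
print(f"suggested fraction: {fraction}")
```

The result could then be passed as the fractional `num_gpus` value when declaring the actor. Note that the logical fraction is a scheduling hint, so the headroom is what protects against co-located actors running out of memory.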