Status: Open. Opened by clarng 1 year ago.
@clarng - what do you feel is the right priority for this enhancement?
I did this as part of my ray doctor hackathon project, using the Prometheus client to query metrics. Feel free to let me know if you want some pointers.
What happened + What you expected to happen
Add telemetry for cluster utilization (CPU and memory), and possibly GPU as well.
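A rough sketch of how these utilization numbers could be collected, following the Prometheus approach mentioned in the comments. It is not the hackathon implementation: it calls the Prometheus HTTP API (`/api/v1/query`) directly with the standard library rather than a Prometheus client package. The server address `http://localhost:9090`, the helper names, and the metric names (`ray_node_cpu_utilization`, `ray_node_mem_used`, `ray_node_mem_total`, `ray_node_gpus_utilization`) are assumptions and should be checked against what the cluster actually exports.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: these metric names are illustrative; verify them against the
# metrics your Ray version exports before relying on the queries.
UTILIZATION_QUERIES = {
    "cpu_percent": "avg(ray_node_cpu_utilization)",
    "mem_percent": "100 * sum(ray_node_mem_used) / sum(ray_node_mem_total)",
    "gpu_percent": "avg(ray_node_gpus_utilization)",
}


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_instant_scalar(payload):
    """Return the first sample value of an instant-vector response.

    Returns None when the query matched no series (for example, a cluster
    without GPUs).
    """
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % (payload,))
    result = payload["data"]["result"]
    if not result:
        return None
    # Each sample is [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1])


def fetch_utilization(base_url="http://localhost:9090", timeout=5.0):
    """Query each utilization metric once and return a name -> value dict."""
    utilization = {}
    for name, promql in UTILIZATION_QUERIES.items():
        url = build_query_url(base_url, promql)
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            utilization[name] = parse_instant_scalar(json.load(resp))
    return utilization
```

Calling `fetch_utilization()` against a reachable Prometheus server returns a dict keyed by `cpu_percent`, `mem_percent`, and `gpu_percent`; a value of `None` means the query matched no series.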
Versions / Dependencies
master
Reproduction script
n/a
Issue Severity
None