Open hellofinch opened 2 years ago
@hellofinch Could you be more specific? In Ray 2.0, the dashboard visualizes CPU, disk, and memory usage and includes the logs. What else do you expect to see or use?
Another note: Ray already supports exporting metrics in Prometheus format: https://docs.ray.io/en/latest/ray-observability/ray-metrics.html
@scottsun94 Thanks for your response! I read the link you gave, but I don't think it is what I need. I use Ray on a cluster where the compute nodes and the login node are separated. I submit a script that sets up the Ray cluster and runs my program, and I have no access to the ports opened on the compute nodes. If Ray could save a log file, I could check the information after my program finishes and the Ray cluster is torn down. As far as I know, the dashboard only visualizes CPU, disk, and memory usage, which only shows resource usage per node. I'm interested in each task's resource usage, so that I can analyze my program in more detail.
RE: "each task's resource usage". What do you refer to by "resource usage"? You mean physical CPU/disk/memory usage by each task?
cc: @rkooo567 @ericl @rickyyx on saving the metrics as log files.
Yes, that is what I mean. It would help me analyze each part of my program and show where its bottleneck is.
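Until something like this is built in, per-task usage can be approximated by hand. The following is a minimal sketch, not a Ray feature: `log_resource_usage` is a hypothetical helper that uses only the Python standard library (the `resource` module is Unix-only) to append one JSON line per call with wall time, CPU time, and the worker process's peak resident memory.

```python
import functools
import json
import os
import resource
import time


def log_resource_usage(log_path):
    """Hypothetical helper: append one JSON line per call to log_path."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wall_start = time.perf_counter()
            cpu_start = time.process_time()
            try:
                return func(*args, **kwargs)
            finally:
                usage = resource.getrusage(resource.RUSAGE_SELF)
                record = {
                    "task": func.__name__,
                    "pid": os.getpid(),
                    "wall_s": time.perf_counter() - wall_start,
                    "cpu_s": time.process_time() - cpu_start,
                    # Peak resident set size of the whole process so far,
                    # not of this call alone (kilobytes on Linux, bytes on macOS).
                    "max_rss": usage.ru_maxrss,
                }
                # Append mode, so records from successive calls accumulate.
                with open(log_path, "a") as f:
                    f.write(json.dumps(record) + "\n")
        return wrapper
    return decorator
```

The assumption is that the decorator is applied to the plain function underneath `@ray.remote`, and that `log_path` points to storage that is still readable after the cluster is torn down (for example a shared filesystem). Since a worker process may run many tasks, `max_rss` is only an upper bound for any single task.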
I think the information currently available from the dashboard is not sufficient for what you describe: "I'm interested in each task's resource usage. In this way, I can analyze my program more vividly." We are planning to improve this in the short term (next 3~4 months), and after that we will consider allowing persistence of the dashboard state (you can probably achieve this once we are working on that part).
Not fixed. Keeping it open for tracking.
Description
Ray should dump logs that can be visualized in the dashboard or another tool. System information such as CPU usage, bandwidth, and disk and memory usage could be recorded.
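To make the request concrete, here is a rough sketch of what such a dump could look like at node level. It is not how Ray records metrics: `sample_node` and `record_samples` are hypothetical names, and the code uses only the Python standard library, so it covers CPU load and disk usage but not memory or bandwidth (those would need a third-party library such as `psutil`).

```python
import json
import os
import shutil
import time


def sample_node(path="/"):
    """Take one snapshot of node-level load and disk usage."""
    disk = shutil.disk_usage(path)
    load_1m, load_5m, load_15m = os.getloadavg()  # Unix-only
    return {
        "time": time.time(),
        "cpu_count": os.cpu_count(),
        "load_1m": load_1m,
        "load_5m": load_5m,
        "load_15m": load_15m,
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
    }


def record_samples(log_path, count, interval_s=1.0):
    """Append `count` snapshots to log_path as JSON lines."""
    with open(log_path, "a") as f:
        for i in range(count):
            f.write(json.dumps(sample_node()) + "\n")
            f.flush()  # keep the file usable even if the job is killed
            if i < count - 1:
                time.sleep(interval_s)
```

Run in the background on each node for the duration of the job, this leaves a JSON-lines file that can be plotted after the cluster has been torn down.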
Use case
If Ray dumped all of these logs, I could use them to show how the program behaves over the course of a run. I could then analyze the performance of the distributed program and optimize it.