Status: Open. richardliaw opened this issue 1 year ago.
The logging part was done in https://github.com/ray-project/ray/pull/37273/
I guess the job exit code is logged and shown in the job driver logs? If so, users can see it in the driver logs in the Ray dashboard automatically.
Can we tell the customer this is resolved? For reference, I am referring to the original Slack thread here.
For the purposes of that thread, it's resolved (the goal was to make the exit code appear somewhere in the logs). The fix is in the Ray nightly and in Ray 2.7.
But we'll leave this issue open to track the enhancement of showing it in the dashboard.
Thanks @architkulkarni. Do you know which release the enhancement (displaying the error code in the Ray dashboard) is planned for?
I'm not sure about the planning for the dashboard part; perhaps @alanwguo knows.
@alanwguo - Following up on this
Actually, we already allow users to view the job message if the job failed. According to https://github.com/ray-project/ray/pull/37273, the return code is logged both in the logs and in the message. I think the dashboard part is already there and we can close this; we don't have to show the status code separately. cc: @architkulkarni to confirm.
I see. I think that's fine as a minimal way to get the exit code. A few thoughts:
@sudhirn-anyscale, is the status quo enough for the users you're dealing with?
@architkulkarni, ideally the customer would like to make an SDK call on a job and see the return code in one of the status fields. It does not have to be displayed on the dashboard.
They would like to avoid searching the logs for an error code, because a bare return code in the logs could match anything.
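To illustrate that concern, here is a small, self-contained sketch (not Ray code; the log contents and the "Exit code:" label are made up for illustration) showing why searching driver logs for a bare number is ambiguous, and why a labeled line is easier to match reliably.

```python
import re
from typing import List, Optional

# Hypothetical driver log: the user script prints 3, 5, 8 (8 being its
# "result"), then a hypothetical supervisor appends an unlabeled exit code 1.
LOG_UNLABELED = """\
Starting computation
3
5
8
1"""

# Same log, but the supervisor labels the exit code explicitly.
LOG_LABELED = """\
Starting computation
3
5
8
Exit code: 1"""


def bare_numbers(log: str) -> List[int]:
    """Naive approach: every line that is just an integer is a candidate."""
    return [
        int(line)
        for line in log.splitlines()
        if re.fullmatch(r"-?\d+", line.strip())
    ]


def labeled_exit_code(log: str) -> Optional[int]:
    """Accept only lines of the form 'Exit code: N'; return the last one."""
    matches = re.findall(r"^Exit code: (-?\d+)$", log, flags=re.MULTILINE)
    return int(matches[-1]) if matches else None


if __name__ == "__main__":
    print(bare_numbers(LOG_UNLABELED))     # [3, 5, 8, 1]: which is the exit code?
    print(labeled_exit_code(LOG_LABELED))  # 1
```

Even the labeled version can be fooled if the user script itself prints a line starting with "Exit code:", which is why a dedicated status field is the more robust option.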
@sudhirn-anyscale I see. That will be added by https://github.com/ray-project/ray/pull/39675, which will be in Ray 2.8. The exit code will appear in the JobInfo field returned by the CLI command ray job info and the SDK method get_job_info.
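A minimal sketch of the SDK route, assuming Ray 2.8+ and assuming the exit code is exposed on the returned job info as an attribute named driver_exit_code (check the attribute name against your Ray version). The Ray import is done lazily so the formatting helper can be used and tested without Ray installed; the address and job ID in the __main__ block are placeholders.

```python
from typing import Any, Optional


def format_job_outcome(status: Any, exit_code: Optional[int]) -> str:
    """Render a labeled summary such as 'Status: FAILED (Exit code: 42)'."""
    # JobStatus is a str-based Enum; use .value if present, else the raw value.
    status_str = getattr(status, "value", status)
    if exit_code is None:
        # The driver has not exited yet, or this Ray version lacks the field.
        return f"Status: {status_str}"
    return f"Status: {status_str} (Exit code: {exit_code})"


def fetch_job_outcome(address: str, job_id: str) -> str:
    """Query a Ray cluster for a job's status and (assumed) driver exit code."""
    from ray.job_submission import JobSubmissionClient  # lazy import

    client = JobSubmissionClient(address)
    info = client.get_job_info(job_id)
    # Assumption: the field is called driver_exit_code; default to None if absent.
    exit_code = getattr(info, "driver_exit_code", None)
    return format_job_outcome(info.status, exit_code)


if __name__ == "__main__":
    # Placeholder address and job ID; replace with your own.
    print(fetch_job_outcome("http://127.0.0.1:8265", "raysubmit_XXXXXXXX"))
```

Checking "exit_code is None" rather than truthiness matters here: an exit code of 0 is a valid, successful result and should still be shown.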
Thanks @architkulkarni. That answers what I was looking for.
It looks a little weird that it's not labeled "Exit code: 42". If it's just a plain number, it might be confused with the last line of the user script's output (this could be bad if they print a list of numbers and intend the last number to be their calculation result).
This is not how it looks now. I just wanted to show that we have a way to display the message field of the job, and the exit code will be logged there.
Ideally it would appear in the GUI somewhere near Status: FAILED.
I think the message button is good enough for now. We can add a separate exit code display in the future if needed, and keep this issue open to track it.
It looks like the return code of a job isn't recorded in Ray. Could we log it or show it in the dashboard? See https://github.com/ray-project/ray/blob/5470671c5e5e14ed4afbb52ac4118accc1789cfd/dashboard/modules/job/job_manager.py#L449-L466
Ideally this would also be shown in the Ray Dashboard to help users understand errors.
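The idea behind this request can be sketched in plain Python: run the entrypoint, wait for the driver process, keep its return code, and surface it as a labeled log line and status message. This is a minimal illustration using only the standard library, not the code in job_manager.py; JobRecord and run_entrypoint are hypothetical names.

```python
import logging
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional

logger = logging.getLogger("job_supervisor_sketch")


@dataclass
class JobRecord:
    """Hypothetical stand-in for a job's stored status information."""
    status: str = "PENDING"
    message: str = ""
    exit_code: Optional[int] = None


def run_entrypoint(cmd: List[str]) -> JobRecord:
    """Run cmd to completion and record its exit code on the job record."""
    record = JobRecord(status="RUNNING")
    proc = subprocess.Popen(cmd)
    # Blocks until the driver exits. On POSIX, -N means "killed by signal N".
    return_code = proc.wait()
    record.exit_code = return_code
    if return_code == 0:
        record.status = "SUCCEEDED"
        record.message = "Job finished successfully."
    else:
        record.status = "FAILED"
        # Label the number so it cannot be mistaken for user output.
        record.message = f"Job failed with exit code {return_code}."
        logger.error(record.message)
    return record


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    result = run_entrypoint([sys.executable, "-c", "import sys; sys.exit(42)"])
    print(result.status, result.exit_code)  # FAILED 42
```

Storing the code in a typed field (exit_code) as well as in the message is what lets a client read it without parsing logs.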
Will leave it to @alanwguo to triage this.