ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Dashboard] Kill a job #30182

Open GuillaumeDesforges opened 1 year ago

GuillaumeDesforges commented 1 year ago

Description

To my knowledge, there is no way to interrupt a job, either from the dashboard REST API or from the dashboard UI.

It would be helpful to have

Use case

A long-running job has been updated and needs to be restarted, so I stop the running job and re-submit with newer code/config.

scottsun94 commented 1 year ago

@GuillaumeDesforges How do you run those jobs? Do you use job API to submit those jobs to the cluster or use other means (run a script on a node directly or via ray client)?

GuillaumeDesforges commented 1 year ago

I use JobSubmissionClient in a script from my local machine to submit to a remote ray cluster.

scottsun94 commented 1 year ago

Got it. Thanks! One question:

Regarding "a button in the 'Jobs' tab in the dashboard UI to kill a job": we've heard similar requests before, and I've added it to our backlog.

GuillaumeDesforges commented 1 year ago

Thanks, I had indeed missed .stop_job.
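For readers following along, here is a minimal sketch of the stop-then-wait flow with .stop_job. It assumes a client object exposing stop_job() and get_job_status(), as JobSubmissionClient does; the helper name stop_and_wait, the timeout values, and the set of terminal status strings are illustrative assumptions, not part of Ray.

```python
import time

# Status strings treated as terminal in this sketch; assumed to match the
# values of Ray's JobStatus enum.
TERMINAL_STATUSES = {"STOPPED", "SUCCEEDED", "FAILED"}


def stop_and_wait(client, submission_id, timeout_s=60.0, poll_s=1.0):
    """Request a stop, then poll until the job reports a terminal status.

    `client` is any object exposing stop_job() and get_job_status(),
    for example a ray.job_submission.JobSubmissionClient.
    Returns the final status string, or raises TimeoutError.
    """
    client.stop_job(submission_id)
    deadline = time.monotonic() + timeout_s
    while True:
        status = client.get_job_status(submission_id)
        # Accept either an enum member or a plain string.
        status = str(getattr(status, "value", status))
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {submission_id} still {status}")
        time.sleep(poll_s)


# Usage with Ray installed (address and ID are placeholders):
#   from ray.job_submission import JobSubmissionClient
#   client = JobSubmissionClient("http://127.0.0.1:8265")
#   stop_and_wait(client, "<submission_id>")
```

Once the old job has reached a terminal status, the updated code or config can be re-submitted with client.submit_job(entrypoint=...), which covers the restart use case described above.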

However, a stop-job endpoint exposed via the web API could be helpful for interoperability with other tools (e.g. a command that pipes to xargs curl).
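As a rough illustration of what such a call could look like from a script, the sketch below only builds the HTTP request and does not send it. The route /api/jobs/<submission_id>/stop is an assumption based on the Jobs REST API that JobSubmissionClient appears to use; check it against the documentation for your Ray version. The helper name build_stop_request and the address are hypothetical.

```python
import urllib.parse
import urllib.request


def build_stop_request(dashboard_url, submission_id):
    """Build (but do not send) a POST request asking the dashboard to stop a job.

    The /api/jobs/<id>/stop route is an assumption; verify it for your Ray version.
    """
    base = dashboard_url.rstrip("/")
    job = urllib.parse.quote(submission_id, safe="")
    return urllib.request.Request(
        f"{base}/api/jobs/{job}/stop", data=b"", method="POST"
    )


req = build_stop_request("http://127.0.0.1:8265/", "raysubmit_123")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen(req) would send it. The same shape maps onto a curl -X POST call against that URL, which is what would make the xargs pipeline possible.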

scottsun94 commented 1 year ago

cc: @alanwguo @rkooo567 @rickyyx @architkulkarni for awareness

stale[bot] commented 1 year ago

Hi, I'm a bot from the Ray team :)

To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public slack channel.

scottsun94 commented 1 year ago

Keep it open. It's still in the backlog.

yc2984 commented 1 year ago

It would be really helpful to have this feature!

gtarcoder commented 9 months ago

Is there a timeline to deliver this feature?

scottsun94 commented 9 months ago

No timeline yet. The team is pretty overloaded at the moment. Contributions are welcome.

yogeshg commented 2 months ago

Any timelines on this yet?

yogeshg commented 2 months ago

Similarly, is it expected that the UI should have a STOP button for individual tasks as well?

Superskyyy commented 2 months ago

We plan to first add a job stop button to the dashboard.