ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Ray Core] A RayJob that is in a pending state should be cancelable. #36858

Open xiangp152 opened 1 year ago

xiangp152 commented 1 year ago

Description

A RayJob that is in a pending state and is preparing the runtime env should be cancellable immediately.

Use case

If you submit a Ray job whose runtime env takes a long time to prepare and only then realize that the job itself will take too long to execute, you may want to cancel it and find a better way to run it. If the job can be cancelled immediately, even while the runtime env is still being prepared, you can re-run it right away with the new approach instead of waiting for setup to finish. A cancellation mechanism that handles this case efficiently, without causing any issues with the runtime env, would avoid these delays.
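To make the requested behavior concrete, here is a minimal sketch of the semantics. It is a toy model built on `asyncio`, not Ray code: `ToyJob`, `JobState`, and `env_setup_seconds` are hypothetical names made up for illustration, and the slow runtime env preparation is simulated with a sleep. The point is only that a stop request issued during the pending phase should take effect right away rather than after setup completes.

```python
import asyncio
from enum import Enum


class JobState(Enum):
    # Hypothetical states for this sketch; not Ray's own status enum.
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"
    SUCCEEDED = "SUCCEEDED"


class ToyJob:
    """Hypothetical stand-in for a submitted job (not a Ray class)."""

    def __init__(self, env_setup_seconds: float):
        self.env_setup_seconds = env_setup_seconds
        self.state = JobState.PENDING
        self._task = None

    async def _run(self):
        # Simulated runtime env preparation; the job stays PENDING here.
        await asyncio.sleep(self.env_setup_seconds)
        self.state = JobState.RUNNING
        await asyncio.sleep(0)  # the entrypoint would run here
        self.state = JobState.SUCCEEDED

    def start(self):
        self._task = asyncio.ensure_future(self._run())

    async def stop(self) -> bool:
        """Cancel the job, including while it is still PENDING."""
        if self._task is None or self._task.done():
            return False
        self._task.cancel()
        try:
            await self._task  # wait for the cancellation to be delivered
        except asyncio.CancelledError:
            pass
        self.state = JobState.STOPPED
        return True


async def demo():
    job = ToyJob(env_setup_seconds=60.0)  # setup would take a minute
    job.start()
    await asyncio.sleep(0.01)  # let the simulated setup begin
    assert job.state is JobState.PENDING
    stopped = await job.stop()  # returns without waiting for setup
    return stopped, job.state


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch `stop()` interrupts the simulated setup instead of waiting for it, which is the behavior I would expect when stopping a pending job. How this maps onto the actual runtime env setup code path is an open question for the maintainers.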

SongGuyang commented 1 year ago

@architkulkarni @Catch-Bull any thoughts on this?

architkulkarni commented 1 year ago

Duplicate of https://github.com/ray-project/ray/issues/28221

architkulkarni commented 1 year ago

I think this enhancement makes sense and there's nothing controversial about the behavior. There are two parts, some of which might already be done:

We likely won't be able to prioritize adding this, but would welcome any PRs. cc @akshay-anyscale and cc @jjyao (since we discussed some related things before)