ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.14k stars 5.61k forks

How can I add a custom resource after the Ray cluster has started? #47644

Open SeanQuant opened 1 week ago

SeanQuant commented 1 week ago

Description

I'm currently using an existing Ray cluster and want to launch a series of tasks while controlling the level of concurrency (for instance, only 3 tasks running simultaneously, even with 10 tasks pending and CPUs/memory available). Previously I always used custom resources with a Ray cluster that I initialized myself, but that method doesn't work here because the existing Ray cluster was not initialized by me.

Any solution?

Use case

No response

ruisearch42 commented 1 week ago

Are these normal tasks or actor tasks? If they are actor tasks, would a concurrency group be a possible solution for you? https://docs.ray.io/en/latest/ray-core/actors/concurrency_group_api.html

Another idea is to get the cluster resources first. Say you find that CPU=20: you then create a placement group with CPU=17 purely as a placeholder and submit tasks with CPU=1 each, which limits execution to 3 tasks at a time.

jjyao commented 1 week ago

You can create a PG with 3 CPUs and launch all your tasks inside this PG, which limits execution to 3 tasks running at once (assuming each task takes 1 CPU).

You can also do the limiting on the application side: submit 3 tasks, use ray.wait to wait for one to finish, and then submit another one: https://docs.ray.io/en/latest/ray-core/patterns/limit-pending-tasks.html

SeanQuant commented 6 days ago

> Are these normal tasks or actor tasks? If they are actor tasks, would a concurrency group be a possible solution for you? https://docs.ray.io/en/latest/ray-core/actors/concurrency_group_api.html
>
> Another idea is to get the cluster resources first. Say you find that CPU=20: you then create a placement group with CPU=17 purely as a placeholder and submit tasks with CPU=1 each, which limits execution to 3 tasks at a time.

It seems that threaded actors can solve my problem: https://docs.ray.io/en/latest/ray-core/actors/async_api.html#threaded-actors

But it can't replace custom resources. I mean, what if I want my tasks to run on a specific node with controlled concurrency?

SeanQuant commented 6 days ago

> You can create a PG with 3 CPUs and launch all your tasks inside this PG, which limits execution to 3 tasks running at once (assuming each task takes 1 CPU).
>
> You can also do the limiting on the application side: submit 3 tasks, use ray.wait to wait for one to finish, and then submit another one: https://docs.ray.io/en/latest/ray-core/patterns/limit-pending-tasks.html

Thanks for your answer. But a PG might not be able to do everything a custom resource can. I found https://github.com/ray-project/ray/pull/13019, but nothing there addresses this situation: what if I want my tasks to run on a specific node with controlled concurrency?

SeanQuant commented 6 days ago

I wonder if I can recreate this function for my personal use (without rebuilding Ray)?