ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[autoscaler] dynamically set `min_nodes` during interactive sessions #47248

Open akdienes opened 3 months ago

akdienes commented 3 months ago

Description

I would like a way to ask my cluster to keep a bunch of nodes alive without having to pass a placement group to all my jobs; ideally I could interactively spin up a bunch of nodes and let them sit until I close the connection. If I reserve a big placement group, then I have to pass that placement group around to every job, otherwise all of those resources will be considered unavailable. And if I set min_nodes in the config, then the nodes will always stay alive whether or not I am actually using the cluster at that moment.
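For context, the placement-group workaround described above might look roughly like the sketch below. The bundle sizes, the `SPREAD` strategy, and the `demo_task` function are made up for illustration; the point is that the group has to be threaded through every task via its scheduling strategy, and its resources are fenced off from everything else.

```python
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init(address="auto")

# Reserve capacity up front so the autoscaler brings up (and keeps) nodes.
pg = placement_group([{"CPU": 4}] * 8, strategy="SPREAD")
ray.get(pg.ready())

@ray.remote(num_cpus=1)
def demo_task(i):
    return i * i

# Every job/task has to be told about the placement group explicitly;
# otherwise it cannot use the reserved CPUs, which then sit idle.
refs = [
    demo_task.options(
        scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=pg)
    ).remote(i)
    for i in range(32)
]
print(ray.get(refs))
```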

Use case

No response

0xinsanity commented 3 months ago

This would be extremely useful. Would love to know if this is possible.

anyscalesam commented 3 months ago

cc @kevin85421

kevin85421 commented 3 months ago

Do you use KubeRay? In KubeRay, you can directly edit the min / max replicas of worker groups.

akdienes commented 3 months ago

I do not use KubeRay, just the vanilla autoscaler

kevin85421 commented 3 months ago

I would recommend using KubeRay instead if you are able to launch a K8s cluster.

akdienes commented 3 months ago

That sounds like quite a complicated change to make to my existing (working) setup. Also, I'm not sure how it solves the problem. If I'm understanding correctly, I'd still have to change min_replicas in a config file and redeploy, but that's already possible by setting min_workers in my cluster config yaml and restarting the head node.

What I'm looking for is something in the Python API that I can call during an interactive session to reserve some workers and keep them alive, kind of like creating a placement group does (except that, unlike a placement group, those resources would not be blocked from taking on tasks from the global pool).

kevin85421 commented 3 months ago

If I'm understanding correctly, I'd still have to change min_replicas in a config file and redeploy, but that's already possible by setting min_workers in my cluster config yaml and restarting the head node.

You don't need to restart the head node in KubeRay. The config change can be detected dynamically.

What I'm looking for is something in the Python API that I can call during an interactive session to reserve some workers and keep them alive;

Maybe this developer API can fulfill your requirements, but note that it is designed for Ray library developers (e.g., Ray Train, Ray Serve) and advanced users.

https://docs.ray.io/en/latest/cluster/running-applications/autoscaling/reference.html#ray-autoscaler-sdk-request-resources
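For reference, a minimal sketch of how that SDK call might be used from an interactive session is below. The CPU counts are arbitrary, and the pattern of requesting zero to release the hold relies on the documented behavior that each call overrides the previous request.

```python
import ray
from ray.autoscaler.sdk import request_resources

ray.init(address="auto")

# Ask the autoscaler to scale the cluster to at least 64 CPUs worth of
# nodes. Unlike a placement group, this does not fence those resources
# off; they remain available to any task in the global pool.
request_resources(num_cpus=64)

# ... do interactive work; the nodes stay up while the request stands ...

# Requests can also be expressed as bundles, e.g. 8 blocks of 4 CPUs:
# request_resources(bundles=[{"CPU": 4}] * 8)

# Each call overrides the previous one, so requesting zero effectively
# drops the hold and lets the autoscaler scale back down.
request_resources(num_cpus=0)
```

Because the requested capacity stays in the global pool, this is closer to the behavior asked for in the issue than a placement group reservation.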

akdienes commented 3 months ago

Looking at ray.autoscaler.sdk.request_resources, the following questions come to mind: