ray-project / kuberay

A toolkit to run Ray applications on Kubernetes
Apache License 2.0

[Feature] Consider setting AUTOSCALER_CONSERVE_GPU_NODES by default in Ray autoscaler #2381

Open andrewsykim opened 2 months ago

andrewsykim commented 2 months ago

Description

See discussion in https://ray.slack.com/archives/C02GFQ82JPM/p1723646279659389

Use case

Tasks that only require CPUs should not trigger scale-up of GPU worker groups. The Ray autoscaler has an environment variable, AUTOSCALER_CONSERVE_GPU_NODES, that prevents scale-up of GPU nodes for such tasks, but it is not enabled by default.
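
To make the requested behavior concrete, here is a minimal sketch of what "conserving GPU nodes" could mean when choosing a worker group for a pending resource demand. It is a toy illustration, not the Ray autoscaler's actual logic: the helpers `conserve_gpu_nodes` and `pick_node_type`, the group names, and the resource shapes are all hypothetical, and the fallback value is left to the caller rather than assumed.

```python
import os
from typing import Dict, Optional

ENV_VAR = "AUTOSCALER_CONSERVE_GPU_NODES"


def conserve_gpu_nodes(default: int) -> bool:
    """Hypothetical helper: read the flag as an integer, falling back to `default`."""
    return int(os.environ.get(ENV_VAR, default)) != 0


def pick_node_type(
    node_types: Dict[str, Dict[str, float]],
    demand: Dict[str, float],
    conserve: bool,
) -> Optional[str]:
    """Toy selection: return the first node type that can satisfy `demand`.

    When `conserve` is true and the demand asks for no GPU, node types
    without GPUs are tried before node types with GPUs.
    """
    # Keep only node types that have enough of every requested resource.
    feasible = [
        name
        for name, resources in node_types.items()
        if all(resources.get(res, 0) >= amount for res, amount in demand.items())
    ]
    if conserve and demand.get("GPU", 0) == 0:
        # Stable sort: non-GPU types first, original order otherwise preserved.
        feasible.sort(key=lambda name: node_types[name].get("GPU", 0) > 0)
    return feasible[0] if feasible else None


# Hypothetical worker groups; the GPU group is listed first on purpose.
groups = {
    "gpu-group": {"CPU": 8, "GPU": 1},
    "cpu-group": {"CPU": 8},
}
cpu_only_task = {"CPU": 2}

without_flag = pick_node_type(groups, cpu_only_task, conserve=False)
with_flag = pick_node_type(groups, cpu_only_task, conserve=True)
print(without_flag, with_flag)
```

In this sketch a CPU-only demand lands on the GPU group when the flag is off, simply because that group is listed first, and on the CPU group when the flag is on. A demand that does request a GPU is unaffected by the flag.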

kevin85421 commented 2 months ago

It makes sense to me, but it seems to be a breaking change. Would it be better to disable it in Ray if we want to proceed?

dcela commented 2 months ago

I am confused. Isn't it true by default?

https://github.com/ray-project/ray/blob/cc984cd1675892154fdace3d6adfce25e6ad33a1/python/ray/autoscaler/_private/constants.py#L31

andrewsykim commented 2 days ago

cc @ryanaoleary