We're running a ML server with Ray Serve on Google Cloud. Nowadays, the zone I'm using on GC is suffering "out of GPUs" issue, and on those times, we cannot scale up since they cannot launch any more virtual instances.
The problem is, in this situation, the Ray Autoscaler just tries to scale-up forever and we cannot know even this thing is happening.
Wanted Feature
I want to insert a hook function, so that I can send alerts to my team's workspace inside the autoscaler. And there's no option for these demand as far as I know.
Description
Situation
We're running a ML server with Ray Serve on Google Cloud. Nowadays, the zone I'm using on GC is suffering "out of GPUs" issue, and on those times, we cannot scale up since they cannot launch any more virtual instances. The problem is, in this situation, the Ray Autoscaler just tries to scale-up forever and we cannot know even this thing is happening.
Wanted Feature
I want to insert a hook function, so that I can send alerts to my team's workspace inside the autoscaler. And there's no option for these demand as far as I know.
Use case
No response