Open VadisettyRahul opened 2 weeks ago
@jjyao this is similar to the idea we talked about - a more flexible, configurable sched system.
Hi friends, this is also a feature needed by our team to perform adaptive SLA scheduling. I had discussed previously with @anyscalesam and there appears to be many requests from different teams for such a feature. As such I submitted a REP and also ran some experiments comparing power of 2 scheduling vs. the adaptive SLA one.
My colleague @Superskyyy suggested to move the discussion here because we want to show adaptivity can also be extended beyond hardware usages to SLA use cases, and an improvement at the Ray Core level can be enjoyed by top level frameworks like Ray Data/ Ray Serve.
Description
Ray currently relies on static configurations for task scheduling, limiting efficiency during dynamically changing workloads. Adding adaptive scaling would allow clusters to automatically expand or contract based on resource demands, improving both utilization and response times.
Proposed Solution:
1. Monitor Resource Usage:
2. Implement Auto-Scaling Logic:
3. Dynamic Task Assignment:
4. Testing & Validation:
Expected Outcome: This feature would enable clusters to dynamically respond to changing loads, improving resource efficiency and overall task execution speed.
Use case
No response