ray-project / mobius

Mobius is an AI infrastructure platform for distributed online learning, including online sample processing, training and serving.
https://ray-project.github.io/mobius/
Apache License 2.0

[Scheduler] Job Master needs non-in-place scheduling #65

Open: ashione opened this issue 6 months ago

ashione commented 6 months ago

In the current pipeline-first scheduling strategy, we assign all resources to each actor/worker ourselves instead of leaving resource management to the Ray GCS. This works well for placing all actors at once in a single-job cluster. In multi-tenant mode, however, there are many job master actors, and none of them can manage resources shared with the others. It would be better to adopt a more elastic strategy and let the GCS do more of the work in multi-tenant mode.

ashione commented 6 months ago

@BalaBalaYi @clay4444 Could you take a look at this issue? We should move the internal implementation to the open-source version.