Open Superskyyy opened 4 months ago
Similar issue: https://github.com/ray-project/ray/issues/45940. Maybe we can decouple it this way, so that we get persistent storage and the dashboard can still control the tasks.
Shall we have an abstraction layer in front of the database, so that different DB solutions can be used? @anyscalesam, @Bye-legumes, I heard ByteDance already has a solution; kindly share it with me if you do. Thanks a lot!
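To make the abstraction-layer idea concrete, here is a minimal sketch of what such a layer might look like. Everything in it is an assumption for illustration: `EventStore`, `put_event`, `list_events`, and both backends are hypothetical names, not existing Ray or dashboard APIs. SQLite only stands in for whichever database a deployment would actually choose.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class EventStore(ABC):
    """Hypothetical storage interface the dashboard would code against."""

    @abstractmethod
    def put_event(self, cluster_id: str, event: Dict[str, Any]) -> None:
        """Persist one telemetry event for the given cluster."""

    @abstractmethod
    def list_events(self, cluster_id: str) -> List[Dict[str, Any]]:
        """Return the cluster's events in insertion order."""


class InMemoryEventStore(EventStore):
    """Non-persistent backend, useful for tests and local runs."""

    def __init__(self) -> None:
        self._events: Dict[str, List[Dict[str, Any]]] = {}

    def put_event(self, cluster_id: str, event: Dict[str, Any]) -> None:
        self._events.setdefault(cluster_id, []).append(dict(event))

    def list_events(self, cluster_id: str) -> List[Dict[str, Any]]:
        return list(self._events.get(cluster_id, []))


class SqliteEventStore(EventStore):
    """Persistent backend; data outlives the process when a file path is used."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "cluster_id TEXT NOT NULL, "
            "payload TEXT NOT NULL)"
        )
        self._conn.commit()

    def put_event(self, cluster_id: str, event: Dict[str, Any]) -> None:
        self._conn.execute(
            "INSERT INTO events (cluster_id, payload) VALUES (?, ?)",
            (cluster_id, json.dumps(event)),
        )
        self._conn.commit()

    def list_events(self, cluster_id: str) -> List[Dict[str, Any]]:
        rows = self._conn.execute(
            "SELECT payload FROM events WHERE cluster_id = ? ORDER BY id",
            (cluster_id,),
        ).fetchall()
        return [json.loads(row[0]) for row in rows]
```

With something like this, the dashboard would depend only on `EventStore`, and each deployment could plug in its own backend without touching dashboard code.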
Let's grab time to chat more about this. cc @alanwguo
UPDATE: the focus is on getting the Export API working first, which is the natural prerequisite for this. A REP is in progress with @MissiontoMars @nikitavemuri
Description
With Ray starting to support the virtual cluster (vCluster) concept, and with advanced multi-cluster-per-user setups appearing, the Ray dashboard components should no longer be bound to a single Ray cluster's lifetime, since that coupling makes multi-tenant sharing and telemetry data persistence complex to implement. In addition, the dashboard goes down together with the head node (fate-sharing), which makes it difficult to trace back what happened (and what was executing) during a major incident. @liuxsh9 @Bye-legumes @nemo9cby
Use case
Doing so will bring the following benefits: