Open jovany-wang opened 1 year ago
@ray-project/rayfed-dev CC
Besides single-controller and multi-controller modes, we also need to support a simulation mode for launching thousands of ends.
For SIMULATION mode, we might run actors of different parties in one process (one Ray worker process).
For SINGLE_CONTROLLER mode, actors of different parties should run on different Ray nodes.
For MULTI_CONTROLLER mode, actors of different parties should run in different Ray clusters.
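To make the three placement rules above concrete, here is a minimal sketch in plain Python. It does not use Ray or RayFed; the `Mode` enum, the `plan_placement` helper, and the location labels such as `worker-0` are all hypothetical names chosen for illustration only.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical enum; the names mirror the modes described in this issue.
    SIMULATION = "simulation"
    SINGLE_CONTROLLER = "single_controller"
    MULTI_CONTROLLER = "multi_controller"


def plan_placement(mode, parties):
    """Map each party to a label describing where its actors would run.

    The labels are illustrative strings, not real Ray identifiers.
    """
    if mode is Mode.SIMULATION:
        # All parties share one Ray worker process.
        return {party: "worker-0" for party in parties}
    if mode is Mode.SINGLE_CONTROLLER:
        # One cluster, with a separate Ray node per party.
        return {party: f"cluster-0/node-{i}" for i, party in enumerate(parties)}
    # MULTI_CONTROLLER: a separate Ray cluster per party.
    return {party: f"cluster-{i}" for i, party in enumerate(parties)}


if __name__ == "__main__":
    for mode in Mode:
        print(mode.name, plan_placement(mode, ["alice", "bob"]))
```

The point of the sketch is only that the mode determines the isolation boundary between parties: none, a Ray node, or a Ray cluster.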
I propose that we support running RayFed jobs in single-controller mode.
I'd like to propose 2 options for how we start up the single-controller cluster, and how we connect to the cluster and run our jobs.
option 1
Add a new CLI toolkit to start the cluster; it simply wraps the Ray CLI toolkit. For example:
A. running single-controller mode
And then, the job could be run in single-controller mode automatically:
B. running multi-controller mode
And then, you run the following script in each of the 2 clusters:
option 2
No new toolkit is needed, but we should tell users to add some extra arguments when starting the Ray cluster. For example:
A. running single-controller mode
And then, add the extra mode info when calling fed.init():
B. running multi-controller mode
And then, add the extra mode info when calling fed.init() (and we could omit it if we provide a default value):
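One way the optional mode argument with a default might behave is sketched below as a standalone helper. This is not RayFed's actual API: `resolve_mode`, `VALID_MODES`, and `DEFAULT_MODE` are hypothetical, and treating multi-controller as the default is an assumption drawn from the remark above.

```python
from typing import Optional

# Hypothetical mode names, mirroring the modes discussed in this issue.
VALID_MODES = ("simulation", "single_controller", "multi_controller")

# Assumption: multi-controller is the default when no mode is given.
DEFAULT_MODE = "multi_controller"


def resolve_mode(mode: Optional[str] = None) -> str:
    """Return a validated mode name, falling back to the default."""
    if mode is None:
        return DEFAULT_MODE
    # Accept spellings such as "Single-Controller" or "single_controller".
    normalized = mode.strip().lower().replace("-", "_")
    if normalized not in VALID_MODES:
        raise ValueError(
            f"Unknown mode {mode!r}; expected one of {', '.join(VALID_MODES)}"
        )
    return normalized


if __name__ == "__main__":
    print(resolve_mode())                     # falls back to the default
    print(resolve_mode("single-controller"))  # explicit mode
```

With a default like this, existing multi-controller scripts would keep working unchanged, and only single-controller or simulation jobs would need to pass the extra argument.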