zhaocaibei123 opened this issue 1 year ago
@ray-project/rayfed-dev CC for more discussion of the risks.
Hi @zhaocaibei123, are you willing to contribute this feature?
It seems that we should support infinite retries in fed.init(), for example:
fed.init(infinite_retry=True)
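To make "infinite retries" concrete, here is a minimal sketch of the retry semantics such a flag might enable. It assumes the connection step inside fed.init() can be modeled as a callable that raises ConnectionError until the peer is reachable. `retry_forever`, `attempt`, `base_delay` and `max_delay` are all hypothetical names, and `infinite_retry` is only a proposed parameter, not an existing RayFed option.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_forever(
    attempt: Callable[[], T],
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `attempt` until it succeeds, using capped exponential backoff.

    Hypothetical helper: `attempt` stands in for whatever connection step
    fed.init() performs; it is not a RayFed API.
    """
    delay = base_delay
    while True:
        try:
            return attempt()
        except ConnectionError as exc:
            # Never give up; just wait longer (up to max_delay) before retrying.
            print(f"connect failed ({exc}); retrying in {delay:.2f}s")
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Capping the backoff keeps a party from waiting arbitrarily long once its peer comes up. The trade-off, and likely part of the risk being discussed, is that a peer that never starts leaves this loop hanging forever instead of surfacing an error.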
I don't have a concrete proposal yet.
Currently, it's hard to coordinate the parties because their workloads are asymmetric. So let's propose a global barrier, global_sync, to make sure all parties have reached the same point before any of them continues.
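The barrier semantics can be sketched in a single process, assuming each party is simulated by a thread and a local threading.Barrier stands in for whatever cross-party messaging a real global_sync would need. The party names, `run_party` and this `global_sync` function are hypothetical and only illustrate the intended behavior: no party moves past the barrier until every party has arrived, regardless of how long its own workload took.

```python
import random
import threading
import time

PARTIES = ["alice", "bob", "carol"]  # hypothetical party names

# One barrier shared by all simulated parties. A real multi-party setup
# would need to implement this with cross-party messages, not a local object.
_barrier = threading.Barrier(len(PARTIES))


def global_sync() -> None:
    """Block until every party has reached this point (hypothetical API)."""
    _barrier.wait()


def run_party(name: str, work_seconds: float, log: list) -> None:
    time.sleep(work_seconds)  # simulate an asymmetrical workload
    log.append(("done_work", name))
    global_sync()  # nobody proceeds until all parties arrive
    log.append(("past_barrier", name))


def main() -> list:
    log: list = []
    threads = [
        threading.Thread(target=run_party, args=(n, random.uniform(0, 0.2), log))
        for n in PARTIES
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Every "done_work" entry lands in the log before any "past_barrier" entry, which is the guarantee global_sync is meant to provide.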