snuspl / cruise

Cruise: A Distributed Machine Learning Framework with Automatic System Configuration
Apache License 2.0

Decouple computation and communication by introducing a worker-side model cache #1251

Closed by wynot12 6 years ago

wynot12 commented 7 years ago

In Dolphin, workers always pull the model from the servers synchronously, so computation stalls until each pull finishes. This wastes a large share of worker resources and slows down overall progress.

We need to decouple computation from communication by introducing a worker-side model cache, so that computation can proceed regardless of ongoing communication.
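A minimal Python sketch of such a cache, assuming a hypothetical `pull_fn` that stands in for the server pull (Dolphin's actual classes and pull API are not shown in this issue, and `ModelCache`, `refresh_once`, and `interval_sec` are illustrative names). Computation only calls `get()`, which reads the local snapshot and never waits on the network. A background daemon thread calls `refresh_once()` every `interval_sec` seconds and swaps in a fresh copy. If a pull fails, the worker keeps using the old, possibly stale, values.

```python
import threading
from typing import Callable, Dict, Optional


class ModelCache:
    """Hypothetical worker-side model cache refreshed in the background."""

    def __init__(self, pull_fn: Callable[[], Dict[str, float]], interval_sec: float = 0.1) -> None:
        self._pull_fn = pull_fn  # stand-in for pulling the model from the servers
        self._interval = interval_sec
        self._lock = threading.Lock()
        self._model = dict(pull_fn())  # one blocking pull to warm the cache
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._refresh_loop, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def get(self, key: str) -> Optional[float]:
        # Non-blocking read used by computation; the value may be stale.
        with self._lock:
            return self._model.get(key)

    def refresh_once(self) -> bool:
        """Pull once; on failure keep serving the old snapshot (best effort)."""
        try:
            fresh = dict(self._pull_fn())
        except Exception:
            return False
        with self._lock:
            self._model = fresh  # swap the whole snapshot at once
        return True

    def _refresh_loop(self) -> None:
        # Event.wait returns True once stop() sets the event, ending the loop.
        while not self._stop.wait(self._interval):
            self.refresh_once()
```

Usage: `cache = ModelCache(lambda: server_model); cache.start()`, then computation calls `cache.get("w")` freely and `cache.stop()` at shutdown. Building the fresh dict outside the lock and only swapping the reference under it keeps the lock hold time short, and each `get()` sees either the old snapshot or the new one, never a partially updated dict.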

The model cache will be refreshed in the background. As a first step, we can update the cache in a best-effort manner; later, we may support SSP (Stale Synchronous Parallel).
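Supporting SSP would mean bounding how stale the cached model may be. The following is a hedged sketch, not Dolphin's design. It assumes each snapshot carries a hypothetical `model_clock` (the iteration up to which it includes all workers' updates) and that a worker at clock `c` may read it only when `c - model_clock <= staleness`. The exact off-by-one convention differs between SSP descriptions. A best-effort cache never blocks, but `read()` here blocks exactly when the snapshot falls too far behind the worker.

```python
import threading
from typing import Dict, Optional


class SSPModelCache:
    """Hypothetical worker-side cache with a Stale Synchronous Parallel bound."""

    def __init__(self, staleness: int) -> None:
        self._staleness = staleness
        self._cond = threading.Condition()
        self._model: Dict[str, float] = {}
        self._model_clock = -1  # -1 means no snapshot has been installed yet

    def install(self, model: Dict[str, float], model_clock: int) -> None:
        """Called by the background refresher after each pull."""
        with self._cond:
            # Ignore snapshots that arrive out of order and are older than ours.
            if model_clock > self._model_clock:
                self._model = dict(model)
                self._model_clock = model_clock
                self._cond.notify_all()  # wake readers waiting for fresher data

    def _fresh_enough(self, worker_clock: int) -> bool:
        # Assumed convention: never serve an empty cache, and let the snapshot
        # lag the worker by at most `staleness` clocks.
        return self._model_clock >= 0 and worker_clock - self._model_clock <= self._staleness

    def read(self, worker_clock: int, timeout: Optional[float] = None) -> Dict[str, float]:
        """Return a copy of the cached model, blocking only while it is too stale."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._fresh_enough(worker_clock), timeout):
                raise TimeoutError(
                    f"model clock {self._model_clock} is more than "
                    f"{self._staleness} behind worker clock {worker_clock}"
                )
            return dict(self._model)
```

With `staleness=0`, a worker at clock `c` waits for a snapshot from clock `c`, which behaves like bulk-synchronous reads. A large `staleness` means reads rarely block, which approaches the best-effort behavior.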

yunseong commented 6 years ago

Part of #803

Let's split the policies for refreshing cached objects into separate issues.