snuspl / cruise

Cruise: A Distributed Machine Learning Framework with Automatic System Configuration
Apache License 2.0

[CAY-1251] Introduce worker-side model cache #1252

Closed wynot12 closed 7 years ago

wynot12 commented 7 years ago

Resolves #1251

This PR decouples computation and communication in ML training by introducing a worker-side model cache. Note that the cache eviction/refresh policy should be improved; the current version simply refreshes the cache at a 10-second interval.

Users can turn on this feature with the `model_cache_enabled` option.
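The decoupling works roughly like this: compute threads always read model parameters from a local in-memory copy, while a background thread periodically swaps that copy for a fresh pull from the parameter servers. A minimal JDK-only sketch of the idea (all names here, including the `puller` supplier, are hypothetical and not Cruise's actual API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical worker-side model cache: readers never block on the network. */
public final class ModelCache {
  private volatile Map<String, double[]> snapshot;   // swapped atomically on refresh
  private final Supplier<Map<String, double[]>> puller;
  private final ScheduledExecutorService refresher =
      Executors.newSingleThreadScheduledExecutor();

  public ModelCache(final Supplier<Map<String, double[]>> puller, final long refreshSec) {
    this.puller = puller;
    refresh();   // warm the cache once at startup
    refresher.scheduleAtFixedRate(this::refresh, refreshSec, refreshSec, TimeUnit.SECONDS);
  }

  /** Runs off the compute path: pulls a fresh model copy and swaps it in. */
  private void refresh() {
    snapshot = new ConcurrentHashMap<>(puller.get());
  }

  /** Compute threads read locally; no RPC on the training critical path. */
  public double[] get(final String key) {
    return snapshot.get(key);
  }

  public void shutdown() {
    refresher.shutdownNow();
  }
}
```

The trade-off, visible in the convergence graph below, is that workers train on slightly stale parameters between refreshes in exchange for removing pull latency from every iteration.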

wynot12 commented 7 years ago

[Graph: convergence performance, cache vs. no-cache]

Here's a graph that compares convergence performance with and without the cache. (Experiment setup: Optiplex cluster, NMF, Netflix 1x, 5 epochs)

wynot12 commented 7 years ago

#1253 will resolve the failure.

wynot12 commented 7 years ago

@yunseong I'll update this PR to use Guava's LoadingCache soon.
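Guava's `LoadingCache` (built with `CacheBuilder.newBuilder().expireAfterWrite(...)` and a `CacheLoader`) would replace the fixed 10-second whole-model refresh with per-key load-on-miss plus time-based expiry. As a rough, JDK-only approximation of that behavior (class and method names are hypothetical, not Guava's):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** JDK-only approximation of a loading cache with expire-after-write semantics. */
public final class ExpiringLoadingCache<K, V> {
  private static final class Entry<V> {
    final V value;
    final long writtenAtNanos;
    Entry(final V value, final long writtenAtNanos) {
      this.value = value;
      this.writtenAtNanos = writtenAtNanos;
    }
  }

  private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
  private final Function<K, V> loader;   // plays the role of Guava's CacheLoader.load
  private final long ttlNanos;

  public ExpiringLoadingCache(final Function<K, V> loader, final long ttlNanos) {
    this.loader = loader;
    this.ttlNanos = ttlNanos;
  }

  /** Returns the cached value, reloading it if absent or older than the TTL. */
  public V get(final K key) {
    final long now = System.nanoTime();
    final Entry<V> e = map.compute(key, (k, old) ->
        (old == null || now - old.writtenAtNanos > ttlNanos)
            ? new Entry<>(loader.apply(k), now)   // miss or expired: reload
            : old);                               // fresh: serve as-is
    return e.value;
  }
}
```

Compared with the fixed-interval refresh above, this only pays the pull cost for keys a worker actually touches, which matters when the model is large and access is sparse.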

wynot12 commented 7 years ago

In the above graph, the integers appended to the labels (e.g., cache2, cache3) denote repeated runs of the cache experiment. Sorry for the confusion.

wynot12 commented 7 years ago

@yunseong how about merging this PR? Since the cache implementation is separate from the original one and our default setting turns caching off, it does not conflict with our main work, which will use the non-cache version.

yunseong commented 7 years ago

Agreed. The PR looks good and I'm merging it. Thanks!