snuspl / parallax

A Tool for Automatic Parallelization of Deep Learning Training in Distributed Multi-GPU Environments.
Apache License 2.0

Decide how to handle 'Local Variable' #4

Open sj6077 opened 6 years ago

sj6077 commented 6 years ago

Things to Change

To implement RNN models, a user can store the RNN hidden state in a Variable so that it is passed correctly between multiple session runs. However, Parallax currently just places such a variable on the PS and does not replicate it into the workers, even when the developer has specified that the variable is a 'local variable', which should not be replicated. This leads to incomplete convergence of the model. Furthermore, it is the user's responsibility to specify which variables should be replicated (GLOBAL_VARIABLE) and which should not (LOCAL_VARIABLE). That also seems to be a problem, since Parallax aims to support 'automatic parallelization'.

Current Behavior

Users have to distinguish between local and global variables by themselves.

Expected Behavior

Parallax automatically detects local variables, or at least helps the user by emitting a warning.
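
To make the warning idea concrete, here is a rough sketch of one possible heuristic. It is not Parallax code: `VarInfo`, `find_suspect_variables`, and `local_variable_warnings` are hypothetical names, and the rule itself (a non-trainable variable that is written by an explicit assign op, yet is tagged as global, probably holds per-worker state such as an RNN hidden state) is only an assumption about what such a detection could look like.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VarInfo:
    """Hypothetical summary of one graph variable (not a Parallax type)."""
    name: str
    trainable: bool          # updated by the optimizer through gradients
    assigned_in_graph: bool  # written by an explicit assign op, e.g. an RNN state update
    collection: str          # "GLOBAL_VARIABLE" or "LOCAL_VARIABLE", as tagged by the user


def find_suspect_variables(variables: List[VarInfo]) -> List[str]:
    """Return names of variables tagged global that look like per-worker state.

    Assumed heuristic: not trainable, but assigned inside the graph.
    """
    return [
        v.name
        for v in variables
        if v.collection == "GLOBAL_VARIABLE"
        and not v.trainable
        and v.assigned_in_graph
    ]


def local_variable_warnings(variables: List[VarInfo]) -> List[str]:
    """Build one warning message per suspect variable."""
    return [
        f"Variable '{name}' is not trainable but is assigned in the graph; "
        "consider marking it as LOCAL_VARIABLE."
        for name in find_suspect_variables(variables)
    ]


if __name__ == "__main__":
    example = [
        VarInfo("dense/kernel", True, False, "GLOBAL_VARIABLE"),
        VarInfo("rnn/hidden_state", False, True, "GLOBAL_VARIABLE"),
        VarInfo("rnn/cell_state", False, True, "LOCAL_VARIABLE"),
    ]
    for message in local_variable_warnings(example):
        print(message)
```

A rule like this would also flag legitimately shared counters such as a global step, which is one reason a warning may be a safer first step than silently reclassifying variables.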
