Closed: jsjason closed this issue 7 years ago
Instead of collecting metrics from user code (i.e., Trainer), we can move that code into 1) ParameterWorker, to measure the push/pull cost, and 2) AsyncWorkerTask, to measure the rest, i.e., local computation.
We can then compute only the user-defined metrics (e.g., loss, log-likelihood) in user code.
Currently, the metric gathering mechanism in dolphin-async is exposed to the user. The system should be intelligent enough to hide it and collect metrics in the background without the user having to know about it.
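A rough sketch of the idea, not the actual dolphin-async code: a timing wrapper around the parameter worker records push/pull cost, and the task that drives an iteration records the total time, so local computation falls out as the remainder. All names here (`SimpleParameterWorker`, `InMemoryParameterWorker`, `SystemMetrics`, `TimedParameterWorker`, `TimedIteration`) are hypothetical stand-ins for illustration; the real ParameterWorker and AsyncWorkerTask interfaces may look quite different.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified view of a parameter worker (not the real interface).
interface SimpleParameterWorker {
  void push(String key, double value);
  double pull(String key);
}

// Toy in-memory implementation so the sketch can run without a server.
final class InMemoryParameterWorker implements SimpleParameterWorker {
  private final Map<String, Double> store = new HashMap<>();

  @Override
  public void push(final String key, final double value) {
    store.merge(key, value, Double::sum); // accumulate updates per key
  }

  @Override
  public double pull(final String key) {
    return store.getOrDefault(key, 0.0);
  }
}

// System-side metrics, filled in without any help from user code.
final class SystemMetrics {
  long pushNanos;
  long pullNanos;
  long totalNanos;
  int pushCount;
  int pullCount;

  // Local computation is whatever part of the iteration was not push/pull.
  long computeNanos() {
    return Math.max(0L, totalNanos - pushNanos - pullNanos);
  }
}

// Wrapper that measures push/pull cost around an existing worker.
final class TimedParameterWorker implements SimpleParameterWorker {
  private final SimpleParameterWorker delegate;
  private final SystemMetrics metrics;

  TimedParameterWorker(final SimpleParameterWorker delegate, final SystemMetrics metrics) {
    this.delegate = delegate;
    this.metrics = metrics;
  }

  @Override
  public void push(final String key, final double value) {
    final long start = System.nanoTime();
    try {
      delegate.push(key, value);
    } finally {
      metrics.pushNanos += System.nanoTime() - start;
      metrics.pushCount++;
    }
  }

  @Override
  public double pull(final String key) {
    final long start = System.nanoTime();
    try {
      return delegate.pull(key);
    } finally {
      metrics.pullNanos += System.nanoTime() - start;
      metrics.pullCount++;
    }
  }
}

// Stand-in for the task that drives one iteration of user code.
final class TimedIteration {
  static void run(final Runnable userIteration, final SystemMetrics metrics) {
    final long start = System.nanoTime();
    try {
      userIteration.run();
    } finally {
      metrics.totalNanos += System.nanoTime() - start;
    }
  }
}
```

With this shape, user code only calls `push`/`pull` on the worker it is handed and computes its own metrics (e.g., loss); it never touches `SystemMetrics`. The sketch is single-threaded; a real implementation would need thread-safe counters if push/pull happen concurrently.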