snuspl / dolphin


Implement n_{fetch} and n_{push} #92

Closed dongjoon-hyun closed 8 years ago

dongjoon-hyun commented 9 years ago

Currently, batch_size is the only variable that controls communication with the parameter servers. The Downpour SGD paper introduces two dedicated parameters for that purpose:

It is possible to reduce the communication overhead of Downpour SGD by limiting each model replica to request updated parameters only every n_{fetch} steps and send updated gradient values only every n_{push} steps (where n_{fetch} might not be equal to n_{push})

  • n_{fetch}: the period (in steps) at which to fetch the model from the parameter servers.
  • n_{push}: the period (in steps) at which to push gradient updates to the parameter servers.

This issue implements those two parameters.
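A rough sketch of how the two periods might plug into a worker's training loop follows; nFetch, nPush, fetchParameters, and pushGradients are placeholder names for illustration, not existing APIs, and validation handling is omitted:

int step = 0;
for (int i = 0; i < maxIterations; ++i) {
  for (final Pair<Pair<INDArray, Integer>, Boolean> data : dataSet) {
    // Pull a fresh copy of the model from the servers every nFetch steps.
    if (step % nFetch == 0) {
      neuralNetwork.fetchParameters();
    }
    neuralNetwork.train(data.getFirst().getFirst(), data.getFirst().getSecond());
    // Send the accumulated gradient deltas to the servers every nPush steps.
    if ((step + 1) % nPush == 0) {
      neuralNetwork.pushGradients();
    }
    ++step;
  }
}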

jsjason commented 9 years ago

In fact, @beomyeol and I had quite a few discussions on separating 'batchSize' and 'pushPeriod' from each other. The PR for #69 will probably use only 'batchSize', but we definitely have to consider the difference of 'batchSize' and 'pushPeriod' later on.
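One way the two knobs could be declared as separate configuration parameters is sketched below. The Tang-style @NamedParameter form (common in REEF-based projects) is an assumption about how dolphin wires its parameters, and FetchPeriod/PushPeriod are hypothetical names:

import org.apache.reef.tang.annotations.Name;
import org.apache.reef.tang.annotations.NamedParameter;

// Hypothetical named parameters; each class would live in its own file.
@NamedParameter(doc = "period (in steps) to fetch the model", short_name = "nFetch", default_value = "1")
final class FetchPeriod implements Name<Integer> {
}

@NamedParameter(doc = "period (in steps) to push gradients", short_name = "nPush", default_value = "1")
final class PushPeriod implements Name<Integer> {
}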

dongjoon-hyun commented 9 years ago

Yep. The current batch_size is effectively n_{push}. Let me explain. fetch and push exist to reduce communication costs, and we currently use batch_size in a way that departs from its traditional meaning. Here is what I mean:

for (int i = 0; i < maxIterations; ++i) {
  for (final Pair<Pair<INDArray, Integer>, Boolean> data : dataSet) {
    final INDArray input = data.getFirst().getFirst();
    final int label = data.getFirst().getSecond();
    final boolean isValidation = data.getSecond();
    if (isValidation) {
      validator.validate(input, label);
    } else {
      // Trains on one example at a time; there is no mini-batch averaging here.
      neuralNetwork.train(input, label);
    }
  }
}

This code implements SGD without a random shuffle phase. For mini-batch SGD, we should average the gradients over batch_size items from the data set. As we can see in the following code, push does not do that either.

  public void push(final List<INDArray> activations, final List<INDArray> gradients) {
    for (int i = 0; i < deltaLayerParameters.length; ++i) {
      final INDArray activation = activations.get(i).transpose();
      assert activation.isColumnVector();
      // Accumulates (sums) the per-layer deltas; the sums are never divided
      // by the number of examples, so no averaging takes place.
      deltaLayerParameters[i].getWeightParam().addi(activation.mmul(gradients.get(i)));
      deltaLayerParameters[i].getBiasParam().addi(gradients.get(i));
    }
    ++numUpdate;
  }

We can implement a real batch_size correctly later. For now, we can simply treat batch_size as n_{push}.
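As a minimal sketch of that later fix, a hypothetical flush method could average the accumulated deltas before sending them, assuming it is called once every batch_size (i.e., n_{push}) examples; ParameterServer, its push call, and resetDeltaParameters are placeholders, not existing APIs:

  public void flushAveragedDeltas(final ParameterServer parameterServer) {
    if (numUpdate == 0) {
      return; // nothing has been accumulated since the last push
    }
    for (int i = 0; i < deltaLayerParameters.length; ++i) {
      // Turn the plain sums accumulated by push() into mini-batch averages.
      deltaLayerParameters[i].getWeightParam().divi(numUpdate);
      deltaLayerParameters[i].getBiasParam().divi(numUpdate);
    }
    parameterServer.push(deltaLayerParameters); // hypothetical server call
    resetDeltaParameters();                     // hypothetical: zero the accumulators
    numUpdate = 0;
  }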

beomyeol commented 9 years ago

@dongjoon-hyun I think this issue is needed for the neural network to work correctly. PR #112 does not cover it, but we can address it after #112 is merged. If you don't mind, could we reopen this issue?

dongjoon-hyun commented 9 years ago

Sure! Please reopen and assign to yourself. :) Thank you!

dongjoon-hyun commented 9 years ago

And set a milestone, too.

beomyeol commented 9 years ago

@dongjoon-hyun I changed the assignee and added this to a milestone, too. Thanks!

dongjoon-hyun commented 9 years ago

Thank YOU, @beomyeol.