In fact, @beomyeol and I had quite a few discussions on separating 'batchSize' and 'pushPeriod' from each other. The PR for #69 will probably use only 'batchSize', but we definitely have to consider the difference between 'batchSize' and 'pushPeriod' later on.
Yep. The current batch_size is really n_{push}. Let me explain. fetch and push are used to reduce communication costs, but we use batch_size in a way that differs from its traditional meaning. Here is what I mean:
for (int i = 0; i < maxIterations; ++i) {
  for (final Pair<Pair<INDArray, Integer>, Boolean> data : dataSet) {
    final INDArray input = data.getFirst().getFirst();
    final int label = data.getFirst().getSecond();
    final boolean isValidation = data.getSecond();
    if (isValidation) {
      validator.validate(input, label);
    } else {
      // Updates parameters for every single item: plain (per-example) SGD.
      neuralNetwork.train(input, label);
    }
  }
}
This code implements SGD without a random shuffle phase. For mini-batch SGD, we should average the results over batch_size items in the data set. As the following code shows, push does not do that either.
public void push(final List<INDArray> activations, final List<INDArray> gradients) {
  for (int i = 0; i < deltaLayerParameters.length; ++i) {
    final INDArray activation = activations.get(i).transpose();
    assert activation.isColumnVector();
    // Accumulates the raw sum of per-item deltas; nothing is averaged here.
    deltaLayerParameters[i].getWeightParam().addi(activation.mmul(gradients.get(i)));
    deltaLayerParameters[i].getBiasParam().addi(gradients.get(i));
  }
  ++numUpdate;
}
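For comparison, here is a minimal sketch (not the current code) of how the accumulated deltas could be averaged before they are sent, so the pushed values match a mini-batch average rather than a raw sum. buildAveragedDeltas is a hypothetical helper; deltaLayerParameters, numUpdate, and LayerParameter are assumed from the snippet above.

public LayerParameter[] buildAveragedDeltas() {
  // Hypothetical sketch: scale the raw sums accumulated by push() down by the
  // number of updates, turning them into a mini-batch average.
  if (numUpdate > 0) {
    for (final LayerParameter delta : deltaLayerParameters) {
      delta.getWeightParam().divi(numUpdate);  // in-place scalar division (ND4J)
      delta.getBiasParam().divi(numUpdate);
    }
  }
  return deltaLayerParameters;
}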
We can implement a real batch_size correctly later. For now, we can simply rename batch_size to n_{push}.
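As a rough illustration of what a real batch_size would mean, the training loop above could shuffle the data and average gradients over batch_size items before each update. This is only a sketch under some assumptions: dataSet is a java.util.List, and trainBatch is a hypothetical method that computes the averaged gradient over a whole batch and applies it once.

for (int i = 0; i < maxIterations; ++i) {
  Collections.shuffle(dataSet);  // the random shuffle phase that plain SGD above skips

  final List<Pair<INDArray, Integer>> batch = new ArrayList<>(batchSize);
  for (final Pair<Pair<INDArray, Integer>, Boolean> data : dataSet) {
    if (data.getSecond()) {
      validator.validate(data.getFirst().getFirst(), data.getFirst().getSecond());
      continue;
    }
    batch.add(data.getFirst());
    if (batch.size() == batchSize) {
      neuralNetwork.trainBatch(batch);  // hypothetical: averages gradients over the batch
      batch.clear();
    }
  }
  if (!batch.isEmpty()) {
    neuralNetwork.trainBatch(batch);  // leftover partial batch
  }
}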
@dongjoon-hyun I think this issue needs to be fixed for the neural network to work correctly. PR #112 does not cover it, but we can address it after #112 is merged. If you don't mind, could we please reopen this issue?
Sure! Please reopen and assign to yourself. :) Thank you!
And assign a milestone, too.
@dongjoon-hyun I changed the assignee and added this to the milestone, too. Thanks!
Thank YOU, @beomyeol .
Currently, batch_size is the only variable that controls communication with the parameter servers. In the paper, there are two specific parameters for that purpose. This issue implements those two parameters.
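For concreteness, here is a hedged sketch of how two separate counters could drive communication in the worker loop. Following the n_{push} notation above, nPush controls how often deltas are sent and nFetch (its natural counterpart for fetch) controls how often parameters are pulled; fetchParameters and pushDeltas are hypothetical names for the worker-to-server calls.

int stepsSinceFetch = 0;
int stepsSincePush = 0;
for (final Pair<Pair<INDArray, Integer>, Boolean> data : dataSet) {
  if (stepsSinceFetch == nFetch) {
    fetchParameters();  // hypothetical: pull fresh parameters every n_{fetch} steps
    stepsSinceFetch = 0;
  }
  neuralNetwork.train(data.getFirst().getFirst(), data.getFirst().getSecond());
  ++stepsSinceFetch;
  ++stepsSincePush;
  if (stepsSincePush == nPush) {
    pushDeltas();  // hypothetical: send accumulated deltas every n_{push} steps
    stepsSincePush = 0;
  }
}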