shenxiangzhuang opened this issue 4 years ago
Agree, the inner for loop should only be for calculating accuracy, not for updating the weights. Updating inside that loop is definitely not faster.
This is my version: https://github.com/mikulatomas/grokking-deep-learning/blob/master/mnist/mnist_batch_dropout_multi_layer_network.ipynb
Yes, the for loop is only for calculating accuracy. Check out the last code example in chapter 9; I think it's implemented well there.
Agree. When you move those five lines out of the inner loop, it runs faster than the previous version; see the sketch below the timings.
Before batch:
alpha=0.005  I:349  Train-Error:0.1502  Train-Correct:0.982  Test-error:0.296  Test-Acc:0.8721  Time: 209.26

After batch:
alpha=0.1  I:349  Train-Error:0.2124  Train-Correct:0.953  Test-error:0.285  Test-Acc:0.8777  Time: 46.89
alpha=0.5  I:349  Train-Error:0.1837  Train-Correct:0.962  Test-error:0.301  Test-Acc:0.8675  Time: 46.24
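For anyone comparing, here is a minimal sketch of what that change looks like, roughly the chapter 9 structure the comment above points to rather than a verbatim copy of the book's code. It assumes the chapter 8 setup is already defined (`images`, `labels`, `weights_0_1`, `weights_1_2`, `relu`, `relu2deriv`, `alpha`, `batch_size`, `correct_cnt`, and numpy imported as `np`):

```python
# Sketch only: assumes the data, weights, and helper functions from the
# chapter 8 listing are already defined, and numpy is imported as np.
for i in range(int(len(images) / batch_size)):
    batch_start, batch_end = i * batch_size, (i + 1) * batch_size

    # forward pass on the whole batch at once
    layer_0 = images[batch_start:batch_end]
    layer_1 = relu(np.dot(layer_0, weights_0_1))
    dropout_mask = np.random.randint(2, size=layer_1.shape)
    layer_1 *= dropout_mask * 2          # scale up to compensate for 50% dropout
    layer_2 = np.dot(layer_1, weights_1_2)

    # inner loop only tallies accuracy -- no weight updates here
    for k in range(batch_size):
        correct_cnt += int(np.argmax(layer_2[k:k + 1]) ==
                           np.argmax(labels[batch_start + k:batch_start + k + 1]))

    # the five lines in question, now executed once per batch
    layer_2_delta = (labels[batch_start:batch_end] - layer_2) / batch_size
    layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)
    layer_1_delta *= dropout_mask
    weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
    weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)
```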
Hello
With reference to this code snippet, why do they divide by batch size on the following line?
layer_2_delta = (labels[batch_start:batch_end] - layer_2) / batch_size
I do not really see the need for that division. Might anyone explain why this division takes place at this point?
```python
for k in range(batch_size):
    correct_cnt += int(np.argmax(layer_2[k:k + 1]) ==
                       np.argmax(labels[batch_start + k:batch_start + k + 1]))

    layer_2_delta = (labels[batch_start:batch_end] - layer_2) / batch_size
    layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)
    layer_1_delta *= dropout_mask
    weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
    weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)
```
Hello, maybe they divide by batch size because the lines that compute the deltas and weight updates are inside the inner loop? As mentioned above, those five lines should be outside the inner loop.
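Adding to that: even when the five lines sit outside the inner loop, the division still seems useful to me. `layer_1.T.dot(layer_2_delta)` sums the per-example gradients over the batch, so dividing the delta by `batch_size` turns that sum into an average and keeps the update size roughly independent of the batch size (which fits the larger alpha values reported above). A toy sketch with made-up shapes:

```python
import numpy as np

batch_size, hidden, outputs = 4, 3, 2
layer_1 = np.random.rand(batch_size, hidden)         # hidden activations for a batch
layer_2_delta = np.random.rand(batch_size, outputs)  # raw (undivided) per-example deltas

# the matrix product sums the per-example outer products over the batch ...
summed = layer_1.T.dot(layer_2_delta)
# ... so dividing the delta by batch_size yields the average gradient instead
averaged = layer_1.T.dot(layer_2_delta / batch_size)

manual_mean = sum(np.outer(layer_1[k], layer_2_delta[k])
                  for k in range(batch_size)) / batch_size
assert np.allclose(averaged, manual_mean)
```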
> Yes, the for loop is only for calculating accuracy. Check out the last code example in chapter 9; I think it's implemented well there.
Thanks a lot.
In chapter 8, the batch gradient descent code is confusing.
In short, I think the code should update the weights only x times per iteration, where x is the number of batches, rather than n times, where n is the number of training samples. For example, with 1,000 training examples and a batch size of 100, that would be 10 weight updates per iteration instead of 1,000.