Open Cyndilee opened 8 years ago
It works as you expect. Don't confuse two different functions: Brain.backward() and Trainer.train(). The backward function from deepqlearn.js doesn't update the network's weights, it just calls the train function, which does update the weights, but it does so in batch.
The following is my understanding about the backward function in deepqlearn.js.
If it is in learning process, after one forward path, it begins to do backward propagation. It does SGD with a batch size of N. Those N samples are randomly selecting from replay memory. The code seems to show that the weight matrix of the network is updated N times by calling the backward function once. But in my opinion, the SGD should only update the weight matrix once by averaging all weight updates of a batch size of N.
Please correct me if I have misunderstood sometime.