Open willwhitney opened 9 years ago
The training loop is usually run as:

```lua
model:zeroGradParameters()
criterion:forward(model:forward(...), target)
model:backward(...)
-- optimization step
```
After every mini-batch you need to zero the gradient buffers for correctness anyway. Initializing them with zeros would likely just hide bugs caused by forgetting to zero the gradient buffers on every iteration...
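For concreteness, here is a minimal sketch of that standard ordering, using a hypothetical toy model, criterion, and data (illustration only, not code from this repo):

```lua
require 'nn'

-- Hypothetical toy setup for illustration.
local model = nn.Linear(10, 1)
local criterion = nn.MSECriterion()
local learningRate = 0.01

for i = 1, 100 do
  local input, target = torch.randn(10), torch.randn(1)

  model:zeroGradParameters()                     -- clear accumulated gradients first
  local output = model:forward(input)
  criterion:forward(output, target)
  local gradOutput = criterion:backward(output, target)
  model:backward(input, gradOutput)              -- accumulates into gradWeight / gradBias
  model:updateParameters(learningRate)           -- the "optimization" step
end
```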
Yup, I get that this is the standard form. But intuitively, you'd expect this ordering to work just as well:
```lua
criterion:forward(model:forward(...), target)
model:backward(...)
-- optimization step
model:zeroGradParameters()
```
This probably isn't a big deal either way (I came across it randomly, not as a bug), but since all the other fields get initialized for you, it seems like this one should be too.
This has come up in the past, several times. Maybe we should initialize gradWeight / gradBias with NaNs.
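A minimal sketch of what that could look like, assuming a hypothetical helper (`nanGradParameters` is not an existing nn function):

```lua
require 'nn'

-- Hypothetical helper: poison the gradient buffers of every parameterized
-- submodule so a forgotten zeroGradParameters() is impossible to miss.
local function nanGradParameters(module)
  for _, m in ipairs(module:listModules()) do
    if m.gradWeight then m.gradWeight:fill(0/0) end  -- 0/0 evaluates to NaN
    if m.gradBias then m.gradBias:fill(0/0) end
  end
  return module
end

local model = nanGradParameters(nn.Linear(10, 1))
-- If the training loop never calls model:zeroGradParameters(), the first
-- parameter update turns the weights into NaNs instead of silently biasing them.
```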
I had assumed they started as zeros and just happened to write an optimisation loop the latter way around, so +1 for initialising with NaNs (for the reasoning you gave above).
Is there a reason grad params don't start zeroed when a module is initialized? This seems super dangerous, and since initialization only happens once, it's not like it's a big performance hit to zero them.