Write tests for a simple network that compare two loss values:
1) Calculated without gradient accumulation
2) Calculated with gradient accumulation
To run this test, the model must receive the same input data in both cases and start from the same weights (the latter can be done by saving the weights to a file and reloading them); a sketch of such a test follows.
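
A minimal sketch of the equivalence test in plain PyTorch; the model, data shapes, loss function, and tolerances are illustrative assumptions, not the project's actual setup:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
# Save the initial weights so both runs start from the same state.
torch.save(model.state_dict(), "weights.pt")

x = torch.randn(8, 10)  # same input data for both runs
y = torch.randn(8, 1)
criterion = nn.MSELoss()

# 1) Single backward pass over the full batch, no accumulation.
model.load_state_dict(torch.load("weights.pt"))
model.zero_grad()
loss_full = criterion(model(x), y)
loss_full.backward()
grads_full = [p.grad.clone() for p in model.parameters()]

# 2) Same batch split into micro-batches, gradients accumulated.
steps_num = 4
model.load_state_dict(torch.load("weights.pt"))
model.zero_grad()
loss_acc = 0.0
for xb, yb in zip(x.chunk(steps_num), y.chunk(steps_num)):
    # Scale each micro-batch loss so the accumulated gradient
    # matches the full-batch gradient of the mean loss.
    loss = criterion(model(xb), yb) / steps_num
    loss.backward()
    loss_acc += loss.item()

# Both the losses and the accumulated gradients should match.
assert abs(loss_full.item() - loss_acc) < 1e-5
for g_full, p in zip(grads_full, model.parameters()):
    assert torch.allclose(g_full, p.grad, atol=1e-6)
```

Note the division by steps_num: without it the accumulated gradient equals the sum, not the mean, of the micro-batch gradients, and the comparison would fail.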
[Optional] Explore how BatchNorm works with gradient accumulation. It is reported to be a problem (though the discussion dates from the pre-release of PyTorch 1.0); the sketch below illustrates why.
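
The root of the problem is easy to demonstrate: in training mode BatchNorm normalizes with the statistics of the current micro-batch, so splitting a batch changes the forward pass itself. A minimal illustration (the layer choice and shapes are assumptions for the demo):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(10)  # training mode by default
x = torch.randn(8, 10)

out_full = bn(x)                                       # stats over 8 samples
out_chunks = torch.cat([bn(xb) for xb in x.chunk(4)])  # stats over 2 samples each

# Outputs differ, so losses and gradients will differ too.
print(torch.allclose(out_full, out_chunks))  # False
```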
To solve this issue, the following step is needed: add enable_grads_acumulation(steps_num: int) to the Trainer class.
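
A sketch of how that method might look on the Trainer class; the class internals and the training_step signature are hypothetical (the method name keeps the spelling proposed above):

```python
class Trainer:
    def __init__(self, model, optimizer, criterion):
        self.model = model
        self.optimizer = optimizer
        self.criterion = criterion
        self.accumulation_steps = 1  # 1 == no accumulation

    def enable_grads_acumulation(self, steps_num: int):
        # Accumulate gradients over `steps_num` micro-batches
        # before each optimizer step.
        self.accumulation_steps = steps_num

    def training_step(self, batch_idx, x, y):
        # Scale the loss so accumulated gradients match a full-batch pass.
        loss = self.criterion(self.model(x), y) / self.accumulation_steps
        loss.backward()
        # Step and reset gradients only every `accumulation_steps` batches.
        if (batch_idx + 1) % self.accumulation_steps == 0:
            self.optimizer.step()
            self.optimizer.zero_grad()
        return loss.item()
```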