pytorch / opacus

Training PyTorch models with differential privacy
https://opacus.ai
Apache License 2.0

ValueError: Per sample gradient is not initialized. Not updated in backward pass? Need solution #561

Closed Hafizamariaiqbal closed 1 year ago

Hafizamariaiqbal commented 1 year ago
Yes. Sometimes you need large batch sizes to make your model converge, but the GPU memory might be too small to fit all the per-sample gradients. This is why we distinguish the two: the physical batch size is what your GPU can fit, while the actual batch size is what your optimization needs. Typically, if you have a batch size of 512 and a physical batch size of 32, you will do forward/backward on physical batches of size 32, but optimizer.step() will do an actual step only once every 16 (= 512/32) forward/backward passes.

Originally posted by @alexandresablayrolles in https://github.com/pytorch/opacus/issues/502#issuecomment-1243618518
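The accumulation arithmetic described above can be sketched in plain Python. This is an illustrative sketch only, not the Opacus implementation (in practice Opacus wraps this logic in its BatchMemoryManager utility); the function and constant names here are made up for the example.

```python
# Illustrative sketch: a logical batch of 512 is processed as physical
# batches of 32, with a real optimizer step only once every
# 512 // 32 = 16 forward/backward passes. Names are hypothetical.

LOGICAL_BATCH = 512
PHYSICAL_BATCH = 32
ACCUMULATION_STEPS = LOGICAL_BATCH // PHYSICAL_BATCH  # 16

def count_optimizer_steps(num_samples: int) -> int:
    """Count how many actual optimizer steps occur for num_samples samples."""
    steps = 0
    num_physical_batches = num_samples // PHYSICAL_BATCH
    for i in range(num_physical_batches):
        # forward/backward on a physical batch of 32 would happen here;
        # per-sample gradients accumulate across physical batches
        if (i + 1) % ACCUMULATION_STEPS == 0:
            # optimizer.step() fires once per full logical batch of 512
            steps += 1
    return steps

print(count_optimizer_steps(1024))  # 1024 samples -> 32 physical batches -> 2 steps
```

So with these numbers, one pass over 1024 samples performs 32 forward/backward passes but only 2 optimizer steps, which is why per-sample gradients must be retained and accumulated between steps.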

alexandresablayrolles commented 1 year ago

Closing issue as duplicate of #560