Hello, in the supernet training code I noticed an optimization around gradient collection, shown below:

```python
for p in model.parameters():
    if p.grad is not None and p.grad.sum() == 0:
        p.grad = None
```
Could you explain the rationale behind this?
After loss.backward(), gradients that were previously None come back as all-zero tensors, so we set them back to None. That way the optimizer skips parameters that did not take part in this step, instead of still updating them (e.g., through momentum) off a zero gradient.
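
For context, here is a minimal, self-contained sketch of the situation this guards against. It is my own toy example, not the repository's training loop: `TinySupernet` and the two-branch sampling are hypothetical stand-ins for a supernet where only the sampled path receives real gradients.

```python
import torch
import torch.nn as nn


class TinySupernet(nn.Module):
    """Hypothetical two-branch supernet; only one branch is trained per step."""

    def __init__(self):
        super().__init__()
        self.branch_a = nn.Linear(4, 4)
        self.branch_b = nn.Linear(4, 4)

    def forward(self, x, choice):
        # Only the sampled branch participates in this forward/backward pass.
        return self.branch_a(x) if choice == 0 else self.branch_b(x)


model = TinySupernet()
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for step in range(2):
    # set_to_none=False fills existing .grad tensors with zeros, so after
    # backward() the un-sampled branch is left holding an all-zero grad.
    opt.zero_grad(set_to_none=False)
    loss = model(torch.randn(2, 4), choice=step % 2).sum()
    loss.backward()

    # The trick from this issue: an all-zero grad means "this parameter was
    # not in the sampled subnet", so drop it back to None. Otherwise
    # momentum (and weight decay, if enabled) would still move the weight.
    for p in model.parameters():
        if p.grad is not None and p.grad.sum() == 0:
            p.grad = None

    opt.step()  # parameters with grad=None are skipped by the optimizer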