megvii-model / SinglePathOneShot

MIT License

SuperNet Training code optimization #7

Closed: enduringstack closed this issue 4 years ago

enduringstack commented 4 years ago

Hello, in the supernet training code I noticed an optimization in how gradients are collected, as follows:

        for p in model.parameters():
            if p.grad is not None and p.grad.sum() == 0:
                p.grad = None

Could you explain the reasoning behind this?

ZichaoGuo commented 4 years ago

After loss.backward(), a grad that should be None ends up as an all-zero tensor instead, so we set those grads back to None.
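
One likely reason the distinction matters (my reading, not something stated in this thread): optimizers such as torch.optim.SGD skip any parameter whose grad is None, but a parameter with an all-zero grad is still updated, because weight decay is added to the gradient before the step. In a single-path supernet, that would slowly shrink the weights of blocks that were not sampled in the current iteration. The sketch below is a simplified pure-Python stand-in, not the real optimizer; `Param` and `sgd_step` are hypothetical names used only for illustration.

```python
# Simplified stand-in for an SGD step with weight decay.
# This is NOT torch.optim.SGD; Param and sgd_step are hypothetical helpers.
from typing import List, Optional


class Param:
    """Scalar parameter with an optional gradient (None means 'no gradient')."""

    def __init__(self, value: float, grad: Optional[float] = None) -> None:
        self.value = value
        self.grad = grad


def sgd_step(params: List[Param], lr: float, weight_decay: float) -> None:
    for p in params:
        if p.grad is None:
            # No gradient recorded: the parameter is skipped entirely.
            continue
        # Weight decay is added to the gradient, so even grad == 0 moves the weight.
        update = p.grad + weight_decay * p.value
        p.value -= lr * update


on_path = Param(1.0, grad=0.5)     # block on the sampled path
zero_grad = Param(1.0, grad=0.0)   # unsampled block left with a zero grad
none_grad = Param(1.0, grad=None)  # unsampled block after resetting grad to None

sgd_step([on_path, zero_grad, none_grad], lr=0.1, weight_decay=0.01)
print(on_path.value, zero_grad.value, none_grad.value)
```

Only `none_grad` keeps its original value; `zero_grad` is still decayed even though it received no real gradient. Momentum would behave similarly in this simplified picture, since a stored velocity keeps moving a weight whose current grad is zero.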