Hello, in the supernet training code I noticed an optimization around gradient collection, shown below:

```python
for p in model.parameters():
    if p.grad is not None and p.grad.sum() == 0:
        p.grad = None
```
Could you explain the rationale behind this?
After loss.backward(), gradients that were previously None come back as all-zero tensors, so we set them back to None. That way the optimizer skips parameters that did not take part in this step, instead of still updating them (e.g., through momentum) off a zero gradient.
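
For context, here is a minimal, self-contained sketch of the situation this guards against. It is my own toy example, not the repository's training loop: `TinySupernet` and the two-branch sampling are hypothetical stand-ins for a supernet where only the sampled path receives real gradients.

```python
import torch
import torch.nn as nn


class TinySupernet(nn.Module):
    """Hypothetical two-branch supernet; only one branch is trained per step."""

    def __init__(self):
        super().__init__()
        self.branch_a = nn.Linear(4, 4)
        self.branch_b = nn.Linear(4, 4)

    def forward(self, x, choice):
        # Only the sampled branch participates in this forward/backward pass.
        return self.branch_a(x) if choice == 0 else self.branch_b(x)


model = TinySupernet()
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for step in range(2):
    # set_to_none=False fills existing .grad tensors with zeros, so after
    # backward() the un-sampled branch is left holding an all-zero grad.
    opt.zero_grad(set_to_none=False)
    loss = model(torch.randn(2, 4), choice=step % 2).sum()
    loss.backward()

    # The trick from this issue: an all-zero grad means "this parameter was
    # not in the sampled subnet", so drop it back to None. Otherwise
    # momentum (and weight decay, if enabled) would still move the weight.
    for p in model.parameters():
        if p.grad is not None and p.grad.sum() == 0:
            p.grad = None

    opt.step()  # parameters with grad=None are skipped by the optimizer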