nerfstudio-project / nerfacc

A General NeRF Acceleration Toolbox in PyTorch.
https://www.nerfacc.com/

Purpose of Gradient Scaling #100

Closed low5545 closed 1 year ago

low5545 commented 1 year ago

Gradient scaling is used in train_mlp_nerf.py, train_ngp_nerf.py, and train_mlp_dnerf.py without autocasting. Moreover, the gradients are not unscaled before optimizer.step(). Hence, there isn't any automatic mixed-precision training here.

Thus, what's the purpose of the gradient scaling?
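For reference, the pattern in question looks roughly like this (a paraphrased sketch, not the scripts verbatim; the model, optimizer, data, and the `2 ** 10` scale are placeholders):

```python
import torch

# Paraphrased pattern (illustrative only): GradScaler is used to scale the
# loss, but there is no autocast() context, no unscale_(), and the optimizer
# is stepped directly instead of via grad_scaler.step(optimizer).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(3, 1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
grad_scaler = torch.cuda.amp.GradScaler(2 ** 10, enabled=(device == "cuda"))

x = torch.randn(8, 3, device=device)
target = torch.randn(8, 1, device=device)
loss = torch.nn.functional.mse_loss(model(x), target)

optimizer.zero_grad()
grad_scaler.scale(loss).backward()  # only the loss is scaled; no autocast
optimizer.step()                    # plain step: gradients are never unscaled
```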

liruilong940607 commented 1 year ago

Hi, it is just for scaling up the loss, not for mixed precision.

The reason we do this is a bit tricky: we found that the gradients of the network parameters can sometimes be extremely small when using tiny-cuda-nn, on the order of 1e-17. And when using Adam as the optimizer, it computes grad ** 2, which gets flushed to zero due to the float precision limit.

Simply scaling up the loss scales up the gradients and avoids this issue. And since Adam is not sensitive to the scale of the gradients, the scaling does not affect the optimization at all, so there is no need to scale it back.
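Here is a minimal sketch of that scale-invariance (an assumed toy example, not nerfacc code): training the same model with `loss` and with `loss * 1024` ends up with nearly identical parameters, because Adam divides the first-moment estimate by the square root of the second moment, so a constant gradient scale cancels out (up to the small `eps` term).

```python
import copy
import torch

# Toy demonstration (assumed example): Adam gives (nearly) the same updates
# whether the loss is scaled by a constant or not, so the scaled gradients
# never need to be unscaled before optimizer.step().
torch.manual_seed(0)
model_a = torch.nn.Linear(3, 1)
model_b = copy.deepcopy(model_a)
opt_a = torch.optim.Adam(model_a.parameters(), lr=1e-2)
opt_b = torch.optim.Adam(model_b.parameters(), lr=1e-2)

for _ in range(100):
    x = torch.randn(8, 3)
    y = torch.randn(8, 1)

    opt_a.zero_grad()
    torch.nn.functional.mse_loss(model_a(x), y).backward()  # unscaled loss
    opt_a.step()

    opt_b.zero_grad()
    (torch.nn.functional.mse_loss(model_b(x), y) * 1024.0).backward()  # scaled loss
    opt_b.step()

# Expected to print True: the two parameter sets match up to tiny eps effects.
print(torch.allclose(model_a.weight, model_b.weight, atol=1e-4))
```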

low5545 commented 1 year ago

Thanks for the explanation!