YJYJLee opened this issue 2 years ago
I have not tried amp. If you want to save GPU memory, it is better to try torch.utils.checkpoint.
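A minimal sketch of what gradient checkpointing looks like, assuming a hypothetical two-block model rather than Cylinder3D itself:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class TwoBlockNet(nn.Module):
    """Hypothetical stand-in model; not the Cylinder3D architecture."""
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(256, 256), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(256, 256), nn.ReLU())

    def forward(self, x):
        # Checkpointed blocks do not store intermediate activations;
        # they are recomputed during backward, trading compute for memory.
        x = checkpoint(self.block1, x)
        x = checkpoint(self.block2, x)
        return x

model = TwoBlockNet().cuda()
x = torch.randn(8, 256, device="cuda", requires_grad=True)
model(x).sum().backward()
```

Unlike amp, this keeps everything in fp32, so it sidesteps the overflow problem entirely at the cost of extra compute in the backward pass.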
Hello,
I had the same error and needed to adjust the eps parameter of Adam; see the referenced issue. I am using spconv v2.1.x. This is likely caused by spconv being somewhat independent from PyTorch. If it still does not work, try a higher spconv version (I have forked a modified implementation).
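A minimal sketch of that fix, assuming a larger value such as 1e-4 (the exact value from the referenced issue is not reproduced here) and a hypothetical stand-in model:

```python
import torch

model = torch.nn.Linear(256, 20).cuda()  # hypothetical stand-in model

# Adam's default eps of 1e-8 is below the smallest fp16 subnormal (~6e-8),
# so it can effectively vanish under half precision and destabilize the
# update; a larger eps such as 1e-4 is safely representable in fp16.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, eps=1e-4)
```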
Hello,
I am trying to train Cylinder3D with mixed precision, so I added torch.cuda.amp code to the source code. However, I am getting NaN values due to overflow as soon as training starts. I detected NaN values in the forward pass, and they propagate through the backward pass, which causes the loss to become NaN as well. Is it possible to train Cylinder3D with fp16? Is there any solution for this? Thanks!
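For context, the standard torch.cuda.amp training pattern looks roughly like the sketch below; the model, optimizer, and data are hypothetical stand-ins, not the actual Cylinder3D training code:

```python
import torch

model = torch.nn.Linear(256, 20).cuda()                    # hypothetical stand-in
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

# Dummy data in place of a real DataLoader
loader = [(torch.randn(8, 256), torch.randint(0, 20, (8,))) for _ in range(10)]

for inputs, labels in loader:
    optimizer.zero_grad()
    # autocast runs eligible ops in fp16; overflow here shows up as inf/NaN
    with torch.cuda.amp.autocast():
        outputs = model(inputs.cuda())
        loss = criterion(outputs, labels.cuda())
    # GradScaler scales the loss so fp16 gradients do not underflow,
    # unscales before stepping, and skips the step if it finds inf/NaN grads
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```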