xinge008 / Cylinder3D

Rank 1st in the leaderboard of SemanticKITTI semantic segmentation (both single-scan and multi-scan) (Nov. 2020) (CVPR2021 Oral)
Apache License 2.0

Is it possible to train Cylinder3D with mixed precision? #130

Open YJYJLee opened 2 years ago

YJYJLee commented 2 years ago

Hello,

I am trying to train Cylinder3D with mixed precision, so I added torch.cuda.amp code to the source code. However, I am getting NaN values due to overflow as soon as I start training. I detected NaN values in the forward pass, and they propagate through the backward pass, which causes the loss to become NaN as well.

Is it possible to train Cylinder3D with fp16? Is there any solution for this? Thanks!
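
For reference, the kind of torch.cuda.amp wrapping described above looks roughly like the sketch below. It is a minimal standalone example, not Cylinder3D's actual training script; `model`, `criterion`, and the random tensors are placeholders.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder objects -- not the actual Cylinder3D training code.
model = torch.nn.Linear(16, 4).cuda()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = GradScaler()

for _ in range(10):
    inputs = torch.randn(8, 16, device="cuda")
    targets = torch.randint(0, 4, (8,), device="cuda")

    optimizer.zero_grad()
    # Run the forward pass in mixed precision.
    with autocast():
        outputs = model(inputs)
        loss = criterion(outputs, targets)
    # Scale the loss to avoid fp16 gradient underflow, then step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```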

xinge008 commented 2 years ago

I have not tried AMP. If you want to save GPU memory, it is better to try torch.utils.checkpoint.
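
For context, gradient checkpointing with torch.utils.checkpoint trades extra compute for lower memory by recomputing activations during the backward pass. A minimal sketch follows; the two-stage module is a placeholder, not Cylinder3D's network.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Placeholder sub-networks -- not the actual Cylinder3D architecture.
stage1 = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU()).cuda()
stage2 = torch.nn.Linear(64, 4).cuda()

x = torch.randn(8, 16, device="cuda", requires_grad=True)

# Activations inside stage1 are not stored; they are recomputed
# during the backward pass, reducing peak GPU memory.
h = checkpoint(stage1, x)
out = stage2(h)
out.sum().backward()
```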

L-Reichardt commented 2 years ago

Hello,

I had the same error and needed to adjust the eps parameter of Adam. See reference issue. I am using Spconv-v2.1.x. This is likely because spconv is somewhat independent of PyTorch.
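
For illustration, the adjustment described above would look roughly like the following. The exact eps value is only an example; the idea is that Adam's default eps=1e-8 is below fp16's smallest representable magnitude (~6e-8), so it can effectively vanish under mixed precision, and a larger value keeps the update's denominator well-conditioned.

```python
import torch

model = torch.nn.Linear(16, 4).cuda()  # placeholder model

# Raise eps from the default 1e-8 so it does not underflow in fp16.
# 1e-4 is an illustrative value, not a tuned recommendation.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, eps=1e-4)
```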

If it still does not work, try a higher spconv version (I have forked a modified implementation).