Closed — mtk380 closed this issue 2 years ago
We did not encounter this problem when running on a TITAN Xp (12 GB memory). Did you change any hyperparameters, such as the number of sampled rays or points per ray in each batch?
I carefully checked the GPU memory and found that usage rises steadily until OOM.
I didn't change the code; how can I solve this problem? Thanks!
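A simple way to confirm whether memory really climbs monotonically (a leak) rather than spiking at one step is to log the allocated and peak CUDA memory each iteration. This is a minimal sketch using standard PyTorch calls (`torch.cuda.memory_allocated`, `torch.cuda.max_memory_allocated`); the function name `log_cuda_mem` and the call site are hypothetical, not part of this repo:

```python
import torch

def log_cuda_mem(step):
    # Report current and peak allocated CUDA memory in MiB.
    # Falls back gracefully when no GPU is present.
    if torch.cuda.is_available():
        alloc = torch.cuda.memory_allocated() / 1024**2
        peak = torch.cuda.max_memory_allocated() / 1024**2
        print(f"step {step}: allocated {alloc:.0f} MiB, peak {peak:.0f} MiB")
    else:
        print(f"step {step}: CUDA not available")

# Hypothetical usage: call once per training step inside the loop.
log_cuda_mem(0)
```

If the "allocated" number grows every step while the batch size is constant, some tensor from each iteration is likely being retained.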
Which version of PyTorch do you use? We met a similar problem in another project when using PyTorch 1.7 and solved it by switching to PyTorch 1.6.
That's the point!!! Thanks very much!
I use a TITAN RTX (24 GB) for training, but CUDA out of memory still occurs at around step 2454.
eta: 0:05:38 epoch: 2 step: 2454 rgb_loss: 0.1126 psnr: 15.9341 depth_loss: 0.0696 joint_loss: 0.1813 cross_entropy_loss: 0.7693 eikonal_loss: 0.0659 loss: 0.5825 beta: 0.0238 theta: -0.0845 rgb_weight: 1.0000 depth_weight: 1.0000 depth_loss_clamp_weight: 0.5000 joint_weight: 0.0500 ce_weight: 0.5000 eikonal_weight: 0.1000 data: 0.0226 batch: 0.6272 lr: 0.000456 max_mem: 22170
What should I do then? Thanks!
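Besides the PyTorch version mentioned above, one common cause of memory rising steadily until OOM (worth ruling out, though not confirmed as this repo's bug) is accumulating a loss tensor that still carries its autograd graph, e.g. for logging running averages. The sketch below is an assumption-laden illustration of that pattern, not code from this project; the fix is calling `.item()` before accumulating:

```python
import torch

# Hypothetical training-loop fragment illustrating a classic leak:
# summing loss tensors keeps every iteration's graph (and activations) alive.
model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)

running_loss_bad = 0.0
running_loss_good = 0.0
for _ in range(3):
    loss = model(x).pow(2).mean()
    running_loss_bad = running_loss_bad + loss         # retains the autograd graph
    running_loss_good = running_loss_good + loss.item()  # detached plain float

# The "bad" accumulator is a tensor with a grad_fn, so memory grows each step;
# the "good" one is just a Python float.
print(torch.is_tensor(running_loss_bad), running_loss_bad.grad_fn is not None)
print(torch.is_tensor(running_loss_good))
```

If the codebase is clean of this pattern, lowering the number of sampled rays or points per ray in the config (as the maintainers suggested above) is the other lever to try.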