rebecca0011 opened this issue 1 year ago
This can happen when training with mixed precision. Two solutions I've found to work:
1) Use full precision. This will use ~2x as much memory, though.
2) Clip large gradients midway through the backward pass. You can do this by wrapping convolutions with the function below (a usage sketch follows the definition):
import torch
import torch.nn as nn
import torch.nn.functional as F

GRAD_CLIP = .01

class GradClip(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Identity in the forward pass
        return x

    @staticmethod
    def backward(ctx, grad_x):
        # Zero out gradient entries whose magnitude exceeds GRAD_CLIP, as well as any NaNs
        o = torch.zeros_like(grad_x)
        grad_x = torch.where(grad_x.abs() > GRAD_CLIP, o, grad_x)
        grad_x = torch.where(torch.isnan(grad_x), o, grad_x)
        return grad_x
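For reference, here is a minimal sketch of what "wrapping" a convolution could look like. GradClipConv2d is a hypothetical name, not something in RAFT-Stereo; the idea is just to pass each convolution's output through GradClip so the clipping happens midway through the backward pass:

class GradClipConv2d(nn.Module):
    # Hypothetical wrapper: a Conv2d whose outgoing gradient is filtered by GradClip
    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        # GradClip.apply is the identity in the forward pass; it only edits the gradient on the way back
        return GradClip.apply(self.conv(x))

# Example: drop-in replacement for an nn.Conv2d layer
layer = GradClipConv2d(3, 16, kernel_size=3, padding=1)
out = layer(torch.randn(1, 3, 64, 64))

Since GradClip is an identity in the forward pass, swapping in such wrappers does not change the model's outputs, only how gradients flow back through it.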
I used the rectified YAV images to test the model and got this error:

Traceback (most recent call last):
  File "/home/rc/StereoMatching/RAFT-Stereo/train_stereo.py", line 256, in <module>
    train(args)
  File "/home/rc/StereoMatching/RAFT-Stereo/train_stereo.py", line 167, in train
    loss, metrics = sequence_loss(flow_predictions, flow, valid)
  File "/home/rc/StereoMatching/RAFT-Stereo/train_stereo.py", line 50, in sequence_loss
    assert not torch.isnan(flow_preds[i]).any() and not torch.isinf(flow_preds[i]).any()
AssertionError

I debugged the program and found a lot of NaN values in the flow predictions. Could you please give me some advice?