rebecca0011 opened this issue 1 year ago
This can happen when training with mixed precision. Two solutions I've found to work:
1) Use full precision. This will use ~2x as much memory, though.
2) Clip large gradients midway through the backward pass. You can do this by wrapping convolutions with the function below (a usage sketch follows the definition):
import torch
import torch.nn as nn
import torch.nn.functional as F

GRAD_CLIP = .01

class GradClip(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Identity in the forward pass
        return x

    @staticmethod
    def backward(ctx, grad_x):
        # Zero out gradient entries whose magnitude exceeds GRAD_CLIP, as well as any NaNs
        o = torch.zeros_like(grad_x)
        grad_x = torch.where(grad_x.abs() > GRAD_CLIP, o, grad_x)
        grad_x = torch.where(torch.isnan(grad_x), o, grad_x)
        return grad_x
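For reference, here is a minimal sketch of what "wrapping" a convolution could look like. GradClipConv2d is a hypothetical name, not something in RAFT-Stereo; the idea is just to pass each convolution's output through GradClip so the clipping happens midway through the backward pass:

class GradClipConv2d(nn.Module):
    # Hypothetical wrapper: a Conv2d whose outgoing gradient is filtered by GradClip
    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        # GradClip.apply is the identity in the forward pass; it only edits the gradient on the way back
        return GradClip.apply(self.conv(x))

# Example: drop-in replacement for an nn.Conv2d layer
layer = GradClipConv2d(3, 16, kernel_size=3, padding=1)
out = layer(torch.randn(1, 3, 64, 64))

Since GradClip is an identity in the forward pass, swapping in such wrappers does not change the model's outputs, only how gradients flow back through it.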
I used the rectified YAV images to test the model and got this error:

Traceback (most recent call last):
  File "/home/rc/StereoMatching/RAFT-Stereo/train_stereo.py", line 256, in <module>
    train(args)
  File "/home/rc/StereoMatching/RAFT-Stereo/train_stereo.py", line 167, in train
    loss, metrics = sequence_loss(flow_predictions, flow, valid)
  File "/home/rc/StereoMatching/RAFT-Stereo/train_stereo.py", line 50, in sequence_loss
    assert not torch.isnan(flow_preds[i]).any() and not torch.isinf(flow_preds[i]).any()
AssertionError

I debugged the program and found a lot of NaN values in the flow predictions. Could you please give me some advice?