torch / optim

A numeric optimization package for Torch.

Checking gradients on GPU? #146

Closed: juesato closed this issue 7 years ago

juesato commented 7 years ago

This seems way too basic, but I've been looking at it for several hours and can't find a way around it:

I have a basic script that creates a network and checks its gradient. When run on the CPU with double-precision floats, the gradient check passes. On the GPU, where CudaTensor objects use single-precision floats, the analytic and numerical gradients don't match. The issue can also be reproduced on the CPU by setting the default tensor type to FloatTensor. I feel like I must be doing something wrong, but I really don't see what.
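
The precision effect can be isolated from the network entirely. As a minimal sketch (independent of the script below; it assumes only torch and optim), check a function whose exact gradient is known:

-- f(x) = 0.5 * x:dot(x) has exact gradient x, so any mismatch reported
-- by checkgrad comes from finite-difference rounding alone
require 'torch'
require 'optim'

local function run(tensorType)
  torch.setdefaulttensortype(tensorType)
  local feval = function(x) return 0.5 * x:dot(x), x:clone() end
  local diff = optim.checkgrad(feval, torch.randn(10), 1e-4)
  print(tensorType, diff)
end

run('torch.DoubleTensor') -- tiny diff, roughly 1e-11
run('torch.FloatTensor')  -- much larger diff, roughly 1e-3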

Here's the script:

require 'nn'
require 'optim'
require 'cunn'

local inputSize = 10
local outputSize = 10
local batchSize = 5

function check_gpu()
  -- First we do everything on CPU

  -- If you uncomment this line, the CPU estimates will be inconsistent too
  -- torch.setdefaulttensortype('torch.FloatTensor') 

  local net = nn.Sequential()
    -- :add(nn.Sigmoid())
    :add(nn.Linear(inputSize, outputSize))
  local w,dw = net:getParameters()
  local inp = torch.randn(batchSize, inputSize)
  local tgt = torch.zeros(batchSize)
  for i=1,batchSize do
    local i1 = torch.random(1, outputSize) -- class labels should span the output classes
    tgt[i] = i1
  end

  local crit = nn.CrossEntropyCriterion()
  crit.sizeAverage = false

  -- feval returns the loss and the gradient of the loss w.r.t. the parameters
  local feval = function(x)
    if x ~= w then w:copy(x) end
    local out = net:forward(inp)
    local ce = crit:forward(out, tgt)
    local gradOutput = crit:backward(out, tgt)
    net:backward(inp, gradOutput)

    return ce, dw
  end

  local diff_cpu = optim.checkgrad(feval, w, 1e-4)
  print ('on cpu', diff_cpu)

  -- Then we do everything on GPU. feval closes over net, crit, inp, tgt,
  -- w, and dw, so once these are converted the same closure runs on the GPU.
  net:cuda()
  -- inp = inp:type('torch.CudaDoubleTensor')
  -- tgt = tgt:type('torch.CudaDoubleTensor')
  inp = inp:cuda()
  tgt = tgt:cuda()
  w, dw = net:getParameters()
  dw:zero()
  crit:cuda()
  local diff_gpu = optim.checkgrad(feval, w, 1e-4)
  print ('on gpu now', diff_gpu)
end

check_gpu()

Here's some sample output:

on cpu  4.0399252194701e-10 
on gpu now  0.003014417524189   
soumith commented 7 years ago

This is reasonable to expect. On the CPU the gradients are checked at double precision, but on the GPU they are checked at float precision, as you noted.

One cannot expect analytical gradients to match finite-difference estimates this tightly at float precision.
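
For intuition, here's a minimal sketch (plain torch, independent of the script above) that works the arithmetic on a function with a known derivative:

-- central-difference estimate of d/dx sin(x) at x = 1,
-- computed at double vs. float precision
require 'torch'

local function fdError(tensorType, eps)
  local x = torch.Tensor({1.0}):type(tensorType)
  -- (f(x + eps) - f(x - eps)) / (2 * eps)
  local est = (torch.sin(x + eps)[1] - torch.sin(x - eps)[1]) / (2 * eps)
  return math.abs(est - math.cos(1.0)) -- exact derivative is cos(1)
end

print('double:', fdError('torch.DoubleTensor', 1e-4)) -- roughly 1e-9
print('float :', fdError('torch.FloatTensor',  1e-4)) -- roughly 1e-4 to 1e-3

Float machine epsilon is about 1.2e-7, so each function value carries rounding noise on that order; dividing by 2 * eps = 2e-4 amplifies it to around 1e-3, which is the size of the GPU discrepancy in the output above. If you need a tight check, run it at double precision (on the CPU, or with torch.CudaDoubleTensor if your cutorch build supports it), or use a larger eps and a looser tolerance for float runs.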