hughperkins / cltorch

An OpenCL backend for torch.

cltorch fails gradInput bad argument error #42

Closed manjunaths closed 8 years ago

manjunaths commented 8 years ago

Hello, yesterday convnet-benchmarks was running, and today something is broken! :-)

I am getting the error below. I did a completely fresh install of convnet-benchmarks, torch, cltorch and clnn, and I am still unable to fix it.

Please help.

[manju@anders-15 cltorch]$ th imagenet_winners/benchmark.lua
libthclnn_searchpath /usr2/manju/Moonshot/DeepLearning/opencl/torch/install/lib/lua/5.1/libTHCLNN.so
Running on device: Intel(R) HD Graphics
Using Intel(R) Corporation , OpenCL platform: Intel(R) OpenCL
Using OpenCL device: Intel(R) HD Graphics
ModelType: OverFeat[fast] Kernels: clnn Input shape: 128x3x231x231
Apply_1t_1s0pt-2_*out = val1 build log:
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

SpatialConvolutionMM.cl build log:
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

Apply_1t_0s0pt-2__out = (_out > 0) ? *out : 0 build log:
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

/tmp/luarocks_clnn-scm-1-314/clnn/SpatialMaxPooling.cpp build log:
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

Apply_2t_0s0pt-2-2out = ( in1 > 0) ? *in1 : 1e-06f build log:
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

Apply_3t_0s0pt-2-2-2__out = (_in1 > 0) ? *in2 : 0.0f build log:
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

/tmp/luarocks_clnn-scm-1-314/clnn/SpatialMaxPooling.cpp build log:
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

SpatialConvolutionMM.cl build log:
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

/usr2/manju/Moonshot/DeepLearning/opencl/torch/install/bin/lua: .../install/share/lua/5.1/clnn/SpatialConvolutionMM.lua:21: bad argument #1 (field gradInput does not exist)
stack traceback:
        [C]: in function 'SpatialConvolutionMM_updateGradInput'
        .../install/share/lua/5.1/clnn/SpatialConvolutionMM.lua:21: in function 'updateGradInput'
        ...opencl/torch/install/share/lua/5.1/nn/Sequential.lua:58: in function 'updateGradInput'
        ...opencl/torch/install/share/lua/5.1/nn/Sequential.lua:58: in function 'updateGradInput'
        imagenet_winners/benchmark.lua:45: in main chunk
        [C]: in function 'dofile'
        .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk

hughperkins commented 8 years ago

Line 21. Hmmm. Can you re-update clnn (luarocks install clnn) and confirm whether the problem still exists, please? If it does, can you provide lines 18-24 of /usr2/manju/Moonshot/DeepLearning/opencl/torch/install/share/lua/5.1/clnn/SpatialConvolutionMM.lua?
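
One way to pull out those lines with their line numbers attached is sketched below. The `show_lines` helper is hypothetical (it is not part of clnn or torch) and assumes a POSIX shell with `awk` available; the path is the one quoted in the request above.

```shell
# Hypothetical helper, not part of clnn or torch:
# print lines START..END of FILE, each prefixed with its line number.
show_lines() {
    # $1 = file, $2 = first line, $3 = last line
    awk -v s="$2" -v e="$3" 'NR >= s && NR <= e { printf "%3d %s\n", NR, $0 }' "$1"
}

# Refresh clnn first, as requested in the thread:
#   luarocks install clnn

# Installed copy of the file named in the request above.
src=/usr2/manju/Moonshot/DeepLearning/opencl/torch/install/share/lua/5.1/clnn/SpatialConvolutionMM.lua

# Only print if the file exists on this machine.
if [ -f "$src" ]; then
    show_lines "$src" 18 24
fi
```

Keeping the line numbers in the output makes it easy to match the excerpt against the `SpatialConvolutionMM.lua:21` frame in the stack trace.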

manjunaths commented 8 years ago

Yes, the problem still exists...

/usr2/manju/Moonshot/DeepLearning/opencl/torch/install/bin/lua: .../install/share/lua/5.1/clnn/SpatialConvolutionMM.lua:21: bad argument #1 (field gradInput does not exist)
stack traceback:
        [C]: in function 'SpatialConvolutionMM_updateGradInput'
        .../install/share/lua/5.1/clnn/SpatialConvolutionMM.lua:21: in function 'updateGradInput'
        ...opencl/torch/install/share/lua/5.1/nn/Sequential.lua:58: in function 'updateGradInput'
        ...opencl/torch/install/share/lua/5.1/nn/Sequential.lua:58: in function 'updateGradInput'
        imagenet_winners/benchmark.lua:45: in main chunk
        [C]: in function 'dofile'
        .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk

The lines 18-24 are:

 18       return self:baseUpdateGradInput(input, gradOutput)
 19    end
 20
 21    return input.nn.SpatialConvolutionMM_updateGradInput(self, input, gradOutput)
 22 end
 23
 24 function nn.SpatialConvolutionMM:accGradParameters(input, gradOutput, scale)

hughperkins commented 8 years ago

Hmmm... I wonder... am I accidentally trying to run the CPU functions? :-P Ok, I shall ponder a bit...

hughperkins commented 8 years ago

Hmmm, the tests really do seem to be running the GPU methods on my machine, with the very latest nn installed. Can you confirm that the tests pass on your side, please? i.e. luajit -l clnn -e 'clnn.test()'.

hughperkins commented 8 years ago

Hmmmm, I get the same error message actually.

hughperkins commented 8 years ago

Well... actually .... slightly different:

$ luajit imagenet_winners/benchmark.lua 
libthclnn_searchpath    /home/ubuntu/torch/install/lib/lua/5.1/libTHCLNN.so
Running on device: GRID K520
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GRID K520
ModelType: OverFeat[fast]       Kernels: clnn   Input shape: 4x3x231x231
warming up
luajit: /home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:58: bad argument #1 (field gradInput does not exist)
stack traceback:
        [C]: in function 'updateGradInput'
        /home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:58: in function 'updateGradInput'
        /home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:58: in function 'updateGradInput'
        imagenet_winners/benchmark.lua:47: in main chunk
        [C]: at 0x00406670

manjunaths commented 8 years ago

Hello, sorry to be a bother, but are there any updates on this?

hughperkins commented 8 years ago

You're right. I should fix this... please feel free to ping me once a day until I fix it. Tomorrow is good, since it's Saturday :-)

manjunaths commented 8 years ago

I don't know if this is helpful. But I had a cltorch tree from 15 Dec 2015 and that doesn't have this issue.

hughperkins commented 8 years ago

> I don't know if this is helpful. But I had a cltorch tree from 15 Dec 2015 and that doesn't have this issue.

Yes, it's something to do with the THNN changes.

hughperkins commented 8 years ago

Fixed in https://github.com/hughperkins/clnn/commit/9bfe115b969796334c924e27ac1544d0576a10fe, I think. Try now? (Edit: note, you'll need to reinstall clnn, i.e. luarocks install clnn)