Closed bturetzky closed 8 years ago
Finally got back to this. Adding ClassNLLCriterionMultipleTarget back to the test suite reproduces all the errors for the SpatialConvolutionMM tests.
luajit -l clnn -e "clnn.test{'SpatialConvolutionMM_forward_single','SpatialConvolutionMM_forward_single_vgglayer13','SpatialConvolutionMM_forward_single_padded','SpatialConvolutionMM_forward_batch','SpatialConvolutionMM_backward_single','SpatialConvolutionMM_backward_batch','ClassNLLCriterionMultipleTarget'}"
Completed 0 asserts in 7 tests with 7 errors
The others also appear to be false positives:
luajit -l clnn -e "clnn.test{'Sqrt_zero'}"
libthclnn_searchpath /home/bowen/torch/install/lib/lua/5.1/libTHCLNN.so
Running 1 tests
| ==> Sqrt_zeroUsing NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 8400GS
_ ==> Done
Completed 2 asserts in 1 tests with 0 errors
--------------------------------------------------------------------------------
$ luajit -l clnn -e "clnn.test{'Tanh_transposed'}"
libthclnn_searchpath /home/bowen/torch/install/lib/lua/5.1/libTHCLNN.so
Running 1 tests
| ==> Tanh_transposedUsing NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 8400GS
_ ==> Done
Completed 2 asserts in 1 tests with 0 errors
--------------------------------------------------------------------------------
$ luajit -l clnn -e "clnn.test{'Threshold_transposed'}"
libthclnn_searchpath /home/bowen/torch/install/lib/lua/5.1/libTHCLNN.so
Running 1 tests
| ==> Threshold_transposedUsing NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 8400GS
_ ==> Done
Completed 2 asserts in 1 tests with 0 errors
--------------------------------------------------------------------------------
Ah, good info! Thanks! That's the test that is running out of memory, right?
Correct, well... "Memory object allocation failure, code -4" anyway, not sure if it's necessarily OOM.
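For reference, -4 in OpenCL's error numbering is CL_MEM_OBJECT_ALLOCATION_FAILURE, which is a different code from CL_OUT_OF_RESOURCES (-5) and CL_OUT_OF_HOST_MEMORY (-6). A small lookup sketch follows; the function name cl_error_name is made up for illustration, is not part of clnn or cltorch, and covers only a handful of codes:

```shell
#!/bin/sh
# Hypothetical helper (not part of clnn or cltorch): map a few OpenCL
# status codes to the names they have in the OpenCL headers (CL/cl.h).
cl_error_name() {
    case "$1" in
        0)  echo "CL_SUCCESS" ;;
        -1) echo "CL_DEVICE_NOT_FOUND" ;;
        -2) echo "CL_DEVICE_NOT_AVAILABLE" ;;
        -3) echo "CL_COMPILER_NOT_AVAILABLE" ;;
        -4) echo "CL_MEM_OBJECT_ALLOCATION_FAILURE" ;;
        -5) echo "CL_OUT_OF_RESOURCES" ;;
        -6) echo "CL_OUT_OF_HOST_MEMORY" ;;
        *)  echo "unknown ($1)" ;;
    esac
}

# The code reported in this thread.
cl_error_name -4
```

So the driver is reporting a failed allocation for a memory object such as a buffer, which fits a too-large test tensor but does not by itself prove the card is completely out of memory.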
Ok. Seems like that test allocates a huugggee chunk of memory, see line 744 of test.lua. I don't remember why I thought that was a good idea. Don't suppose... if you get a moment, do you mind tweaking the numbers down a bit, until you find a pair of numbers which don't give the error? You'll need to check out the repo first, and build from the repo each time.
To check out the repo:
git clone https://github.com/hughperkins/clnn.git
cd clnn
To build from the repo:
luarocks rocks/clnn-scm-1.rockspec
Actually... I don't like the random bit. Let's replace line 744 with:
local size = 3000
... and then just reduce this number, until the test passes for you.
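The "reduce until it passes" step can also be scripted. Here is a rough sketch that halves the size until a check command succeeds. The name shrink_until_pass and the fake_check stand-in are placeholders invented for this example; a real check would have to write the candidate size into line 744 of test.lua and rerun the ClassNLLCriterionMultipleTarget test.

```shell
#!/bin/sh
# Hypothetical helper: starting from $1, keep halving the size until the
# check command (the remaining arguments) exits 0 when given the size as
# its last argument. Prints the first passing size; returns 1 if none passes.
shrink_until_pass() {
    size=$1
    shift
    while [ "$size" -ge 1 ]; do
        if "$@" "$size"; then
            echo "$size"
            return 0
        fi
        size=$((size / 2))
    done
    return 1
}

# Stand-in for the real test run (assumption): pretend sizes up to 3000 fit.
fake_check() { [ "$1" -le 3000 ]; }

shrink_until_pass 5000 fake_check
```

Halving is only one way to step down; it finds a passing size quickly but not necessarily the largest one that fits.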
Hmm, not sure about that build command, didn't work for me, but editing the test.lua file in torch/install/share/lua/5.1/clnn gave me success:
luajit -l clnn -e "clnn.test{'ClassNLLCriterionMultipleTarget'}"
libthclnn_searchpath /home/bowen/torch/install/lib/lua/5.1/libTHCLNN.so
Running 1 tests
| ==> ClassNLLCriterionMultipleTargetUsing NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 8400GS
_ ==> Done
Completed 2 asserts in 1 tests with 0 errors
--------------------------------------------------------------------------------
Just to check, I reran all the tests and now I get:
Completed 98 asserts in 64 tests with 0 errors
Seems my old card can't handle your tests :-P
> Hmm, not sure about that build command, didn't work for me
Ah whoops, the build command should be:
luarocks make rocks/clnn-scm-1.rockspec
I missed out the make by accident.
> editing the test.lua file in torch/install/share/lua/5.1/clnn gave me success
Ok. What did you put at line 744 so that it passes? local size = 3000?
Yep, 3000.
Cool. Will update.
Ok. Updated in 673ed18.
Awesome, thanks for all the help.
Cool. Glad it worked out :-)
Hello,
I've been trying to run https://github.com/karpathy/char-rnn on a D2700 Atom computer with a GeForce 8400 GS and I'm having problems with clnn specifically. Incidentally, I have been able to run the char-rnn project on the CPU, just not the GPU.
cltorch.test() passes, EasyCL tests pass. nn.test() passes. I've included the clnn test output.
From the output it seemed that maybe I had some sort of mismatch in the chain of dependent software, but I'm new to Torch/Lua/OpenCL so I'm pretty lost there. Alternatively, I thought maybe this card might just be too old (OpenCL 1.1?) for some of the things that clnn is doing? Any help would be appreciated. I'm running with Nvidia's 340.96 driver as that's the one I could get working, on a clean install of Ubuntu 15.10.
clnntest.txt