hughperkins / clnn

OpenCL backend for Torch nn neural networks library
BSD 2-Clause "Simplified" License

Issue with using an older card or something in pre-reqs? #29

Closed · bturetzky closed this 8 years ago

bturetzky commented 8 years ago

Hello,

I've been trying to run https://github.com/karpathy/char-rnn on a D2700 Atom computer with a GeForce 8400 GS, and I'm having problems with clnn specifically. Incidentally, I have been able to run the char-rnn project on the CPU, just not the GPU.

cltorch.test() passes, EasyCL tests pass. nn.test() passes. I've included the clnn test output.

From the output it seemed that maybe I had some sort of mismatch in the chain of dependent software, but I'm new to torch/lua/opencl so I'm pretty lost there. Alternatively, I thought maybe this card might just be too old (OpenCL 1.1?) for some of the things that clnn is doing? Any help would be appreciated. I'm running with Nvidia's 340.96 driver as that's the one I could get working, on a clean install of Ubuntu 15.10.

clnntest.txt

hughperkins commented 8 years ago

Well... first thing to check is: is everything up to date? i.e.:

luarocks install nn
luarocks install nngraph
luarocks install cltorch
luarocks install clnn

I'm using nvidia driver 352.55, on Ubuntu 14.04, by the way. Any reason why you're not using the newer drivers?

bturetzky commented 8 years ago

I wasn't able to get 352 to work. Also, when you go to nvidia.com and enter the 8400 GS they give 340.96 as the one to download so I used that (but from apt, not the download).

Everything should be current; it's all a new install. I explicitly ran those commands above during setup, except nngraph, which I assume was done when installing torch itself (also installed at the same time).

hughperkins commented 8 years ago

Hmmm... well... I can't think of anything that needs 1.2, actually. 1.1 should be enough. cltorch is using clBLAS 2.4, which is (mostly... :-P) OpenCL 1.1. It used to work on the nvidia drivers back in June, when they didn't support 1.2 yet.

I think you need to provide the actual error messages you are seeing when you run char-rnn. Ideally, run with luajit rather than th. I know it won't run under plain luajit out of the box, but you can hack char-rnn's train.lua so that it will. Simply put the following at the top of train.lua:

path = {}
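-- minimal stand-in for the 'path' helpers train.lua expects when run outside th:
-- join just concatenates with '/', exists always reports the file as present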

function path.join(a,b)
  return a .. '/' .. b
end

function path.exists(a)
  print('exists', a)
  return true
end
bturetzky commented 8 years ago

Same error as one I see in the clnn.test() output:

bowen@ubuntu-opencl:~/char-rnn$ luajit train.lua -opencl 1
libthclnn_searchpath    /home/bowen/torch/install/lib/lua/5.1/libTHCLNN.so
using OpenCL on GPU 0...
exists  data/tinyshakespeare/vocab.t7
loading data files...
cutting off end of data so that the batches/sequences divide evenly
reshaping tensor...
data load done. Number of data batches in train: 423, val: 23, test: 0
vocab size: 65
exists  cv
creating an lstm with 2 layers
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 8400GS
setting forget gate biases to 1 in LSTM layer 1
setting forget gate biases to 1 in LSTM layer 2
number of parameters in the model: 240321
cloning rnn
cloning criterion
luajit: /home/bowen/torch/install/share/lua/5.1/nn/LogSoftMax.lua:4: attempt to call field 'LogSoftMax_updateOutput' (a nil value)
stack traceback:
        /home/bowen/torch/install/share/lua/5.1/nn/LogSoftMax.lua:4: in function 'func'
        /home/bowen/torch/install/share/lua/5.1/nngraph/gmodule.lua:252: in function 'neteval'
        /home/bowen/torch/install/share/lua/5.1/nngraph/gmodule.lua:287: in function 'forward'
        train.lua:280: in function 'opfunc'
        /home/bowen/torch/install/share/lua/5.1/optim/rmsprop.lua:32: in function 'rmsprop'
        train.lua:324: in main chunk
        [C]: at 0x00405ea0

Incidentally, I'm not able to run cltorch.setDevice(1) until I first call cltorch.getDeviceCount() so I had to add that to train.lua as well. I assumed this was related to whatever my underlying problem is.
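
For reference, the workaround amounts to roughly this near the top of train.lua (just a sketch of what I added; the exact placement is wherever setDevice would otherwise be called first):

require 'cltorch'
cltorch.getDeviceCount()  -- workaround: querying the device count first lets setDevice succeed
cltorch.setDevice(1)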

hughperkins commented 8 years ago

Incidentally, I'm not able to run cltorch.setDevice(1) until I first call cltorch.getDeviceCount() so I had to add that to train.lua as well. I assumed this was related to whatever my underlying problem is.

Ah interesting. I think that's a bug I created last week. Probably entirely unrelated.

luajit: /home/bowen/torch/install/share/lua/5.1/nn/LogSoftMax.lua:4: attempt to call field 'LogSoftMax_updateOutput' (a nil value)

Interesting, this may be related to the THNN work going on at https://github.com/torch/nn/pull/549. In fact, I'm like 97% sure it is. I can probably reproduce the problem on my own computer if I update my nn module. I will try that.

hughperkins commented 8 years ago

Yes, these all fail for me:

[clnn test progress output, truncated] ==> ELU_forwar..., ELU_transp..., LogSigmoid... (x3), LogSoftMax... (x4), Sigmoid_ba..., Sigmoid_fo..., Sigmoid_tr..., ...
LogSoftMax_backward
 Function call failed 
/home/user/torch/install/share/lua/5.1/nn/LogSoftMax.lua:4: attempt to call field 'LogSoftMax_updateOutput' (a nil value)
stack traceback:
    /home/user/torch/install/share/lua/5.1/nn/LogSoftMax.lua:4: in function 'forward'
    ...user/torch/install/share/lua/5.1/clnn/testLogSoftMax.lua:60: in function 'v'
    /home/user/torch/install/share/lua/5.1/clnn/test.lua:2616: in function </home/user/torch/install/share/lua/5.1/clnn/test.lua:2614>
    [C]: in function 'xpcall'
    /home/user/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
    /home/user/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
    /home/user/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
    /home/user/torch/install/share/lua/5.1/clnn/test.lua:2655: in function 'test'
    (command line):1: in main chunk
    [C]: at 0x00406670
bturetzky commented 8 years ago

Well, at least I'm not crazy

hughperkins commented 8 years ago

:-)

hughperkins commented 8 years ago

Fixed LogSoftMax.lua. Can you reinstall the latest clnn and see what error(s) you get now? (I've only fixed that one module so far; several others are still broken :-( )

hughperkins commented 8 years ago

Hmmm, probably Sigmoid and/or LogSigmoid are broken and need fixing too, right?

bturetzky commented 8 years ago

I'm still seeing the same 20 failures that are in the clnntest.txt in my first post...

To update, you'd just reissue the luarocks install clnn command?

hughperkins commented 8 years ago

oh, I'm on the wrong branch :-P

Let me merge onto master branch :-P

By the way, all errors are fixed now (but in the wrong branch). Will let you know once merged to master.

hughperkins commented 8 years ago

merged to master. Try now?
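
(And yes, updating is just rerunning the install command from above, i.e. luarocks install clnn.)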

bturetzky commented 8 years ago

Yeah, that cleared up all of the "attempt to call field" errors. I still have 10 errors:

Memory object allocation failure, code -4
____*_________________________******___________*_______*____*___  ==> Done

Completed 82 asserts in 64 tests with 10 errors

--------------------------------------------------------------------------------
ClassNLLCriterionMultipleTarget
 Function call failed
...n/torch/install/share/lua/5.1/clnn/ClassNLLCriterion.lua:41:
kernel source:
1: // OpenCL kernels....
2:
3: // expected templated values:
4: // dims (vector of unique dimension values)
5: // operation
6: // dim1
7: // dim2
8: // dim3
9: // ... dimD
10: // num_input_tensors
11: // include_scalar_input
12: //
13: // maybe should add:
14: // IndexType (hardcoded to int for now)
15: // MAX_CUTORCH_DIMS (hardcoded to 25 for now)
16:
17: // (Ported from cutorch's THCApply.cuh)
18:
19: // Maximum number of dimensions allowed for cutorch
20: // #define MAX_CUTORCH_DIMS 25
21:
22: // Enum that indicates whether tensor arguments are read/write or
23: // read-only
24: //enum TensorArgType { ReadWrite, ReadOnly };
25:
26:
27:
28: inline void op( global float *out
29:
30:
31:   , float val1
32:
33:
34: ) {
35:     *out = val1;
36: }
37:
38: kernel void
39: THClTensor_pointwiseApplyD(
40:
41:     int offset_1,
42:
43:
44:     global float*data_1,
45:
46:
47:    float val1,
48:
49:
50:    int totalElements) {
51:    int linearIndex = get_global_id(0);
52:    if(linearIndex < totalElements ) {
53:
54:
55:
56:
57:          int derived_offset_1 = linearIndex + offset_1;
58:
59:
60:
61:     op(
62:
63:
64:          &(data_1[derived_offset_1])
65:
66:
67:
68:       , val1
69:
70:
71:
72:     );
73:   }
74: }
75:
76:

Memory object allocation failure, code -4 at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClKernels.cpp:165
stack traceback:
        [C]: in function 'zero'
        ...n/torch/install/share/lua/5.1/clnn/ClassNLLCriterion.lua:41: in function 'backward'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:763: in function 'v'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2616: in function </home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2614>
        [C]: in function 'xpcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2655: in function 'test'
        (command line):1: in main chunk
        [C]: at 0x00405ea0

--------------------------------------------------------------------------------
SpatialConvolutionMM_backward_batch
 Function call failed
.../install/share/lua/5.1/clnn/testSpatialConvolutionMM.lua:285: OpenCL error, code: -49 at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClKernels.cpp:88
stack traceback:
        [C]: in function 'forward'
        .../install/share/lua/5.1/clnn/testSpatialConvolutionMM.lua:285: in function 'v'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2616: in function </home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2614>
        [C]: in function 'xpcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2655: in function 'test'
        (command line):1: in main chunk
        [C]: at 0x00405ea0

--------------------------------------------------------------------------------
SpatialConvolutionMM_backward_single
 Function call failed
.../install/share/lua/5.1/clnn/testSpatialConvolutionMM.lua:224: OpenCL error, code: -49 at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClKernels.cpp:88
stack traceback:
        [C]: in function 'forward'
        .../install/share/lua/5.1/clnn/testSpatialConvolutionMM.lua:224: in function 'v'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2616: in function </home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2614>
        [C]: in function 'xpcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2655: in function 'test'
        (command line):1: in main chunk
        [C]: at 0x00405ea0

--------------------------------------------------------------------------------
SpatialConvolutionMM_forward_batch
 Function call failed
.../install/share/lua/5.1/clnn/testSpatialConvolutionMM.lua:174: OpenCL error, code: -49 at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClKernels.cpp:88
stack traceback:
        [C]: in function 'forward'
        .../install/share/lua/5.1/clnn/testSpatialConvolutionMM.lua:174: in function 'v'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2616: in function </home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2614>
        [C]: in function 'xpcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2655: in function 'test'
        (command line):1: in main chunk
        [C]: at 0x00405ea0

--------------------------------------------------------------------------------
SpatialConvolutionMM_forward_single
 Function call failed
.../install/share/lua/5.1/clnn/testSpatialConvolutionMM.lua:40: OpenCL error, code: -49 at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClKernels.cpp:88
stack traceback:
        [C]: in function 'forward'
        .../install/share/lua/5.1/clnn/testSpatialConvolutionMM.lua:40: in function 'v'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2616: in function </home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2614>
        [C]: in function 'xpcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2655: in function 'test'
        (command line):1: in main chunk
        [C]: at 0x00405ea0

--------------------------------------------------------------------------------
SpatialConvolutionMM_forward_single_padded
 Function call failed
.../install/share/lua/5.1/clnn/testSpatialConvolutionMM.lua:126: OpenCL error, code: -49 at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClKernels.cpp:88
stack traceback:
        [C]: in function 'forward'
        .../install/share/lua/5.1/clnn/testSpatialConvolutionMM.lua:126: in function 'v'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2616: in function </home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2614>
        [C]: in function 'xpcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2655: in function 'test'
        (command line):1: in main chunk
        [C]: at 0x00405ea0

--------------------------------------------------------------------------------
SpatialConvolutionMM_forward_single_vgglayer13
 Function call failed
.../install/share/lua/5.1/clnn/testSpatialConvolutionMM.lua:83: OpenCL error, code: -49 at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClKernels.cpp:88
stack traceback:
        [C]: in function 'forward'
        .../install/share/lua/5.1/clnn/testSpatialConvolutionMM.lua:83: in function 'v'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2616: in function </home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2614>
        [C]: in function 'xpcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2655: in function 'test'
        (command line):1: in main chunk
        [C]: at 0x00405ea0

--------------------------------------------------------------------------------
Sqrt_zero
 Function call failed
/home/bowen/torch/install/share/lua/5.1/clnn/test.lua:584: OpenCL error, code: -49 at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClKernels.cpp:88
stack traceback:
        [C]: in function 'zero'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:584: in function 'v'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2616: in function </home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2614>
        [C]: in function 'xpcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2655: in function 'test'
        (command line):1: in main chunk
        [C]: at 0x00405ea0

--------------------------------------------------------------------------------
Tanh_transposed
 Function call failed
/home/bowen/torch/install/share/lua/5.1/clnn/Tanh.lua:11: OpenCL error, code: -49 at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClKernels.cpp:88
stack traceback:
        [C]: in function 'zero'
        /home/bowen/torch/install/share/lua/5.1/clnn/Tanh.lua:11: in function 'updateGradInput'
        /home/bowen/torch/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:73: in function 'pointwise_transposed'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:808: in function 'v'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2616: in function </home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2614>
        [C]: in function 'xpcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2655: in function 'test'
        (command line):1: in main chunk
        [C]: at 0x00405ea0

--------------------------------------------------------------------------------
Threshold_transposed
 Function call failed
/home/bowen/torch/install/share/lua/5.1/clnn/Threshold.lua:31: OpenCL error, code: -49 at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClKernels.cpp:88
stack traceback:
        [C]: in function 'zero'
        /home/bowen/torch/install/share/lua/5.1/clnn/Threshold.lua:31: in function 'Threshold_updateGradInput'
        /home/bowen/torch/install/share/lua/5.1/nn/Threshold.lua:26: in function 'updateGradInput'
        /home/bowen/torch/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:73: in function 'pointwise_transposed'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:511: in function 'v'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2616: in function </home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2614>
        [C]: in function 'xpcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
        /home/bowen/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
        /home/bowen/torch/install/share/lua/5.1/clnn/test.lua:2655: in function 'test'
        (command line):1: in main chunk
        [C]: at 0x00405ea0

--------------------------------------------------------------------------------
hughperkins commented 8 years ago

Error code -49 is "if arg_index is not a valid argument index". Hmmm... never seen this error before. At least, not that I remember. Hmmm...

hughperkins commented 8 years ago

well, interestingly, if I run char-rnn on my own computer, I still get LogSoftMax errors, see below. Curious...

/home/user/torch/install/bin/luajit: /home/user/torch/install/share/lua/5.1/nn/LogSoftMax.lua:4: attempt to call field 'LogSoftMax_updateOutput' (a nil value)
stack traceback:
    /home/user/torch/install/share/lua/5.1/nn/LogSoftMax.lua:4: in function 'func'
hughperkins commented 8 years ago

(Oh, that's probably because it uses CUDA by default. I need to add -opencl 1, probably.)

hughperkins commented 8 years ago

"Incidentally, I'm not able to run cltorch.setDevice(1) until I first call cltorch.getDeviceCount() so I had to add that to train.lua as well. I assumed this was related to whatever my underlying problem is." Hmmm, I have this error too. I need to fix this...

bturetzky commented 8 years ago

Yeah, the -49 made me think there was some header or something that clnn might be built against that is different on my system.

The memory allocation errors make me think the card might not work for those features.

hughperkins commented 8 years ago

That's probably true. Let me fix the errors I have first, and check whether char-rnn does/doesn't run on my system currently.

hughperkins commented 8 years ago

The issue with having to call getDeviceCount before calling setDevice is fixed now. By the way, it's quite surprising you managed to figure out that calling getDeviceCount first made setDevice work OK :-)

On my machine char-rnn now runs again:

1/21150 (epoch 0.002), train_loss = 4.19803723, grad/param norm = 5.1721e-01, time/batch = 2.3205s  
2/21150 (epoch 0.005), train_loss = 3.93712091, grad/param norm = 1.4679e+00, time/batch = 0.2862s  
3/21150 (epoch 0.007), train_loss = 3.43751776, grad/param norm = 9.5793e-01, time/batch = 0.2841s  
4/21150 (epoch 0.009), train_loss = 3.41289300, grad/param norm = 7.5153e-01, time/batch = 0.2839s  
5/21150 (epoch 0.012), train_loss = 3.33699637, grad/param norm = 6.9269e-01, time/batch = 0.2815s  
bturetzky commented 8 years ago

If at first one function doesn't work, try them all, lol.

hughperkins commented 8 years ago

For your case, 'memory allocation failure' is not a good sign, but it should be circumventable. It's probably an issue with the entire dataset being loaded into memory, rather than some fundamental theoretical limitation.

For error code -49, hmmm...

hughperkins commented 8 years ago

If at first one function doesn't work, try them all, lol.

:-D

bturetzky commented 8 years ago

I'm watching nvidia-smi in another screen and it never goes above ~30/256 MiB of memory on the card. I'm not sure if there are other places it's going to attempt to allocate memory, however? It shouldn't be a problem in RAM.

bturetzky commented 8 years ago

And these are clnn.test() calls, which I assume aren't dealing with a very large dataset anyway?

hughperkins commented 8 years ago

Ah, hmmm, right. Anyway, I think I will check -49 first. That sounds like the more fundamental limitation somehow.

hughperkins commented 8 years ago

For SpatialConvolutionMM, we are passing a whole bunch of arguments into the kernel:

  THClKernels k(state, kernel);
  k.in(num_kernels);
  k.in(im);
  k.in(height);
  k.in(width);
  k.in(ksize_h);
  k.in(ksize_w);
  k.in(pad_h);
  k.in(pad_w);
  k.in(stride_h);
  k.in(stride_w);
  k.in(height_col);
  k.in(width_col);
  k.out(col);

Seems plausible your card won't allow so many arguments. I think there should be some way to check this point.

hughperkins commented 8 years ago

So, on my machine, if I run the following, I get this output:

clinfo | grep arg
  Max number of images read arguments:       256
  Max number of images write arguments:      16
  Max size of kernel argument:           4352
  Max number of constant args:           9

What do you get if you run clinfo | grep arg?

bturetzky commented 8 years ago
clinfo | grep arg
    Max number of read image args                 128
    Max number of write image args                8
  Max number of constant args                     9
  Max size of kernel argument                     4352 (4.25KiB)
hughperkins commented 8 years ago

Hmmm, looks the same. (I'm not using image arguments, as far as I know.)

bturetzky commented 8 years ago

but you're passing 14, you must be using them?

bturetzky commented 8 years ago

oh I see, the size

hughperkins commented 8 years ago

Most (all?) of the arguments I'm passing are not constant arguments. And anyway, the max number of constant args, and the maximum size, are the same for both our cards.

hughperkins commented 8 years ago

I think -49 means that the argument doesn't match what is in the kernel. Why you would be getting that, and I don't, is... curious.

hughperkins commented 8 years ago

I think it'd be good to run it in gdb if you can. The way I do this is to first create a script, which I call rungdb.sh:

#!/bin/bash

gdb $1 -ex "catch throw" -ex "run $2 $3 $4 $5 $6 $7 $8 $9" 

Now, run train.lua using this, like:

rungdb.sh luajit train.lua -opencl 1

It should stop at the exception. Then type 'bt' (and enter), and paste the stack trace here.

hughperkins commented 8 years ago

(I'm kind of sleepy, it is 7am, and I've been up most of the night. I think I shall sleep for a bit :-) )

bturetzky commented 8 years ago

cool, good night, will look at the gdb stuff

bturetzky commented 8 years ago

Oh hmm, apparently char-rnn runs now; it was hung up on those nn changes. The other errors I was only getting in the clnn.test() suite.

bturetzky commented 8 years ago

Huh, so it looks like I'm CPU-bound when running on the GPU, and it's running with one thread vs. using all 4 like it does without the GPU, so I actually get worse performance running with OpenCL than without.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
12818 bowen     20   0  568456 150688  28156 R 100.1  7.4   0:52.46 luajit
12817 bowen     20   0   24856   3104   2672 R   0.3  0.2   0:00.19 top

...

th train.lua -opencl 1
libthclnn_searchpath    /home/bowen/torch/install/lib/lua/5.1/libTHCLNN.so
using OpenCL on GPU 0...
exists  data/tinyshakespeare/vocab.t7
loading data files...
cutting off end of data so that the batches/sequences divide evenly
reshaping tensor...
data load done. Number of data batches in train: 423, val: 23, test: 0
vocab size: 65
exists  cv
creating an lstm with 2 layers
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 8400GS
setting forget gate biases to 1 in LSTM layer 1
setting forget gate biases to 1 in LSTM layer 2
number of parameters in the model: 240321
cloning rnn
cloning criterion
1/21150 (epoch 0.002), train_loss = 4.19803722, grad/param norm = 5.1721e-01, time/batch = 4.0963s
2/21150 (epoch 0.005), train_loss = 3.93712094, grad/param norm = 1.4679e+00, time/batch = 3.0649s
3/21150 (epoch 0.007), train_loss = 3.43751777, grad/param norm = 9.5793e-01, time/batch = 3.0669s
4/21150 (epoch 0.009), train_loss = 3.41289306, grad/param norm = 7.5153e-01, time/batch = 3.0672s
5/21150 (epoch 0.012), train_loss = 3.33699642, grad/param norm = 6.9269e-01, time/batch = 3.0635s
6/21150 (epoch 0.014), train_loss = 3.37105609, grad/param norm = 5.2300e-01, time/batch = 3.0669s
7/21150 (epoch 0.017), train_loss = 3.36710170, grad/param norm = 4.3214e-01, time/batch = 3.0668s
8/21150 (epoch 0.019), train_loss = 3.33051407, grad/param norm = 3.9960e-01, time/batch = 3.0689s
9/21150 (epoch 0.021), train_loss = 3.29338823, grad/param norm = 3.8692e-01, time/batch = 3.0651s
10/21150 (epoch 0.024), train_loss = 3.38265341, grad/param norm = 3.5570e-01, time/batch = 3.0627s
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
12847 bowen     20   0  524896 308544   7620 R 380.4 15.2   1:49.08 luajit
12846 bowen     20   0   24856   3012   2584 R   0.3  0.1   0:00.14 top

...

th train.lua -gpuid -1
exists  data/tinyshakespeare/vocab.t7
loading data files...
cutting off end of data so that the batches/sequences divide evenly
reshaping tensor...
data load done. Number of data batches in train: 423, val: 23, test: 0
vocab size: 65
exists  cv
creating an lstm with 2 layers
setting forget gate biases to 1 in LSTM layer 1
setting forget gate biases to 1 in LSTM layer 2
number of parameters in the model: 240321
cloning rnn
cloning criterion
1/21150 (epoch 0.002), train_loss = 4.19803724, grad/param norm = 5.1721e-01, time/batch = 2.5358s
2/21150 (epoch 0.005), train_loss = 3.93712133, grad/param norm = 1.4679e+00, time/batch = 2.2016s
3/21150 (epoch 0.007), train_loss = 3.43764434, grad/param norm = 9.5800e-01, time/batch = 2.2196s
4/21150 (epoch 0.009), train_loss = 3.41313742, grad/param norm = 7.5143e-01, time/batch = 2.2028s
5/21150 (epoch 0.012), train_loss = 3.33707270, grad/param norm = 6.9269e-01, time/batch = 2.2358s
6/21150 (epoch 0.014), train_loss = 3.37127145, grad/param norm = 5.2318e-01, time/batch = 2.2283s
7/21150 (epoch 0.017), train_loss = 3.36724018, grad/param norm = 4.3217e-01, time/batch = 2.2274s
8/21150 (epoch 0.019), train_loss = 3.33067083, grad/param norm = 3.9964e-01, time/batch = 2.2169s
9/21150 (epoch 0.021), train_loss = 3.29356131, grad/param norm = 3.8693e-01, time/batch = 2.2267s
10/21150 (epoch 0.024), train_loss = 3.38283139, grad/param norm = 3.5561e-01, time/batch = 2.2289s
bturetzky commented 8 years ago
./rungdb.sh luajit ClassNLLCriterionMultipleTarget.lua
GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from luajit...(no debugging symbols found)...done.
Catchpoint 1 (throw)
Starting program: /home/bowen/torch/install/bin/luajit ClassNLLCriterionMultipleTarget.lua
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
libthclnn_searchpath    /home/bowen/torch/install/lib/lua/5.1/libTHCLNN.so
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 8400GS
[New Thread 0x7fffcac9f700 (LWP 12949)]
[New Thread 0x7fffc989e700 (LWP 12950)]
[New Thread 0x7fffc909d700 (LWP 12951)]
[New Thread 0x7fffc889c700 (LWP 12952)]
[New Thread 0x7fffc3fff700 (LWP 12953)]
[New Thread 0x7fffc37fe700 (LWP 12954)]
Catchpoint 1 (exception thrown), 0x00007fffebdcdbcd in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0  0x00007fffebdcdbcd in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007fffec0d96b1 in CLKernel::run (this=0x817f70, ND=ND@entry=3, global_ws=global_ws@entry=0x7fffffffd6d0,
    local_ws=local_ws@entry=0x7fffffffd738) at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/EasyCL/CLKernel.cpp:258
#2  0x00007fffec33e40b in THClKernels::run (this=this@entry=0x7fffffffd900, grid=..., block=...)
    at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClKernels.cpp:162
#3  0x00007fffec32208e in kernelLaunch_pointwiseApply<unsigned int> (state=state@entry=0x6aa920, grid=..., block=...,
    numTensors=numTensors@entry=1, dims=dims@entry=0x7fffffffdca0, infos=infos@entry=0x7fffffffdcb0,
    totalElements=totalElements@entry=21058921, op=op@entry=0x7fffffffde50, operationString="*out = val1")
    at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClApply.cpp:140
#4  0x00007fffec3206fa in THClTensor_pointwiseApply (state=state@entry=0x6aa920, numTensors=numTensors@entry=1,
    tensors=tensors@entry=0x7fffffffde10, op=op@entry=0x7fffffffde50, operationString="*out = val1", types=types@entry=0x7fffffffde00)
    at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClApply.cpp:208
#5  0x00007fffec320995 in THClTensor_pointwiseApply1 (state=state@entry=0x6aa920, a=a@entry=0x7e0660, op=op@entry=0x7fffffffde50,
    aType=aType@entry=ReadWrite) at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClApply.cpp:250
#6  0x00007fffec31575d in THClTensor_zero (state=state@entry=0x6aa920, self_=self_@entry=0x7e0660)
    at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/src/lib/THClTensorMath.cpp:77
#7  0x00007fffec57fdf8 in wrapper_zero (L=0x40000378) at /tmp/luarocks_cltorch-scm-1-452/cltorch/cltorch/build/TensorMath.c:5286
#8  0x00000000004816ba in lj_BC_FUNCC ()
#9  0x0000000000470d3d in lua_pcall ()
#10 0x000000000040677f in pmain ()
#11 0x00000000004816ba in lj_BC_FUNCC ()
#12 0x0000000000470db7 in lua_cpcall ()
#13 0x0000000000404694 in main ()
hughperkins commented 8 years ago

The fact that it is CPU-bound is an interesting observation. Seems I am too, using both CUDA and OpenCL. It kind of makes sense: the kernel launches are so tiny, that is, the data sent to each kernel launch is so small, that presumably a lot of time is spent inside the drivers, preparing the launches.

bturetzky commented 8 years ago

Yeah, I realized it wasn't what I thought it was and deleted the comment

bturetzky commented 8 years ago

Not sure I'm doing the SpatialConvolutionMM tests right. I'm not getting a stack trace on exit (probably because of all the threading?):

[New Thread 0x7fffe8756700 (LWP 13185)]
[New Thread 0x7fffe7f55700 (LWP 13186)]
[New Thread 0x7fffe7754700 (LWP 13187)]
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 8400GS
[New Thread 0x7fffe4e53700 (LWP 13188)]
[New Thread 0x7fffdf9ff700 (LWP 13189)]
[New Thread 0x7fffdf1fe700 (LWP 13190)]
[New Thread 0x7fffde9fd700 (LWP 13191)]
[New Thread 0x7fffde1fc700 (LWP 13192)]
[New Thread 0x7fffdd9fb700 (LWP 13193)]
/home/bowen/torch/install/bin/luajit: bad argument #2 to '?' (number expected, got userdata)
stack traceback:
        [C]: at 0x7ffff67269e0
        [C]: in function '__sub'
        testSpatialConvolutionMM.lua:53: in function 'SpatialConvolutionMM_forward_single'
        testSpatialConvolutionMM.lua:312: in main chunk
        [C]: at 0x00405ea0
[Thread 0x7fffdd9fb700 (LWP 13193) exited]
[Thread 0x7fffde1fc700 (LWP 13192) exited]
[Thread 0x7fffde9fd700 (LWP 13191) exited]
[Thread 0x7fffdf1fe700 (LWP 13190) exited]
[Thread 0x7fffdf9ff700 (LWP 13189) exited]
[Thread 0x7fffe4e53700 (LWP 13188) exited]
[Thread 0x7fffe7754700 (LWP 13187) exited]
[Thread 0x7fffe8756700 (LWP 13185) exited]
[Thread 0x7ffff7fde740 (LWP 13178) exited]
[Inferior 1 (process 13178) exited with code 01]
hughperkins commented 8 years ago

Ah, because it's not throwing an exception. It's triggering a call to THError, which simply exits. What I would probably do to go further on this is edit ~/torch/install/share/lua/5.1/testSpatialConvolutionMM.lua, line 53 or 312, to print out some more information on what is happening.
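
Something along these lines, for example (the variable names here are placeholders, not necessarily the ones the test actually uses at that line):

print('type of a:', torch.type(a))
print('type of b:', torch.type(b))
print('size of a:', a:size())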

bturetzky commented 8 years ago

At an rnn_size of 220 I hit the GPU's limit on memory (254/255 MB), and above that I get memory allocation errors. Unfortunately, at this size it's still about 1s slower per batch than running on the CPU.
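
(That's just passing a larger size on the command line, e.g. th train.lua -opencl 1 -rnn_size 220, assuming char-rnn's usual -rnn_size option.)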

I'm curious, and unfamiliar with the logistics of doing math on a GPU: what is the driver doing when you say "preparing the launches"? I would've thought that since the data is all there, it's just "hey, go do math on that data", but it sounds like it's lining up the math operations like horses into racing stalls.

bturetzky commented 8 years ago

Yeah, I can look at that some more tomorrow. I'm headed to bed -- which is where I thought you were going ;)

hughperkins commented 8 years ago

I would've thought since the data is all there its just "hey go do math on that data" but it sounds like its lining up the math operations like horses into racing stalls.

Well... the GPU can do a single operation on a massive vector of 1500 floats in one go (simplifying a bit here...). Each of the kernel launches in char-rnn is only for about 6200 floats at a time, so that's 4 operations (5 really, since 6200/1500 is slightly more than 4). So, if it's taking 10-100 CPU operations to set up each kernel launch, then the CPU operations are going to dominate (I'm plucking numbers out of the air a bit here, but intuitively it's kind of right). In practice, if you can send at least 100 thousand numbers or so into each kernel launch, then the GPU time is going to dominate. But 6,200 numbers is a bit on the low side...
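
A rough way to see the effect, if you're curious (just a sketch, assuming torch.ClTensor follows the usual tensor constructors and that a :sum() read-back is enough to flush the queue):

require 'torch'
require 'cltorch'
local small = torch.ClTensor(6200):fill(1)     -- about one char-rnn-sized launch
local big   = torch.ClTensor(1000000):fill(1)  -- big enough that GPU time should dominate
for _, t in ipairs({small, big}) do
  local timer = torch.Timer()
  for i = 1, 100 do t:add(1) end               -- 100 element-wise kernel launches
  print(t:nElement(), 'elements:', timer:time().real, 'seconds for 100 launches', t:sum())
end

If launch overhead dominates, the two timings come out much closer than the 160x difference in element count would suggest.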

Yeah, I can look at that some more tomorrow. I'm headed to bed -- which is where I thought you were going ;)

haha, yeah, my sleep patterns are all over the place :-D

bturetzky commented 8 years ago

The failures with SpatialConvolutionMM appear to be due to a dirty test env, probably from one of the other failures; they pass on their own:

luajit -l clnn -e "clnn.test{'SpatialConvolutionMM_forward_single','SpatialConvolutionMM_forward_single_vgglayer13','SpatialConvolutionMM_forward_single_padded','SpatialConvolutionMM_forward_batch','SpatialConvolutionMM_backward_single','SpatialConvolutionMM_backward_batch'}"
libthclnn_searchpath    /home/bowen/torch/install/lib/lua/5.1/libTHCLNN.so
Running 6 tests
|_____  ==> SpatialConvolutionMM_backward_batchUsing NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 8400GS
______  ==> Done

Completed 10 asserts in 6 tests with 0 errors

--------------------------------------------------------------------------------
hughperkins commented 8 years ago

The failures with SpatialConvolutionMM appear to be due to a dirty test env, probably from one of the other failures; they pass on their own

Ah, good information :-) Would be useful to know which particular test is triggering the other failures. Is it the one that runs out of memory?