The --cuda argument in the command below (which should set the internal useCuda flag) does not seem to propagate through the whole program. This leads to a mix of CUDA and regular Float tensors, which in my case causes the following error (a quick check that confirms the mix is sketched right after the log):
imi@imi-All-Series:~/graulef/deep-soli$ th net/main.lua --file ../datapre --list config/file_half.json --load ../uni_image_np_50.t7 --inputsize 32 --inputch 4 --label 13 --datasize 32 --datach 4 --batch 16 --maxseq 40 --cuda --cudnn
Cuda enabled
[eval] data with 1364 seq
[net] loading model ../uni_image_np_50.t7
nn.Sequencer @ nn.Recursor @ nn.MaskZero @ nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> output]
(1): cudnn.SpatialConvolution(4 -> 32, 3x3, 2,2)
(2): nn.SpatialBatchNormalization (4D) (32)
(3): cudnn.ReLU
(4): cudnn.SpatialConvolution(32 -> 64, 3x3, 2,2)
(5): nn.SpatialBatchNormalization (4D) (64)
(6): cudnn.ReLU
(7): nn.SpatialDropout(0.400000)
(8): cudnn.SpatialConvolution(64 -> 128, 3x3, 2,2)
(9): nn.SpatialBatchNormalization (4D) (128)
(10): cudnn.ReLU
(11): nn.SpatialDropout(0.400000)
(12): nn.Reshape(1152)
(13): nn.Linear(1152 -> 512)
(14): nn.BatchNormalization (2D) (512)
(15): cudnn.ReLU
(16): nn.Dropout(0.5, busy)
(17): nn.Linear(512 -> 512)
(18): nn.LSTM(512 -> 512)
(19): nn.Dropout(0.5, busy)
(20): nn.Linear(512 -> 13)
(21): cudnn.LogSoftMax
}
/home/imi/torch/install/bin/luajit: /home/imi/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
/home/imi/torch/install/share/lua/5.1/cudnn/init.lua:92: attempt to index a nil value
stack traceback:
/home/imi/torch/install/share/lua/5.1/cudnn/init.lua:92: in function 'scalar'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:195: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:186>
[C]: in function 'xpcall'
/home/imi/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/imi/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/MaskZero.lua:94: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/Recursor.lua:27: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/Sequencer.lua:94: in function 'forward'
./net/rnntrain.lua:34: in function 'batchEval'
./net/train.lua:25: in function 'epochEval'
./net/train.lua:47: in function 'train'
net/main.lua:47: in main chunk
[C]: in function 'dofile'
.../imi/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x004065d0
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/imi/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/imi/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/MaskZero.lua:94: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/Recursor.lua:27: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/Sequencer.lua:94: in function 'forward'
./net/rnntrain.lua:34: in function 'batchEval'
./net/train.lua:25: in function 'epochEval'
./net/train.lua:47: in function 'train'
net/main.lua:47: in main chunk
[C]: in function 'dofile'
.../imi/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x004065d0
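The failing call is cudnn.SpatialConvolution's updateOutput: the convolution weights are CudaTensors (the loaded model was converted), while the evaluation batch apparently still consists of FloatTensors, so cudnn's type lookup presumably comes back nil at init.lua:92. A quick way to confirm this is to print the types just before the forward call; the names input and self.model here are assumptions, the real ones live in net/rnntrain.lua around the line 34 shown in the trace:

-- Hedged debugging snippet; 'input' and 'self.model' are assumed names for the
-- batch and the network inside batchEval (net/rnntrain.lua).
local first = (torch.type(input) == 'table') and input[1] or input
print('batch type :', torch.type(first))        -- torch.FloatTensor when the bug is present
local params = self.model:parameters()
print('weight type:', torch.type(params[1]))    -- torch.CudaTensor
-- If the two types differ, the --cuda flag did not reach the data pipeline.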
I fixed this by simply adding this line to the constructor of RnnTrain:
useCuda = true
This is a dirty fix, but it worked for me. I will try to track down where the actual mistake in the code is.
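For reference, the place where the flag is presumably supposed to take effect is wherever batchEval prepares the batch before calling forward. A minimal sketch of what that might look like (names and structure are assumptions, not the actual deep-soli code):

-- Sketch only: move the batch to the GPU when CUDA is enabled, mirroring the
-- conversion already applied to the model. 'input' may be a table of per-step
-- tensors because the network is wrapped in nn.Sequencer.
if self.useCuda then
   if torch.type(input) == 'table' then
      for i = 1, #input do input[i] = input[i]:cuda() end
   else
      input = input:cuda()
   end
end
local output = self.model:forward(input)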