When going through evaluation, the code crashes after the second batch. I get the following error:
imi@imi-All-Series:~/graulef/deep-soli$ th net/main.lua --file ../datapre --list config/file_half.json --load ../uni_image_np_50.t7 --inputsize 32 --inputch 4 --label 11 --datasize 32 --datach 4 --batch 16 --maxseq 40 --cuda --cudnn
[eval] data with 1364 seq
[net] loading model ../uni_image_np_50.t7
nn.Sequencer @ nn.Recursor @ nn.MaskZero @ nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> output]
(1): cudnn.SpatialConvolution(4 -> 32, 3x3, 2,2)
(2): nn.SpatialBatchNormalization (4D) (32)
(3): cudnn.ReLU
(4): cudnn.SpatialConvolution(32 -> 64, 3x3, 2,2)
(5): nn.SpatialBatchNormalization (4D) (64)
(6): cudnn.ReLU
(7): nn.SpatialDropout(0.400000)
(8): cudnn.SpatialConvolution(64 -> 128, 3x3, 2,2)
(9): nn.SpatialBatchNormalization (4D) (128)
(10): cudnn.ReLU
(11): nn.SpatialDropout(0.400000)
(12): nn.Reshape(1152)
(13): nn.Linear(1152 -> 512)
(14): nn.BatchNormalization (2D) (512)
(15): cudnn.ReLU
(16): nn.Dropout(0.5, busy)
(17): nn.Linear(512 -> 512)
(18): nn.LSTM(512 -> 512)
(19): nn.Dropout(0.5, busy)
(20): nn.Linear(512 -> 13)
(21): cudnn.LogSoftMax
}
/home/imi/graulef/datapre/0_12_20/label.json
/home/imi/graulef/datapre/0_10_18/label.json
/home/imi/graulef/datapre/0_6_8/label.json
/home/imi/graulef/datapre/0_3_0/label.json
/home/imi/graulef/datapre/0_12_22/label.json
/home/imi/graulef/datapre/0_13_10/label.json
/home/imi/graulef/datapre/0_12_3/label.json
/home/imi/graulef/datapre/0_5_14/label.json
/home/imi/graulef/datapre/0_10_15/label.json
/home/imi/graulef/datapre/0_2_5/label.json
/home/imi/graulef/datapre/0_3_5/label.json
/home/imi/graulef/datapre/0_3_18/label.json
/home/imi/graulef/datapre/0_12_21/label.json
/home/imi/graulef/datapre/0_5_21/label.json
/home/imi/graulef/datapre/0_10_20/label.json
/home/imi/graulef/datapre/0_13_6/label.json
Evaluation passed
/home/imi/graulef/datapre/0_8_22/label.json
/home/imi/graulef/datapre/0_8_11/label.json
/home/imi/graulef/datapre/0_9_0/label.json
/home/imi/graulef/datapre/0_3_7/label.json
/home/imi/graulef/datapre/0_8_8/label.json
/home/imi/graulef/datapre/0_5_10/label.json
/home/imi/graulef/datapre/0_10_6/label.json
/home/imi/graulef/datapre/0_9_20/label.json
/home/imi/graulef/datapre/0_6_20/label.json
/home/imi/graulef/datapre/0_13_15/label.json
/home/imi/graulef/datapre/0_6_2/label.json
/home/imi/graulef/datapre/0_9_13/label.json
/home/imi/graulef/datapre/0_13_17/label.json
/home/imi/graulef/datapre/0_12_23/label.json
/home/imi/graulef/datapre/0_2_4/label.json
/home/imi/graulef/datapre/0_12_10/label.json
Evaluation passed
/home/imi/torch/install/bin/luajit: /home/imi/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
/home/imi/torch/install/share/lua/5.1/cudnn/init.lua:91: attempt to index a nil value
stack traceback:
/home/imi/torch/install/share/lua/5.1/cudnn/init.lua:91: in function 'scalar'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:195: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:186>
[C]: in function 'xpcall'
/home/imi/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/imi/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/MaskZero.lua:97: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/Recursor.lua:27: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/Sequencer.lua:94: in function 'forward'
./net/rnntrain.lua:31: in function 'batchEval'
./net/train.lua:25: in function 'epochEval'
./net/train.lua:47: in function 'train'
net/main.lua:45: in main chunk
[C]: in function 'dofile'
.../imi/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x004065d0
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/imi/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/imi/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/MaskZero.lua:97: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/Recursor.lua:27: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/Sequencer.lua:94: in function 'forward'
./net/rnntrain.lua:31: in function 'batchEval'
./net/train.lua:25: in function 'epochEval'
./net/train.lua:47: in function 'train'
net/main.lua:45: in main chunk
[C]: in function 'dofile'
.../imi/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x004065d0
The crash persists if the batch size is changed to, say, 4:
imi@imi-All-Series:~/graulef/deep-soli$ th net/main.lua --file ../datapre --list config/file_half.json --load ../uni_image_np_50.t7 --inputsize 32 --inputch 4 --label 11 --datasize 32 --datach 4 --batch 4 --maxseq 40 --cuda --cudnn
[eval] data with 1364 seq
[net] loading model ../uni_image_np_50.t7
nn.Sequencer @ nn.Recursor @ nn.MaskZero @ nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> output]
(1): cudnn.SpatialConvolution(4 -> 32, 3x3, 2,2)
(2): nn.SpatialBatchNormalization (4D) (32)
(3): cudnn.ReLU
(4): cudnn.SpatialConvolution(32 -> 64, 3x3, 2,2)
(5): nn.SpatialBatchNormalization (4D) (64)
(6): cudnn.ReLU
(7): nn.SpatialDropout(0.400000)
(8): cudnn.SpatialConvolution(64 -> 128, 3x3, 2,2)
(9): nn.SpatialBatchNormalization (4D) (128)
(10): cudnn.ReLU
(11): nn.SpatialDropout(0.400000)
(12): nn.Reshape(1152)
(13): nn.Linear(1152 -> 512)
(14): nn.BatchNormalization (2D) (512)
(15): cudnn.ReLU
(16): nn.Dropout(0.5, busy)
(17): nn.Linear(512 -> 512)
(18): nn.LSTM(512 -> 512)
(19): nn.Dropout(0.5, busy)
(20): nn.Linear(512 -> 13)
(21): cudnn.LogSoftMax
}
/home/imi/graulef/datapre/0_12_20/label.json
/home/imi/graulef/datapre/0_10_18/label.json
/home/imi/graulef/datapre/0_6_8/label.json
/home/imi/graulef/datapre/0_3_0/label.json
Evaluation passed
/home/imi/graulef/datapre/0_12_22/label.json
/home/imi/graulef/datapre/0_13_10/label.json
/home/imi/graulef/datapre/0_12_3/label.json
/home/imi/graulef/datapre/0_5_14/label.json
Evaluation passed
/home/imi/torch/install/bin/luajit: /home/imi/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
/home/imi/torch/install/share/lua/5.1/cudnn/init.lua:91: attempt to index a nil value
stack traceback:
/home/imi/torch/install/share/lua/5.1/cudnn/init.lua:91: in function 'scalar'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:195: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:186>
[C]: in function 'xpcall'
/home/imi/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/imi/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/MaskZero.lua:97: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/Recursor.lua:27: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/Sequencer.lua:94: in function 'forward'
./net/rnntrain.lua:31: in function 'batchEval'
./net/train.lua:25: in function 'epochEval'
./net/train.lua:47: in function 'train'
net/main.lua:45: in main chunk
[C]: in function 'dofile'
.../imi/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x004065d0
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/imi/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/imi/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/MaskZero.lua:97: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/Recursor.lua:27: in function 'updateOutput'
/home/imi/torch/install/share/lua/5.1/rnn/Sequencer.lua:94: in function 'forward'
./net/rnntrain.lua:31: in function 'batchEval'
./net/train.lua:25: in function 'epochEval'
./net/train.lua:47: in function 'train'
net/main.lua:45: in main chunk
[C]: in function 'dofile'
.../imi/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x004065d0
The code that triggers the issue is again in MaskZero.lua, which is odd. The failing line is line 70 of rnn/MaskZero.lua (https://github.com/Element-Research/rnn/blob/master/MaskZero.lua); my line number differs because of added comments. Since the error comes from indexing a nil value and only occurs after the second iteration, I suspect some sort of memory issue.
Has anyone had a similar issue or can reproduce it? If not, what versions of the packages were you running?
I was able to fix this issue by manually setting the boolean useCuda to true. Somehow, the argument was not passed correctly from main to the RnnTrain constructor. I'm still trying to find out why.
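For anyone hitting something similar: I don't know yet how the flag got lost, but one common way this happens in Lua is that a missing positional argument silently becomes nil, which is falsy. Below is a minimal sketch in plain Lua (no Torch needed) of that failure mode and of a constructor that fails loudly instead. The names Trainer, positionalNew and opt are hypothetical and are not the actual deep-soli RnnTrain code; only useCuda is taken from above.

```lua
-- Hypothetical stand-in for a trainer class; NOT the real RnnTrain.
local Trainer = {}
Trainer.__index = Trainer

-- Takes a single options table and raises an error when the flag is missing,
-- instead of quietly treating nil as false.
function Trainer.new(opt)
  assert(type(opt) == 'table', 'options table expected')
  assert(type(opt.useCuda) == 'boolean',
         'useCuda must be passed explicitly as true or false')
  return setmetatable({ useCuda = opt.useCuda }, Trainer)
end

-- A positional constructor, by contrast, receives nil for any argument
-- the caller forgets, and Lua does not complain.
local function positionalNew(net, useCuda)
  return { net = net, useCuda = useCuda }
end

local lost = positionalNew('net')        -- useCuda omitted by mistake
print(lost.useCuda)                      -- nil

local ok, err = pcall(Trainer.new, {})   -- flag missing: raises immediately
print(ok)                                -- false

local t = Trainer.new({ useCuda = true })
print(t.useCuda)                         -- true
```

With a check like this in the constructor, a dropped flag would show up at startup rather than as a nil index deep inside cudnn a few batches later.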