nicholas-leonard / dp

A deep learning library for streamlining research and development using the Torch7 distribution.

cuda runtime error (77) : an illegal memory access #139

Closed deepakjnath closed 9 years ago

deepakjnath commented 9 years ago

I find that I am getting an illegal memory access error in the dp cnn example when the dataset is larger than 50x3x256x256. I am able to train the network as long as the dataset doesn't exceed this size.

Does anyone know what could be the problem?

Thank you, Deepak

th dj-cnn.lua --dataset djData --cuda --kernelSize "{1,1,1,1}" --channelSize '{3,10}' --trainData 'train_data1000.t7' --trainLabel 'train_label1000.t7'

{
   accUpdate : false
   activation : "Tanh"
   batchNorm : false
   batchSize : 64
   channelSize : "{3,10}"
   cuda : true
   dataset : "djData"
   dropout : false
   dropoutProb : "{0.2,0.5,0.5}"
   hiddenSize : "{}"
   kernelSize : "{1,1,1,1}"
   kernelStride : "{1,1,1,1}"
   learningRate : 0.1
   lecunlcn : false
   loadSize : ""
   maxEpoch : 100
   maxOutNorm : 1
   maxTries : 30
   metaPath : "."
   momentum : 0
   padding : false
   poolSize : "{2,2,2,2}"
   poolStride : "{2,2,2,2}"
   progress : false
   sampleSize : "."
   silent : false
   standardize : false
   trainData : "train_data1000.t7"
   trainLabel : "train_label1000.t7"
   trainPath : "."
   useDevice : 1
   validPath : "."
   zca : false
}
input to dense layers has: 40960 neurons
Model:
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> output]
  (1): nn.Convert
  (2): nn.SpatialConvolution(3 -> 3, 1x1)
  (3): nn.Tanh
  (4): nn.SpatialMaxPooling(2,2,2,2)
  (5): nn.SpatialConvolution(3 -> 10, 1x1)
  (6): nn.Tanh
  (7): nn.SpatialMaxPooling(2,2,2,2)
  (8): nn.Collapse
  (9): nn.Linear(40960 -> 5)
  (10): nn.LogSoftMax
}
FileLogger: log will be written to /home/deepakjnath/save/seraphim:1435216930:1/log
==> epoch # 1 for optimizer :
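For reference, the "40960 neurons" figure follows directly from the pooling arithmetic of the model printed above (a sketch in Python for illustration; the 256x256 input size is taken from the issue description):

```python
def conv_pool_out(size, kernel=1, pool=2):
    # A 1x1 convolution with stride 1 keeps the spatial size;
    # 2x2 max pooling with stride 2 then halves it (floor division).
    return (size - kernel + 1) // pool

h = w = 256          # spatial input size from the dataset (50x3x256x256)
for _ in range(2):   # two conv + pool stages, per the model printout
    h, w = conv_pool_out(h), conv_pool_out(w)

channels = 10        # output channels of the last convolution
print(channels * h * w)  # -> 40960, matching "input to dense layers has: 40960 neurons"
```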
/home/deepakjnath/torch/install/bin/luajit: /tmp/luarocks_cutorch-scm-1-3394/cutorch/lib/THC/THCStorage.c(15) : cuda runtime error (77) : an illegal memory access was encountered at /tmp/luarocks_cutorch-scm-1-3394/cutorch/lib/THC/THCGeneral.c:241
stack traceback:
  [C]: at 0x7ff7edc2e340
  [C]: in function '__index'
  ...ath/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:36: in function 'forward'
  /usr/local/share/lua/5.1/dpnn/ModuleCriterion.lua:23: in function 'forward'
  /usr/local/share/lua/5.1/dp/propagator/propagator.lua:158: in function 'forward'
  /usr/local/share/lua/5.1/dp/propagator/optimizer.lua:50: in function 'propagateBatch'
  /usr/local/share/lua/5.1/dp/propagator/propagator.lua:117: in function 'propagateEpoch'
  /usr/local/share/lua/5.1/dp/propagator/experiment.lua:110: in function 'run'
  dj-cnn.lua:271: in main chunk
  [C]: in function 'dofile'
  ...nath/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
  [C]: at 0x00406670
/home/deepakjnath>

nicholas-leonard commented 9 years ago

The input to your dense layer is too big. Maybe try adding a couple more convolution + max pooling layers.
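To illustrate why this helps (a sketch, assuming the 1x1 kernels and 2x2/2 pooling used in the run above): each additional 2x2 max-pooling stage halves the spatial dimensions, so two extra conv + pool stages cut the dense-layer input by a factor of 16:

```python
def dense_inputs(size, channels, stages):
    # Each 2x2, stride-2 max-pool stage halves the spatial size;
    # the 1x1 convolutions leave it unchanged.
    for _ in range(stages):
        size //= 2
    return channels * size * size

print(dense_inputs(256, 10, 2))  # 40960 -> the failing configuration
print(dense_inputs(256, 10, 4))  # 2560  -> with two extra conv + pool stages
```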

deepakjnath commented 9 years ago

Thanks, this helped!