jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

Memory object allocation failure #404

Open · AlexNarbut opened this issue 7 years ago

AlexNarbut commented 7 years ago

I'm training on my laptop with a GeForce 940M and 8 GB of RAM.

th neural_style.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -output_image profile.png -gpu 0 -backend clnn -print_iter 1 -num_iterations 100 -seed 123 -content_layers relu0,relu3,relu7,relu12 -style_layers relu0,relu3,relu7,relu12 -content_weight 10 -style_weight 300 -image_size 1000 -optimizer adam

libthclnn_searchpath /home/alex/torch-cl/install/lib/lua/5.1/libTHCLNN.so
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
Successfully loaded models/VGG_ILSVRC_19_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
Capturing content targets
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> (38) -> (39) -> (40) -> (41) -> (42) -> (43) -> (44) -> (45) -> (46) -> (47) -> output]
  (1): nn.TVLoss
  (2): nn.SpatialConvolution(3 -> 64, 3x3, 1,1, 1,1)
  (3): nn.ReLU
  (4): nn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1)
  (5): nn.ReLU
  (6): nn.SpatialMaxPooling(2,2,2,2)
  (7): nn.SpatialConvolution(64 -> 128, 3x3, 1,1, 1,1)
  (8): nn.ReLU
  (9): nn.SpatialConvolution(128 -> 128, 3x3, 1,1, 1,1)
  (10): nn.ReLU
  (11): nn.SpatialMaxPooling(2,2,2,2)
  (12): nn.SpatialConvolution(128 -> 256, 3x3, 1,1, 1,1)
  (13): nn.ReLU
  (14): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (15): nn.ReLU
  (16): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (17): nn.ReLU
  (18): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (19): nn.ReLU
  (20): nn.SpatialMaxPooling(2,2,2,2)
  (21): nn.SpatialConvolution(256 -> 512, 3x3, 1,1, 1,1)
  (22): nn.ReLU
  (23): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (24): nn.ReLU
  (25): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (26): nn.ReLU
  (27): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (28): nn.ReLU
  (29): nn.SpatialMaxPooling(2,2,2,2)
  (30): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (31): nn.ReLU
  (32): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (33): nn.ReLU
  (34): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (35): nn.ReLU
  (36): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (37): nn.ReLU
  (38): nn.SpatialMaxPooling(2,2,2,2)
  (39): nn.View(-1)
  (40): nn.Linear(25088 -> 4096)
  (41): nn.ReLU
  (42): nn.Dropout(0.500000)
  (43): nn.Linear(4096 -> 4096)
  (44): nn.ReLU
  (45): nn.Dropout(0.500000)
  (46): nn.Linear(4096 -> 1000)
  (47): nn.SoftMax
}

kernel source:
// from im2col.h:

// CL: grid stride looping
#define CL_KERNEL_LOOP(i, n) \
  for (int i = get_group_id(0) * get_local_size(0) + get_local_id(0); \
      i < (n); \
      i += get_local_size(0) * get_num_groups(0))

// Kernel for fast unfold+copy
// (borrowed from Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu)
kernel void im2col_kernel(const int n, const global float* im_data, int im_offset,
    const int height, const int width, const int ksize_h, const int ksize_w, const int pad_h,
    const int pad_w, const int stride_h, const int stride_w, const int height_col, const int width_col,
    global float* col_data, int col_offset) {
  global const float* data_im = im_data + im_offset;
  global float* data_col = col_data + col_offset;

  CL_KERNEL_LOOP(index, n) {
    int w_out = index % width_col;
    index /= width_col;
    int h_out = index % height_col;
    int channel_in = index / height_col;
    int channel_out = channel_in * ksize_h * ksize_w;
    int h_in = h_out * stride_h - pad_h;
    int w_in = w_out * stride_w - pad_w;
    data_col += (channel_out * height_col + h_out) * width_col + w_out;
    data_im += (channel_in * height + h_in) * width + w_in;
    for (int i = 0; i < ksize_h; ++i) {
      for (int j = 0; j < ksize_w; ++j) {
        int h = h_in + i;
        int w = w_in + j;
        *data_col = (h >= 0 && w >= 0 && h < height && w < width) ?
          data_im[i * width + j] : 0;
        data_col += height_col * width_col;
      }
    }
  }
}

kernel void col2im_kernel(const int n, global const float* col_data, int col_offset,
    const int height, const int width, const int channels, const int patch_h, const int patch_w,
    const int pad_h, const int pad_w, const int stride_h, const int stride_w,
    const int height_col, const int width_col,
    global float* im_data, int im_offset) {
  global float* data_im = im_data + im_offset;
  global const float* data_col = col_data + col_offset;

  CL_KERNEL_LOOP(index, n) {
    float val = 0;
    int w = index % width + pad_w;
    int h = (index / width) % height + pad_h;
    int c = index / (width * height);
    // compute the start and end of the output
    int w_col_start = (w < patch_w) ? 0 : (w - patch_w) / stride_w + 1;
    int w_col_end = min(w / stride_w + 1, width_col);
    int h_col_start = (h < patch_h) ? 0 : (h - patch_h) / stride_h + 1;
    int h_col_end = min(h / stride_h + 1, height_col);
    /*
    for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
      for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
        // the col location: [c * width * height + h_out, w_out]
        int c_col = c * patch_h * patch_w + (h - h_col * stride_h) * ksize + (w - w_col * stride_w);
        val += data_col[(c_col * height_col + h_col) * width_col + w_col];
      }
    }
    */
    // equivalent implementation
    int offset = (c * patch_h * patch_w + h * patch_w + w) * height_col * width_col;
    int coeff_h_col = (1 - stride_h * patch_w * height_col) * width_col;
    int coeff_w_col = (1 - stride_w * height_col * width_col);
    for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
      for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
        val += data_col[offset + h_col * coeff_h_col + w_col * coeff_w_col];
      }
    }
    data_im[index] = val;
  }
}

Memory object allocation failure, code -4
/home/alex/torch-cl/install/bin/luajit: /home/alex/torch-cl/install/share/lua/5.1/nn/THNN.lua:806: kernel source:
// from im2col.h:

// CL: grid stride looping
#define CL_KERNEL_LOOP(i, n) \
  for (int i = get_group_id(0) * get_local_size(0) + get_local_id(0); \
      i < (n); \
      i += get_local_size(0) * get_num_groups(0))

// Kernel for fast unfold+copy
// (borrowed from Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu)
kernel void im2col_kernel(const int n, const global float* im_data, int im_offset,
    const int height, const int width, const int ksize_h, const int ksize_w, const int pad_h,
    const int pad_w, const int stride_h, const int stride_w, const int height_col, const int width_col,
    global float* col_data, int col_offset) {
  global const float* data_im = im_data + im_offset;
  global float* data_col = col_data + col_offset;

  CL_KERNEL_LOOP(index, n) {
    int w_out = index ?
stack traceback:
  [C]: in function 'v'
  /home/alex/torch-cl/install/share/lua/5.1/nn/THNN.lua:806: in function 'SpatialConvolutionMM_updateOutput'
  ...torch-cl/install/share/lua/5.1/nn/SpatialConvolution.lua:100: in function 'updateOutput'
  /home/alex/torch-cl/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
  neural_style.lua:162: in function 'main'
  neural_style.lua:601: in main chunk
  [C]: in function 'dofile'
  ...x/torch-cl/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
  [C]: at 0x00405e90

AlexNarbut commented 7 years ago

echo $PATH
/home/alex/torch-cl/install/bin:/usr/local/cuda-8.0/bin:/home/alex/torch/install/bin:/home/alex/torch-cl/install/bin:/home/alex/torch/install/bin:/home/alex/bin:/home/alex/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
echo $LD_LIBRARY_PATH
/home/alex/torch-cl/install/lib:/usr/local/cuda-8.0/lib64:/home/alex/torch/install/lib:/home/alex/torch-cl/install/lib:/home/alex/torch/install/lib:

jcjohnson commented 7 years ago

You are probably running out of GPU memory; 8GB is not enough for image size 1000.
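The usual workaround is to lower -image_size and leave everything else unchanged. As a sketch of that (512 is just a common starting point on small cards, not a measured limit for the 940M):

th neural_style.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -output_image profile.png -gpu 0 -backend clnn -optimizer adam -image_size 512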

atducey commented 7 years ago

Do you have ballpark figures for how much GPU RAM various image sizes need? I'm guessing it increases quadratically, so that 1024 takes four times as much memory as 512, but I don't really know how much 512 takes.
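A rough way to sanity-check that guess (a back-of-the-envelope sketch, not measured numbers from this repo): the conv activations in VGG-19 scale with the pixel count, so memory grows roughly quadratically in -image_size. The Lua snippet below counts only forward-pass activations in float32; the ~548 MB of weights, the gradients, the Gram matrices, and the optimizer's state add a multiple on top, so treat the output as a lower bound.

-- Hypothetical helper, not part of neural_style.lua: estimate VGG-19 conv
-- activation memory for a square input of size n (float32, forward pass only).
local function vgg19_activation_mb(n)
  -- {channels, conv layers at that resolution}; resolution halves after each block (2x2 max pooling)
  local blocks = {{64, 2}, {128, 2}, {256, 4}, {512, 4}, {512, 4}}
  local bytes, h, w = 0, n, n
  for _, b in ipairs(blocks) do
    local channels, layers = b[1], b[2]
    bytes = bytes + layers * channels * h * w * 4
    h, w = math.floor(h / 2), math.floor(w / 2)
  end
  return bytes / 1024 / 1024
end

for _, n in ipairs({512, 724, 1000, 1024}) do
  print(string.format("image_size %4d: ~%.0f MB of conv activations", n, vgg19_activation_mb(n)))
end

By this estimate 512 is roughly 300 MB of activations and 1000 roughly 1.1 GB, which lines up with the quadratic guess; the real footprint during optimization is several times larger.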