Closed SeekPoint closed 8 years ago
I have 32 GB of CPU RAM and 2 GPUs (GTX 1080, 8 GB) on my machine. Why can't it allocate 15 GB of memory?
rzai@rzai00:~/prj/HieCoAttenVQA/prepro$ CUDA_VISIBLE_DEVICES=1 th prepro_img_vgg.lua -input_json ../data/vqa_data_prepro.json -image_root /home/rzai/mscoco.org-visualqa.org/ -cnn_proto /home/rzai/VGG_ILSVRC_19_layers_deploy.prototxt -cnn_model /home/rzai/VGG_ILSVRC_19_layers.caffemodel
{
  batch_size : 20
  gpuid : 6
  out_name_train : "../data/vqa_data_img_vgg_train.h5"
  out_name_test : "../data/vqa_data_img_vgg_test.h5"
  cnn_proto : "/home/rzai/VGG_ILSVRC_19_layers_deploy.prototxt"
  cnn_model : "/home/rzai/VGG_ILSVRC_19_layers.caffemodel"
  backend : "cudnn"
  image_root : "/home/rzai/mscoco.org-visualqa.org/"
  input_json : "../data/vqa_data_prepro.json"
}
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
Successfully loaded /home/rzai/VGG_ILSVRC_19_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> (38) -> (39) -> (40) -> (41) -> (42) -> (43) -> (44) -> (45) -> (46) -> output]
  (1): cudnn.SpatialConvolution(3 -> 64, 3x3, 1,1, 1,1)
  (2): cudnn.ReLU
  (3): cudnn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1)
  (4): cudnn.ReLU
  (5): cudnn.SpatialMaxPooling(2x2, 2,2)
  (6): cudnn.SpatialConvolution(64 -> 128, 3x3, 1,1, 1,1)
  (7): cudnn.ReLU
  (8): cudnn.SpatialConvolution(128 -> 128, 3x3, 1,1, 1,1)
  (9): cudnn.ReLU
  (10): cudnn.SpatialMaxPooling(2x2, 2,2)
  (11): cudnn.SpatialConvolution(128 -> 256, 3x3, 1,1, 1,1)
  (12): cudnn.ReLU
  (13): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (14): cudnn.ReLU
  (15): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (16): cudnn.ReLU
  (17): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (18): cudnn.ReLU
  (19): cudnn.SpatialMaxPooling(2x2, 2,2)
  (20): cudnn.SpatialConvolution(256 -> 512, 3x3, 1,1, 1,1)
  (21): cudnn.ReLU
  (22): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (23): cudnn.ReLU
  (24): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (25): cudnn.ReLU
  (26): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (27): cudnn.ReLU
  (28): cudnn.SpatialMaxPooling(2x2, 2,2)
  (29): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (30): cudnn.ReLU
  (31): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (32): cudnn.ReLU
  (33): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (34): cudnn.ReLU
  (35): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (36): cudnn.ReLU
  (37): cudnn.SpatialMaxPooling(2x2, 2,2)
  (38): nn.View(-1)
  (39): nn.Linear(25088 -> 4096)
  (40): cudnn.ReLU
  (41): nn.Dropout(0.500000)
  (42): nn.Linear(4096 -> 4096)
  (43): cudnn.ReLU
  (44): nn.Dropout(0.500000)
  (45): nn.Linear(4096 -> 1000)
  (46): cudnn.SoftMax
}
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output]
  (1): cudnn.SpatialConvolution(3 -> 64, 3x3, 1,1, 1,1)
  (2): cudnn.ReLU
  (3): cudnn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1)
  (4): cudnn.ReLU
  (5): cudnn.SpatialMaxPooling(2x2, 2,2)
  (6): cudnn.SpatialConvolution(64 -> 128, 3x3, 1,1, 1,1)
  (7): cudnn.ReLU
  (8): cudnn.SpatialConvolution(128 -> 128, 3x3, 1,1, 1,1)
  (9): cudnn.ReLU
  (10): cudnn.SpatialMaxPooling(2x2, 2,2)
  (11): cudnn.SpatialConvolution(128 -> 256, 3x3, 1,1, 1,1)
  (12): cudnn.ReLU
  (13): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (14): cudnn.ReLU
  (15): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (16): cudnn.ReLU
  (17): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (18): cudnn.ReLU
  (19): cudnn.SpatialMaxPooling(2x2, 2,2)
  (20): cudnn.SpatialConvolution(256 -> 512, 3x3, 1,1, 1,1)
  (21): cudnn.ReLU
  (22): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (23): cudnn.ReLU
  (24): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (25): cudnn.ReLU
  (26): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (27): cudnn.ReLU
  (28): cudnn.SpatialMaxPooling(2x2, 2,2)
  (29): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (30): cudnn.ReLU
  (31): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (32): cudnn.ReLU
  (33): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (34): cudnn.ReLU
  (35): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (36): cudnn.ReLU
  (37): cudnn.SpatialMaxPooling(2x2, 2,2)
}
processing 82460 images...
/home/rzai/torch/install/bin/luajit: $ Torch: not enough memory: you tried to allocate 15GB. Buy new RAM! at /home/rzai/torch/pkg/torch/lib/TH/THGeneral.c:270
stack traceback:
  [C]: at 0x7f1d81308e80
  [C]: in function 'FloatTensor'
  prepro_img_vgg.lua:120: in main chunk
  [C]: in function 'dofile'
  ...rzai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
  [C]: at 0x00406670
rzai@rzai00:~/prj/HieCoAttenVQA/prepro$ vim /home/rzai/torch/pkg/torch/lib/TH/THGeneral.c
rzai@rzai00:~/prj/HieCoAttenVQA/prepro$
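The traceback points at a `FloatTensor` allocation on the CPU side, made right after "processing 82460 images...", so the GPU memory is not what ran out here. A `FloatTensor` stores 32-bit floats, so its size is the product of its dimensions times 4 bytes. Here is a rough sketch of that arithmetic in Python; the helper name and the example shape are made up for illustration, since the log does not show the shape that `prepro_img_vgg.lua` actually requests.

```python
FLOAT32_BYTES = 4      # a Torch FloatTensor element is a 32-bit float
GIB = 1024 ** 3


def float_tensor_bytes(shape):
    """Bytes needed for a dense float32 tensor with the given dimensions."""
    count = 1
    for dim in shape:
        count *= dim
    return count * FLOAT32_BYTES


if __name__ == "__main__":
    # Hypothetical shape: 82460 images (from the log) times an assumed
    # 1000 floats per image. Not the shape used by the script.
    shape = (82460, 1000)
    needed = float_tensor_bytes(shape)
    print("shape", shape, "needs", round(needed / GIB, 2), "GiB")
```

Plugging in the real dimensions from the script shows how much host RAM that single allocation needs in one contiguous block.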
After killing the other processes, it ran successfully with 32 GB of memory.
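That fits with the allocation failing because other processes were holding most of the RAM: what matters is the memory available at that moment, not the 32 GB installed. A minimal sketch for checking this before a big run, assuming a Linux machine where `/proc/meminfo` has a `MemAvailable` line (reported in kB); the function names are hypothetical and the 15 GiB figure is taken from the error message.

```python
import os

GIB = 1024 ** 3


def parse_mem_available_kib(meminfo_text):
    """Return the MemAvailable value in KiB from /proc/meminfo text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return None


def can_allocate(request_bytes, meminfo_text):
    """True/False if the request fits in available memory, None if unknown."""
    available_kib = parse_mem_available_kib(meminfo_text)
    if available_kib is None:
        return None
    return request_bytes <= available_kib * 1024


if __name__ == "__main__":
    path = "/proc/meminfo"
    request = 15 * GIB  # size named in the Torch error message
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
        print("15 GiB request fits right now:", can_allocate(request, text))
    else:
        print("no /proc/meminfo on this system")
```

Running `free -h` in the shell gives the same information at a glance.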