Insufficient GPU memory causes Aborted (core dumped)

zqdeepbluesky commented 6 years ago

@unsky Hi, the code is very well written, but I ran into a problem when running it. Possibly because my GPU (1050 Ti, 4 GB) does not have enough memory, an error occurs at the "Initializing net from parameters" step. The failing output is below:
zq@zq-System-Product-Name:~/FPN-master$ ./experiments/scripts/FP_Net_end2end_merge_rcnn.sh 0 FPN pascal_voc
......
I0308 20:29:01.180505 2233 net.cpp:49] Initializing net from parameters:

..... layer { name: "res4b6" type: "Eltwise" bottom: "res4b5" bottom: "res4b6_branch2c" top: "res4b6" } layer { name: "res4b6_relu" type: "ReLU" bottom: "res4b6" top: "res4b6" } layer { name: "res4b7_branch2a" type: "Convolution" bottom: "res4b6" top: "res4b7_branch2a" param { lr_mult: 1 } convolution_param { num_output: 256 bias_term: false pad: 0 kernel_size: 1 stride: 1 } } layer { name: "scale4b7_branch2a" type: "Scale" bottom: "res4b7_branch2a" top: "res4b7_branch2a" param { lr_mult: 0 decay_mult: 0 } param { lr_mult: 0 decay_mult: 0 } scale_param { bias_term: true } } layer { name: "res4b7_branch2a_relu" type: "ReLU" bottom: "res4b7_branch2a" top: "res4b7_branch2a" } layer { name: "res4b7_branch2b" type: "Convolution" bottom: "res4b7_branch2a" top: "res4b7_branch2b" param { lr_mult: 1 } convolution_param { num_output: 256 bias_term: false pad: 1 kernel_size: 3 stride: 1 } }

layer { name: "scale4b7_branch2b" type: "Scale" bottom: "res4b7_branch2b" top: "res4b7_branch2b" param { lr_mult: 0 decay_mult: 0 } param { lr_mult: 0 decay_mult: 0 } scale_param { bias_term: true } } layer { name: "res4b7_branch2b_relu" type: "ReLU" bottom: "res4b7_branch2b" top: "res4b7_branch2b" } layer { name: "res4b7_branch2c" type: "Convolution" bottom: "res4b7_branch2b" top: "res4b7_branch2c" param { lr_mult: 1 } convolution_param { num_output: 1024 bias_term: false pad: 0 kernel_size: 1 stride: 1 } } layer { name: "scale4b7
I0308 20:29:01.184592 2233 layer_factory.hpp:77] Creating layer input-data
F0308 20:29:01.184634 2233 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Python (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Data, Deconvolution, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, LRN, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, Pooling, Power, ROIPooling, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, WindowData)
Check failure stack trace:
Aborted (core dumped)

You wrote in the README: “In my expriments, the codes require ~10G GPU memory in training and ~6G in testing. your can design the suit image size, mimbatch size and rcnn batch size for your GPUS.” But my GPU only has 4 GB, and I don't know where these batch sizes are set. Could you tell me what I should change? Thank you very much!

zj19921221 commented 6 years ago

@zqdeepbluesky It is in lib/fast_rcnn/config.py, line 55: __C.TRAIN.BATCH_SIZE = 256
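As a rough illustration of lowering the memory-related settings together, here is a minimal sketch. It assumes the config keeps the key names used by upstream py-faster-rcnn (TRAIN.IMS_PER_BATCH, TRAIN.SCALES, TRAIN.MAX_SIZE, TRAIN.RPN_BATCHSIZE); only TRAIN.BATCH_SIZE = 256 is confirmed in this thread. The plain dict stands in for the real `cfg` object, `merge_overrides` is a hypothetical helper, and all override values are untested guesses for a 4 GB card.

```python
import copy

# Stand-in for the `cfg` object in lib/fast_rcnn/config.py.
# Assumption: key names follow upstream py-faster-rcnn. Only
# TRAIN.BATCH_SIZE = 256 is confirmed; other defaults are placeholders.
DEFAULT_CFG = {
    "TRAIN": {
        "BATCH_SIZE": 256,      # RoIs sampled per minibatch (R-CNN head)
        "IMS_PER_BATCH": 1,     # images per minibatch
        "SCALES": (600,),       # shorter image side, in pixels
        "MAX_SIZE": 1000,       # cap on the longer image side, in pixels
        "RPN_BATCHSIZE": 256,   # anchors sampled per image for the RPN
    }
}

# Hypothetical reduced settings; tune for your own GPU.
LOW_MEMORY_OVERRIDES = {
    "TRAIN": {
        "BATCH_SIZE": 64,
        "SCALES": (400,),
        "MAX_SIZE": 667,
        "RPN_BATCHSIZE": 128,
    }
}


def merge_overrides(base, overrides):
    """Return a copy of `base` with `overrides` applied; reject unknown keys."""
    merged = copy.deepcopy(base)
    for section, values in overrides.items():
        if section not in merged:
            raise KeyError("unknown config section: %s" % section)
        for key, value in values.items():
            if key not in merged[section]:
                raise KeyError("unknown config key: %s.%s" % (section, key))
            merged[section][key] = value
    return merged


cfg = merge_overrides(DEFAULT_CFG, LOW_MEMORY_OVERRIDES)
```

Rejecting unknown keys means a misspelled option fails loudly instead of being silently ignored. Smaller image scales usually save more memory than smaller RoI batches, at some cost in accuracy.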

zqdeepbluesky commented 6 years ago

@zj19921221 I found that the root cause is not insufficient memory but this: Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Python. This means I did not enable the Python layer in the Makefile when compiling Caffe. Thanks anyway.
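For anyone hitting the same message: in stock Caffe the switch is the line `WITH_PYTHON_LAYER := 1` in Makefile.config, which ships commented out, and Caffe and pycaffe have to be rebuilt after changing it. Below is a small sketch for checking a config file's text, assuming the Caffe bundled with this repository uses the same variable name; `python_layer_enabled` is a hypothetical helper, not part of Caffe.

```python
import re

# Matches an uncommented assignment such as "WITH_PYTHON_LAYER := 1".
_PYTHON_LAYER_RE = re.compile(r"^\s*WITH_PYTHON_LAYER\s*:?=\s*1\s*(#.*)?$")


def python_layer_enabled(makefile_config_text):
    """Return True if the text has an uncommented WITH_PYTHON_LAYER := 1 line."""
    return any(
        _PYTHON_LAYER_RE.match(line)
        for line in makefile_config_text.splitlines()
    )


# Example inputs mimicking the relevant part of Makefile.config.
sample_disabled = "# WITH_PYTHON_LAYER := 1\nUSE_CUDNN := 1\n"
sample_enabled = "WITH_PYTHON_LAYER := 1\nUSE_CUDNN := 1\n"
```

To check a real build, read the file first, for example `python_layer_enabled(open("Makefile.config").read())`, with the path adjusted to wherever your Caffe tree lives.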

unsky / FPN

Insufficient GPU memory causes Aborted (core dumped) #39