niluanwudidadi closed this issue 7 years ago
(1) Set the environment variable: export MXNET_ENGINE_TYPE="NaiveEngine". (2) Increase the convolutional layer workspace: edit symbol_vgg.py and add workspace=2048 to all convolutional layers.
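A minimal sketch of step (1), assuming you launch training from the same shell. Forcing the NaiveEngine makes all operations synchronous, so the failing operator shows a usable backtrace; step (2) is an edit inside symbol_vgg.py, passing workspace=2048 (megabytes) to each mx.symbol.Convolution call.

```shell
# Force the synchronous engine before launching training (debugging only).
export MXNET_ENGINE_TYPE="NaiveEngine"
echo "engine: $MXNET_ENGINE_TYPE"

# python train_end2end.py ...   # run training / reproduce the error here

# Restore the default threaded engine afterwards, as the error message advises.
unset MXNET_ENGINE_TYPE
echo "engine restored: ${MXNET_ENGINE_TYPE:-default}"
```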
@alysazhang Thank you very much, I have solved this problem. But I have met another problem when running python train_end2end.py.
I have downsized the VOC2007 training set to 1096 images, and my GTX 980 has 4 GB of memory, but it still reports out of memory. I want to know how to solve this problem. Thank you!
yx@yx-X8DTL:~/mxnet/example/mx-rcnn-master$ python train_end2end.py
Called with argument: Namespace(begin_epoch=0, dataset='PascalVOC', dataset_path='data/VOCdevkit', end_epoch=10, epoch=1, flip=True, frequent=20, gpus='0', image_set='2007_trainval', kvstore='device', lr=0.001, lr_step=50000, network='vgg', prefix='model/e2e', pretrained='model/vgg16', resume=False, root_path='data', work_load_list=None)
{'EPS': 1e-14,
'IMAGE_STRIDE': 0,
'PIXEL_MEANS': array([[[ 123.68 , 116.779, 103.939]]]),
'SCALES': [(600, 1000)],
'TEST': {'BATCH_IMAGES': 1,
'HAS_RPN': False,
'NMS': 0.3,
'RPN_MIN_SIZE': 16,
'RPN_NMS_THRESH': 0.7,
'RPN_POST_NMS_TOP_N': 300,
'RPN_PRE_NMS_TOP_N': 6000},
'TRAIN': {'BATCH_IMAGES': 1,
'BATCH_ROIS': 128,
'BBOX_MEANS': [0.0, 0.0, 0.0, 0.0],
'BBOX_NORMALIZATION_PRECOMPUTED': True,
'BBOX_REGRESSION_THRESH': 0.5,
'BBOX_STDS': [0.1, 0.1, 0.2, 0.2],
'BBOX_WEIGHTS': array([ 1., 1., 1., 1.]),
'BG_THRESH_HI': 0.5,
'BG_THRESH_LO': 0.0,
'END2END': True,
'FG_FRACTION': 0.25,
'FG_THRESH': 0.5,
'RPN_BATCH_SIZE': 256,
'RPN_BBOX_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'RPN_CLOBBER_POSITIVES': False,
'RPN_FG_FRACTION': 0.5,
'RPN_MIN_SIZE': 16,
'RPN_NEGATIVE_OVERLAP': 0.3,
'RPN_NMS_THRESH': 0.7,
'RPN_POSITIVE_OVERLAP': 0.7,
'RPN_POSITIVE_WEIGHT': -1.0,
'RPN_POST_NMS_TOP_N': 6000,
'RPN_PRE_NMS_TOP_N': 12000}}
num_images 1096
voc_2007_trainval gt roidb loaded from data/cache/voc_2007_trainval_gt_roidb.pkl
append flipped images to roidb
[20:41:11] src/engine/engine.cc:36: MXNet start using engine: NaiveEngine
providing maximum shape [('data', (1, 3, 1000, 1000)), ('gt_boxes', (1, 100, 5))] [('label', (1, 34596)), ('bbox_target', (1, 36, 62, 62)), ('bbox_weight', (1, 36, 62, 62))]
output shape
{'bbox_loss_reshape_output': (1L, 128L, 84L),
'blockgrad0_output': (1L, 128L),
'cls_prob_reshape_output': (1L, 128L, 21L),
'rpn_bbox_loss_output': (1L, 36L, 37L, 50L),
'rpn_cls_prob_output': (1L, 2L, 333L, 50L)}
[20:41:15] /home/yx/mxnet/dmlc-core/include/dmlc/./logging.h:235: [20:41:15] src/storage/./pooled_storage_manager.h:79: cudaMalloc failed: out of memory
Traceback (most recent call last):
File "train_end2end.py", line 185, in
6 GB of memory is enough for all experiments, including Fast R-CNN; 4 GB is not. Please use the py-faster-rcnn e2e training scheme (the alternate scheme uses 11 GB).
When I run python demo.py --prefix final --epoch 0 --image myimage.jpg --gpu 0, an error occurs. Can you help me? Thank you! I have installed cuDNN v5.0 for CUDA 7.5, on a GTX 980, Ubuntu 14.04.
/home/yx/mxnet/dmlc-core/include/dmlc/logging.h:235: [23:33:31] src/operator/./convolution-inl.h:299: Check failed: (param_.workspace) >= (required_size) Minimum workspace size: 1228800000 Bytes Given: 1073741824 Bytes
[23:33:31] /home/yx/mxnet/dmlc-core/include/dmlc/logging.h:235: [23:33:31] src/engine/./threadedengine.h:306: [23:33:31] src/operator/./convolution-inl.h:299: Check failed: (param_.workspace) >= (required_size) Minimum workspace size: 1228800000 Bytes Given: 1073741824 Bytes
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
terminate called after throwing an instance of 'dmlc::Error'
what(): [23:33:31] src/engine/./threadedengine.h:306: [23:33:31] src/operator/./convolution-inl.h:299: Check failed: (param_.workspace) >= (required_size) Minimum workspace size: 1228800000 Bytes Given: 1073741824 Bytes
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
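For context, a quick arithmetic sketch of why workspace=2048 clears this check, using the numbers from the log above. MXNet's Convolution workspace parameter is specified in megabytes, with a default of 1024 MB, which matches the "Given: 1073741824 Bytes" in the message.

```python
# Check in convolution-inl.h:299: configured workspace (MB -> bytes)
# must cover the minimum scratch space the convolution algorithm needs.
required_bytes = 1228800000           # "Minimum workspace size" from the log
default_mb = 1024                     # MXNet's default Convolution workspace
given_bytes = default_mb * 1024 ** 2  # 1073741824 Bytes, matching the log

print(given_bytes >= required_bytes)  # False: the default is too small

suggested_mb = 2048                   # the workspace=2048 fix suggested above
print(suggested_mb * 1024 ** 2 >= required_bytes)  # True: 2 GB is enough
```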