msracver / Deformable-ConvNets

Deformable Convolutional Networks
MIT License

Something wrong when i change my fpn's backbone to vgg16 #216

Open Yorionice1 opened 5 years ago

Yorionice1 commented 5 years ago

('Called with argument:', Namespace(cfg='experiments/fpn/cfgs/vgg16_coco_trainval_fpn_dcn_end2end_ohem.yaml', frequent=100))
{'CLASS_AGNOSTIC': False, 'MXNET_VERSION': 'mxnet', 'SCALES': [(1024, 512)], 'TEST': {'BATCH_IMAGES': 1, 'CXX_PROPOSAL': False, 'HAS_RPN': True, 'NMS': 0.5, 'PROPOSAL_MIN_SIZE': 0, 'PROPOSAL_NMS_THRESH': 0.7, 'PROPOSAL_POST_NMS_TOP_N': 2000, 'PROPOSAL_PRE_NMS_TOP_N': 12000, 'RPN_MIN_SIZE': 0, 'RPN_NMS_THRESH': 0.7, 'RPN_POST_NMS_TOP_N': 2000, 'RPN_PRE_NMS_TOP_N': 12000, 'SOFTNMS_THRESH': 0.6, 'USE_SOFTNMS': True, 'max_per_image': 100, 'test_epoch': 7}, 'TEST_SCALES': [[480, 800], [576, 900], [688, 1100], [800, 1200], [1200, 1600]], 'TRAIN': {'ALTERNATE': {'RCNN_BATCH_IMAGES': 0, 'RPN_BATCH_IMAGES': 0, 'rfcn1_epoch': 0, 'rfcn1_lr': 0, 'rfcn1_lr_step': '', 'rfcn2_epoch': 0, 'rfcn2_lr': 0, 'rfcn2_lr_step': '', 'rpn1_epoch': 0, 'rpn1_lr': 0, 'rpn1_lr_step': '', 'rpn2_epoch': 0, 'rpn2_lr': 0, 'rpn2_lr_step': '', 'rpn3_epoch': 0, 'rpn3_lr': 0, 'rpn3_lr_step': ''}, 'ASPECT_GROUPING': True, 'BATCH_IMAGES': 1, 'BATCH_ROIS': -1, 'BATCH_ROIS_OHEM': 512, 'BBOX_MEANS': [0.0, 0.0, 0.0, 0.0], 'BBOX_NORMALIZATION_PRECOMPUTED': True, 'BBOX_REGRESSION_THRESH': 0.5, 'BBOX_STDS': [0.1, 0.1, 0.2, 0.2], 'BBOX_WEIGHTS': array([ 1., 1., 1., 1.]), 'BG_THRESH_HI': 0.5, 'BG_THRESH_LO': 0.0, 'CXX_PROPOSAL': False, 'ENABLE_OHEM': True, 'END2END': True, 'FG_FRACTION': 0.25, 'FG_THRESH': 0.5, 'FLIP': True, 'RESUME': False, 'RPN_BATCH_SIZE': 256, 'RPN_BBOX_WEIGHTS': [1.0, 1.0, 1.0, 1.0], 'RPN_CLOBBER_POSITIVES': False, 'RPN_FG_FRACTION': 0.5, 'RPN_MIN_SIZE': 0, 'RPN_NEGATIVE_OVERLAP': 0.3, 'RPN_NMS_THRESH': 0.7, 'RPN_POSITIVE_OVERLAP': 0.7, 'RPN_POSITIVE_WEIGHT': -1.0, 'RPN_POST_NMS_TOP_N': 2000, 'RPN_PRE_NMS_TOP_N': 12000, 'SHUFFLE': True, 'begin_epoch': 0, 'end_epoch': 7, 'lr': 0.01, 'lr_factor': 0.1, 'lr_step': '4,6', 'model_prefix': 'fpn_coco', 'momentum': 0.9, 'warmup': True, 'warmup_lr': 0.001, 'warmup_step': 250, 'wd': 0.0001}, 'dataset': {'NUM_CLASSES': 2, 'dataset': 'coco', 'dataset_path': './data/coco', 'image_set': 'train2014', 'proposal': 'rpn', 'root_path': './data', 'test_image_set': 'minival2014'}, 'default': {'frequent': 100, 'kvstore': 'device'}, 'gpus': '6', 'network': {'ANCHOR_RATIOS': [2.44], 'ANCHOR_SCALES': [2.0, 2.7, 3.64, 4.92, 6.64, 8.97, 12.11, 16.34, 22.06, 29.79, 40.21], 'FIXED_PARAMS': ['conv1', 'bn_conv1', 'res2', 'bn2', 'gamma', 'beta'], 'FIXED_PARAMS_SHARED': ['conv1', 'bn_conv1', 'res2', 'bn2', 'res3', 'bn3', 'res4', 'bn4', 'gamma', 'beta'], 'IMAGE_STRIDE': 16, 'NUM_ANCHORS': 11, 'PIXEL_MEANS': array([ 103.06, 115.9 , 123.15]), 'RCNN_FEAT_STRIDE': 16, 'RPN_FEAT_STRIDE': [4, 8, 16, 32], 'pretrained': './model/pretrained_model/vgg16', 'pretrained_epoch': 0}, 'output_path': './output/fpn/coco', 'symbol': 'vgg16_fpn_dcn_rcnn'}
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
num_images 256
COCO_train2014 gt roidb loaded from ./data/cache/COCO_train2014_gt_roidb.pkl
append flipped images to roidb
filtered 10 roidb entries: 512 -> 502
providing maximum shape [('data', (1, 3, 1024, 512)), ('gt_boxes', (1, 100, 5))] [('label', (1, 478720)), ('bbox_target', (1, 44, 43520)), ('bbox_weight', (1, 44, 43520))]
{'bbox_target': (1L, 44L, 10880L), 'bbox_weight': (1L, 44L, 10880L), 'data': (1L, 3L, 256L, 512L), 'gt_boxes': (1L, 8L, 5L), 'im_info': (1L, 3L), 'label': (1L, 119680L)}
('lr', 0.01, 'lr_epoch_diff', [4.0, 6.0], 'lr_iters', [2008, 3012])
train start!
experiments/fpn/../../fpn/../lib/bbox/bbox_transform.py:98: RuntimeWarning: divide by zero encountered in divide
  targets_dh = np.log(gt_heights / ex_heights)
experiments/fpn/../../fpn/operator_py/fpn_roi_pooling.py:31: RuntimeWarning: divide by zero encountered in log2
  feat_id = np.clip(np.floor(2 + np.log2(np.sqrt(w * h) / 224)), 0, len(self.feat_strides) - 1)
Epoch[0] Batch [100] Speed: 0.46 samples/sec Train-RPNAcc=0.806157, RPNLogLoss=0.491446, RPNL1Loss=0.177600, Proposal FG Fraction=0.045908, R-CNN FG Accuracy=0.001264, RCNNAcc=0.944326, RCNNLogLoss=0.372715, RCNNL1Loss=0.063450,
Epoch[0] Batch [200] Speed: 0.47 samples/sec Train-RPNAcc=0.806922, RPNLogLoss=0.462175, RPNL1Loss=0.140644, Proposal FG Fraction=0.056164, R-CNN FG Accuracy=0.000519, RCNNAcc=0.938928, RCNNLogLoss=0.335340, RCNNL1Loss=0.083124,
experiments/fpn/../../fpn/../lib/bbox/bbox_transform.py:129: RuntimeWarning: overflow encountered in exp
  pred_w = np.exp(dw) * widths[:, np.newaxis]
experiments/fpn/../../fpn/../lib/bbox/bbox_transform.py:130: RuntimeWarning: overflow encountered in exp
  pred_h = np.exp(dh) * heights[:, np.newaxis]
experiments/fpn/../../fpn/../lib/bbox/bbox_transform.py:97: RuntimeWarning: divide by zero encountered in divide
  targets_dw = np.log(gt_widths / ex_widths)
experiments/fpn/../../fpn/../lib/bbox/bbox_transform.py:136: RuntimeWarning: invalid value encountered in subtract
  pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * (pred_h - 1.0)
experiments/fpn/../../fpn/../lib/bbox/bbox_transform.py:138: RuntimeWarning: invalid value encountered in add
  pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * (pred_w - 1.0)
experiments/fpn/../../fpn/../lib/bbox/bbox_transform.py:140: RuntimeWarning: invalid value encountered in add
  pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * (pred_h - 1.0)
experiments/fpn/../../fpn/operator_py/pyramid_proposal.py:179: RuntimeWarning: invalid value encountered in greater_equal
  keep = np.where((ws >= min_size) & (hs >= min_size))[0]
Error in CustomOp.forward: Traceback (most recent call last):
  File "/root/anaconda2/lib/python2.7/site-packages/mxnet/operator.py", line 987, in forward_entry
    aux=tensors[4])
  File "experiments/fpn/../../fpn/operator_py/pyramid_proposal.py", line 150, in forward
    keep = nms(det)
  File "experiments/fpn/../../fpn/../lib/nms/nms.py", line 27, in _nms
    return gpu_nms(dets, thresh, device_id)
  File "gpu_nms.pyx", line 29, in gpu_nms.gpu_nms (gpu_nms.cpp:1608)
IndexError: Out of bounds on buffer access (axis 0)

terminate called after throwing an instance of 'dmlc::Error'
  what(): [08:59:39] src/operator/custom/custom.cc:347: Check failed: reinterpret_cast<CustomOpFBFunc>(params.info->callbacks[kCustomOpForward])(ptrs.size(), const_cast<void**>(ptrs.data()), const_cast<int*>(tags.data()), reinterpret_cast<const int*>(req.data()), static_cast<int>(ctx.is_train), params.info->contexts[kCustomOpForward])

Stack trace returned 8 entries:
[bt] (0) /root/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x36bac2) [0x7f2808da0ac2]
[bt] (1) /root/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x36c0a8) [0x7f2808da10a8]
[bt] (2) /root/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x552111) [0x7f2808f87111]
[bt] (3) /root/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x56be01) [0x7f2808fa0e01]
[bt] (4) /root/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x553006) [0x7f2808f88006]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1a60) [0x7f287b2f8a60]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f287d9b5184]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f287cfd537d]
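Reading the warnings together with the traceback: the overflow in np.exp(dw) / np.exp(dh) produces inf/NaN boxes, the NaN comparisons in pyramid_proposal.py then filter out every proposal, and gpu_nms finally indexes an empty dets buffer. The sketch below is not code from this repo, just a minimal guard in the style of bbox_transform.py; the names bbox_pred_safe, keep_finite, and the BBOX_XFORM_CLIP constant are assumptions for illustration.

```python
import numpy as np

# Assumed constant, same idea as py-faster-rcnn: cap dw/dh so np.exp cannot overflow.
BBOX_XFORM_CLIP = np.log(1000.0 / 16.0)

def bbox_pred_safe(boxes, box_deltas):
    """Apply regression deltas to boxes, clipping dw/dh before np.exp."""
    widths = boxes[:, 2] - boxes[:, 0] + 1.0
    heights = boxes[:, 3] - boxes[:, 1] + 1.0
    ctr_x = boxes[:, 0] + 0.5 * (widths - 1.0)
    ctr_y = boxes[:, 1] + 0.5 * (heights - 1.0)

    dx = box_deltas[:, 0::4]
    dy = box_deltas[:, 1::4]
    dw = np.minimum(box_deltas[:, 2::4], BBOX_XFORM_CLIP)  # guard against exp overflow
    dh = np.minimum(box_deltas[:, 3::4], BBOX_XFORM_CLIP)

    pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
    pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
    pred_w = np.exp(dw) * widths[:, np.newaxis]
    pred_h = np.exp(dh) * heights[:, np.newaxis]

    pred_boxes = np.zeros_like(box_deltas)
    pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * (pred_w - 1.0)
    pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * (pred_h - 1.0)
    pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * (pred_w - 1.0)
    pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * (pred_h - 1.0)
    return pred_boxes

def keep_finite(dets):
    """Drop NaN/inf rows before handing dets to gpu_nms, which otherwise
    can run out of bounds when no valid proposal remains."""
    return dets[np.isfinite(dets).all(axis=1)]
```

Even with such a guard, the boxes only blow up because the loss itself is diverging, so the root cause still needs to be addressed in training (see the next reply).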

Yorionice1 commented 5 years ago

Using the same dataset, RFCN is OK, so I think there is nothing wrong with my dataset.

YuwenXiong commented 5 years ago

That means you encountered gradient explosion (BTW, VGG might not work with FPN)
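One way to act on this diagnosis is to lower the learning rate and/or clip gradients in the optimizer. A minimal sketch with MXNet's SGD optimizer follows; the numeric values are assumptions for illustration, not settings from this issue or the repo's yaml.

```python
import mxnet as mx

# Illustrative, not the repo's exact code: when RPN/RCNN losses diverge
# (gradient explosion), reduce the learning rate and enable gradient clipping.
optimizer = mx.optimizer.SGD(
    learning_rate=0.001,   # assumed: 10x lower than the lr=0.01 in the yaml above
    momentum=0.9,
    wd=0.0001,
    clip_gradient=5.0,     # clip each gradient element to [-5, 5]; None disables clipping
    rescale_grad=1.0,
)
# The training Module would then be initialized with this optimizer
# (or equivalent optimizer_params) instead of the defaults.
```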

imranfateh commented 5 years ago

@YuwenXiong Can you please explain in detail? I am getting the same error when I am using Deformable RFCN, but the same dataset works with F-RCNN. I am using res101.