ijkguo / mx-rcnn

Parallel Faster R-CNN implementation with MXNet.

File "/data/loader.py", line 172, in next raise StopIteration #97

Closed ShaneYS closed 6 years ago

ShaneYS commented 6 years ago

When I train the model on my own data, I get this error. I don't know how to solve it. Can anyone help me? Thanks.

(venv)mycomputer$ python3 train.py --pretrained model/resnet-101-0000.params --network resnet101 --dataset voc --gpus 0,1,2,3
INFO:root:loading cache data/cache/voc_2007_trainval_roidb.pkl
INFO:root:voc_2007_trainval num_images 0
INFO:root:voc_2007_trainval append flipped images to roidb
INFO:root:called with args {'dataset': 'voc', 'epochs': 100000, 'gpus': '0,1,2,3', 'imageset': '2007_trainval', 'img_long_side': 1000, 'img_pixel_means': (0.0, 0.0, 0.0), 'img_pixel_stds': (1.0, 1.0, 1.0), 'img_short_side': 600, 'log_interval': 100, 'lr': 0.001, 'lr_decay_epoch': '10', 'net_fixed_params': ['conv0', 'stage1', 'gamma', 'beta'], 'network': 'resnet101', 'pretrained': 'model/resnet-101-0000.params', 'rcnn_batch_rois': 128, 'rcnn_batch_size': 1, 'rcnn_bbox_stds': (0.1, 0.1, 0.2, 0.2), 'rcnn_feat_stride': 16, 'rcnn_fg_fraction': 0.25, 'rcnn_fg_overlap': 0.5, 'rcnn_num_classes': 602, 'rcnn_pooled_size': (14, 14), 'resume': '', 'rpn_allowed_border': 0, 'rpn_anchor_ratios': (0.5, 1, 2), 'rpn_anchor_scales': (8, 16, 32), 'rpn_batch_rois': 256, 'rpn_bg_overlap': 0.3, 'rpn_feat_stride': 16, 'rpn_fg_fraction': 0.5, 'rpn_fg_overlap': 0.7, 'rpn_min_size': 16, 'rpn_nms_thresh': 0.7, 'rpn_post_nms_topk': 2000, 'rpn_pre_nms_topk': 12000, 'save_prefix': 'resnet101-openimages', 'start_epoch': 0}
Traceback (most recent call last):
  File "train.py", line 286, in <module>
    main()
  File "train.py", line 282, in main
    train_net(sym, roidb, args)
  File "train.py", line 27, in train_net
    args.img_pixel_means, args.img_pixel_stds, feat_sym, ag, asp, shuffle=True)
  File "/data/loader.py", line 145, in __init__
    self.next()
  File "/data/loader.py", line 172, in next
    raise StopIteration
StopIteration

ijkguo commented 6 years ago

There should be no problem with data/loader.py. The last batch of each epoch is simply discarded if the remaining images do not fill a mini-batch.

Although you did not mention it, the line numbers in the traceback show that you have modified train.py.

It is not clear what caused the problem, but the exception is raised during initialization of AnchorLoader. The AnchorLoader loads the first batch to determine the data and label shapes. If StopIteration is raised there, then there are not enough images to fill the first batch. Please check AnchorLoader.iter_next and AnchorLoader.next.
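As an illustration of the behaviour described above, here is a minimal sketch of a loader that peeks at its first batch during construction and drops an incomplete trailing batch. `TinyLoader` is a hypothetical toy class, not the real AnchorLoader, and its batching rule is an assumption made for the example. With an empty record list (the log above reports `num_images 0`), the constructor itself raises StopIteration.

```python
class TinyLoader:
    """Toy stand-in for a batch loader (hypothetical, not AnchorLoader)."""

    def __init__(self, records, batch_size):
        self.records = records
        self.batch_size = batch_size
        self.cur = 0
        # Peek at the first batch during construction, then rewind.
        # If there are too few records, this raises StopIteration.
        self.first_batch = self.next()
        self.reset()

    def reset(self):
        self.cur = 0

    def iter_next(self):
        # A batch is available only if a full batch_size of records
        # remains, so an incomplete trailing batch is dropped.
        return self.cur + self.batch_size <= len(self.records)

    def next(self):
        if not self.iter_next():
            raise StopIteration
        batch = self.records[self.cur:self.cur + self.batch_size]
        self.cur += self.batch_size
        return batch


def can_fill_first_batch(num_records, batch_size):
    """Cheap pre-check to run before building the loader."""
    return num_records >= batch_size
```

Checking something like `can_fill_first_batch(len(roidb), batch_size)` before constructing the loader would turn the bare StopIteration into a clearer message about an empty or too-small dataset.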

ShaneYS commented 6 years ago

@ijkguo Thanks for your help. I processed the dataset again, but when I run train.py I now get a different error.

(venv-mxnet-rcnn) @35d8974f0c6d:work/mx-rcnn$ python3 train.py --pretrained model/resnet-101-0000.params --network resnet101 --dataset voc --gpus 0,1,2,3
INFO:root:computing cache data/cache/voc_2007_trainval_roidb.pkl
Traceback (most recent call last):
  File "train.py", line 286, in <module>
    main()
  File "train.py", line 280, in main
    roidb = get_dataset(args.dataset, args)
  File "train.py", line 264, in get_dataset
    return datasets[dataset](args)
  File "train.py", line 167, in get_voc
    imdb = PascalVOC(iset, 'data', 'data/VOCdevkit')
  File "/mnt/workspace/yangshuai/work/mx-rcnn/imdb/pascal_voc.py", line 41, in __init__
    self._roidb = self._get_cached('roidb', self._load_gt_roidb)
  File "/mnt/workspace/yangshuai/work/mx-rcnn/imdb/imdb.py", line 99, in _get_cached
    cached = fn()
  File "/mnt/workspace/yangshuai/work/mx-rcnn/imdb/pascal_voc.py", line 46, in _load_gt_roidb
    gt_roidb = [self._load_annotation(index) for index in image_index]
  File "/mnt/workspace/yangshuai/work/mx-rcnn/imdb/pascal_voc.py", line 46, in <listcomp>
    gt_roidb = [self._load_annotation(index) for index in image_index]
  File "/mnt/workspace/yangshuai/work/mx-rcnn/imdb/pascal_voc.py", line 56, in _load_annotation
    height, width, orig_objs = self._parse_voc_anno(self._image_anno_tmpl.format(index))
  File "/mnt/workspace/yangshuai/work/mx-rcnn/imdb/pascal_voc.py", line 93, in _parse_voc_anno
    height = int(tree.find('size').find('height').text)
AttributeError: 'NoneType' object has no attribute 'find'

I ran the command, but no file was created in data/cache.

ijkguo commented 6 years ago

This line is intended to read the image height and width from the Pascal VOC annotation file. If your custom dataset does not have a size element in its XML annotations, you could read the actual image from disk and fill in this information.
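A minimal sketch of that fallback, assuming the annotation is Pascal VOC style XML and that Pillow is available for reading the image. The function names `parse_size` and `size_from_image` are hypothetical and are not part of mx-rcnn.

```python
import xml.etree.ElementTree as ET


def size_from_image(image_path):
    """Read (height, width) from the image file itself.

    Assumes Pillow is installed; it is imported lazily so that
    annotations which already carry a size need no extra dependency.
    """
    from PIL import Image
    with Image.open(image_path) as im:
        width, height = im.size  # Pillow reports (width, height)
    return height, width


def parse_size(xml_text, image_path, fallback=size_from_image):
    """Return (height, width) from the XML, or from the image if missing."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    if size is not None:
        height = size.findtext('height')
        width = size.findtext('width')
        if height and width:
            return int(height), int(width)
    # No usable <size> element: fall back to the image on disk.
    return fallback(image_path)
```

Guarding against a missing `size` element this way avoids the `'NoneType' object has no attribute 'find'` failure in the traceback above, at the cost of opening each image that lacks the field.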

ShaneYS commented 6 years ago

@ijkguo Thank you very much for your help. I processed my data again and now it seems OK.

But I get another error when I run `python train.py --pretrained model/resnet-101-0000.params --network resnet101 --gpus 0,1,2,3 --imageset 2007_train`:

`(venv-mxnet-rcnn) /work/mx-rcnn$ python train.py --pretrained model/resnet-101-0000.params --network resnet101 --gpus 0,1,2,3 --imageset 2007_train INFO:root:computing cache data/cache/voc_2007_train_roidb.pkl INFO:root:saving cache data/cache/voc_2007_train_roidb.pkl INFO:root:voc_2007_train num_images 1355109 INFO:root:voc_2007_train append flipped images to roidb INFO:root:called with args {'dataset': 'voc', 'epochs': 10, 'gpus': '0,1,2,3', 'imageset': '2007_train', 'img_long_side': 1000, 'img_pixel_means': (0.0, 0.0, 0.0), 'img_pixel_stds': (1.0, 1.0, 1.0), 'img_short_side': 600, 'log_interval': 100, 'lr': 0.001, 'lr_decay_epoch': '7', 'net_fixed_params': ['conv0', 'stage1', 'gamma', 'beta'], 'network': 'resnet101', 'pretrained': 'model/resnet-101-0000.params', 'rcnn_batch_rois': 128, 'rcnn_batch_size': 1, 'rcnn_bbox_stds': (0.1, 0.1, 0.2, 0.2), 'rcnn_feat_stride': 16, 'rcnn_fg_fraction': 0.25, 'rcnn_fg_overlap': 0.5, 'rcnn_num_classes': 2, 'rcnn_pooled_size': (14, 14), 'resume': '', 'rpn_allowed_border': 0, 'rpn_anchor_ratios': (0.5, 1, 2), 'rpn_anchor_scales': (8, 16, 32), 'rpn_batch_rois': 256, 'rpn_bg_overlap': 0.3, 'rpn_feat_stride': 16, 'rpn_fg_fraction': 0.5, 'rpn_fg_overlap': 0.7, 'rpn_min_size': 16, 'rpn_nms_thresh': 0.7, 'rpn_post_nms_topk': 2000, 'rpn_pre_nms_topk': 12000, 'save_prefix': 'model/resnet101', 'start_epoch': 0} INFO:root:max input shape {'bbox_target': (4, 36, 63, 63), 'bbox_weight': (4, 36, 63, 63), 'data': (4, 3, 1000, 1000), 'gt_boxes': (4, 100, 5), 'im_info': (4, 3), 'label': (4, 1, 567, 63)} INFO:root:max output shape {'bbox_loss_reshape_output': (1, 128, 8), 'blockgrad0_output': (1, 128), 'cls_prob_reshape_output': (1, 128, 2), 'rpn_bbox_loss_output': (4, 36, 63, 63), 'rpn_cls_prob_output': (4, 2, 567, 63)} INFO:root:locking params ['bn_data_gamma', 'bn_data_beta', 'conv0_weight', 'bn0_gamma', 'bn0_beta', 'stage1_unit1_bn1_gamma', 'stage1_unit1_bn1_gamma',

'stage4_unit3_bn2_beta', 'stage4_unit3_bn3_gamma', 'stage4_unit3_bn3_beta', 'bn1_gamma', 'bn1_beta']
INFO:root:lr 0.001000 lr_epoch_diff [7] lr_iters [4742881]
Traceback (most recent call last):
  File "/mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/symbol/symbol.py", line 1513, in simple_bind
    ctypes.byref(exe_handle)))
  File "/mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [03:44:22] src/executor/graph_executor.cc:456: InferShape pass cannot decide shapes for the following arguments (0s means unknown dimensions). Please consider providing them as inputs:
gt_boxes: [1,0,5],

Stack trace returned 10 entries: [bt] (0) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x31a9ea) [0x7f29664cf9ea] [bt] (1) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x31b011) [0x7f29664d0011] [bt] (2) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x249f7d0) [0x7f29686547d0] [bt] (3) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x24c17b9) [0x7f29686767b9] [bt] (4) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x24c21e4) [0x7f29686771e4] [bt] (5) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(MXExecutorSimpleBind+0x2378) [0x7f29685d61c8] [bt] (6) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7f29d6c6ae20] [bt] (7) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call+0x2eb) [0x7f29d6c6a88b] [bt] (8) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(_ctypes_callproc+0x49a) [0x7f29d6c6501a] [bt] (9) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(+0x9fcb) [0x7f29d6c58fcb]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 286, in <module>
    main()
  File "train.py", line 282, in main
    train_net(sym, roidb, args)
  File "train.py", line 103, in train_net
    arg_params=arg_params, aux_params=aux_params, begin_epoch=args.start_epoch, num_epoch=args.epochs)
  File "/mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/module/base_module.py", line 484, in fit
    for_training=True, force_rebind=force_rebind)
  File "/mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/module/module.py", line 430, in bind
    state_names=self._state_names)
  File "/mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/module/executor_group.py", line 265, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/module/executor_group.py", line 361, in bind_exec
    shared_group))
  File "/mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/module/executor_group.py", line 639, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/symbol/symbol.py", line 1519, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
im_info: (1, 3)
label: (1, 20178)
bbox_weight: (1, 36, 38, 59)
bbox_target: (1, 36, 38, 59)
data: (1, 3, 600, 941)
gt_boxes: (1, 0, 5)
[03:44:22] src/executor/graph_executor.cc:456: InferShape pass cannot decide shapes for the following arguments (0s means unknown dimensions). Please consider providing them as inputs:
gt_boxes: [1,0,5],

Stack trace returned 10 entries: [bt] (0) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x31a9ea) [0x7f29664cf9ea] [bt] (1) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x31b011) [0x7f29664d0011] [bt] (2) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x249f7d0) [0x7f29686547d0] [bt] (3) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x24c17b9) [0x7f29686767b9] [bt] (4) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x24c21e4) [0x7f29686771e4] [bt] (5) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(MXExecutorSimpleBind+0x2378) [0x7f29685d61c8] [bt] (6) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7f29d6c6ae20] [bt] (7) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call+0x2eb) [0x7f29d6c6a88b] [bt] (8) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(_ctypes_callproc+0x49a) [0x7f29d6c6501a] [bt] (9) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(+0x9fcb) [0x7f29d6c58fcb]`

I don't know why the shape of gt_boxes is [1,0,5]. What should I do to fix this error? Thanks for your help, and please forgive my English.

ijkguo commented 6 years ago

Iterate through your custom dataset and make sure every image has at least one ground-truth box, so that gt_boxes is never empty.
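The shape `(1, 0, 5)` in the error suggests that the image in that batch had zero ground-truth boxes. A minimal sketch of such a check follows. It assumes each roidb record is a dict whose `'boxes'` entry holds one row per ground-truth box. The key name and the helper `split_by_gt_boxes` are assumptions for illustration and should be adapted to the actual record layout.

```python
def split_by_gt_boxes(roidb):
    """Split records into (with_boxes, without_boxes).

    Assumes each record is a dict whose 'boxes' entry is a list or
    array with one row per ground-truth box (hypothetical key name).
    """
    with_boxes, without_boxes = [], []
    for rec in roidb:
        boxes = rec.get('boxes')
        if boxes is not None and len(boxes) > 0:
            with_boxes.append(rec)
        else:
            without_boxes.append(rec)
    return with_boxes, without_boxes


# Toy records standing in for a real roidb.
toy_roidb = [
    {'image': 'a.jpg', 'boxes': [[10, 20, 110, 220]]},
    {'image': 'b.jpg', 'boxes': []},  # no annotations
    {'image': 'c.jpg', 'boxes': [[5, 5, 50, 50], [60, 60, 90, 90]]},
]
kept, dropped = split_by_gt_boxes(toy_roidb)
for rec in dropped:
    print('no gt boxes:', rec['image'])
```

Training on `kept` instead of the full list would skip the empty images. If the roidb is read from a cache file such as the `.pkl` shown in the log, the cache would presumably need to be regenerated after the dataset is fixed.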