There should be no problem with data/loader.py. The last batch of each epoch is simply discarded if images left do not fill a mini batch.
Although you did not mention it, you have made modifications to train.py, as can be seen from the line numbers.
While it is not clear what caused the problem, the exception is raised during initialization of AnchorLoader. The AnchorLoader loads the first batch to determine the data and label shapes; if StopIteration is raised there, the roidb does not have enough images to fill the first batch. Please check `AnchorLoader.iter_next` and `AnchorLoader.next`.
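For reference, the iteration contract of those two methods is roughly the following (a paraphrased sketch, not the exact code in data/loader.py):

```python
# Paraphrased sketch of the AnchorLoader iteration contract, NOT the exact
# code in data/loader.py. It shows why a roidb that cannot fill even one
# mini-batch raises StopIteration already during __init__.
class AnchorLoaderSketch:
    def __init__(self, roidb, batch_size):
        self.size = len(roidb)        # number of images available
        self.batch_size = batch_size  # images consumed per mini-batch
        self.cur = 0                  # index of the next image to load
        self.next()                   # load the first batch to infer data/label shapes

    def iter_next(self):
        # True only while a complete mini-batch remains; a partial last
        # batch at the end of an epoch is simply discarded.
        return self.cur + self.batch_size <= self.size

    def next(self):
        if not self.iter_next():
            raise StopIteration
        # Placeholder for the real batch assembly (images, labels, anchor targets).
        batch = list(range(self.cur, self.cur + self.batch_size))
        self.cur += self.batch_size
        return batch
```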
@ijkguo Thanks for your help. I processed the dataset again, but after I run train.py there is a different error:
```
(venv-mxnet-rcnn) @35d8974f0c6d:work/mx-rcnn$ python3 train.py --pretrained model/resnet-101-0000.params --network resnet101 --dataset voc --gpus 0,1,2,3
INFO:root:computing cache data/cache/voc_2007_trainval_roidb.pkl
Traceback (most recent call last):
  File "train.py", line 286, in <module>
```
I ran the command, but there is no file in data/cache.
This line is intended to read the image height and width from the Pascal VOC annotation file. If your custom dataset does not have this in the XML annotation, you could read the actual image from disk and fill in this information.
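If you go that route, something along these lines should work (just a sketch; the function name and the roi_rec keys are assumptions based on the usual roidb record layout, not code from this repo):

```python
# Sketch only: fill in missing height/width by reading the image from disk.
# The function name and the 'image'/'height'/'width' keys are assumptions
# based on the usual roidb record layout; adapt them to your dataset code.
from PIL import Image

def fill_image_size(roi_rec):
    if 'height' not in roi_rec or 'width' not in roi_rec:
        with Image.open(roi_rec['image']) as im:
            roi_rec['width'], roi_rec['height'] = im.size  # PIL gives (width, height)
    return roi_rec
```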
@ijkguo Thank you very much for your help. I processed my data again and now it seems OK.
But I get another error when I run `python train.py --pretrained model/resnet-101-0000.params --network resnet101 --gpus 0,1,2,3 --imageset 2007_train`:
```
(venv-mxnet-rcnn) /work/mx-rcnn$ python train.py --pretrained model/resnet-101-0000.params --network resnet101 --gpus 0,1,2,3 --imageset 2007_train
INFO:root:computing cache data/cache/voc_2007_train_roidb.pkl
INFO:root:saving cache data/cache/voc_2007_train_roidb.pkl
INFO:root:voc_2007_train num_images 1355109
INFO:root:voc_2007_train append flipped images to roidb
INFO:root:called with args {'dataset': 'voc', 'epochs': 10, 'gpus': '0,1,2,3', 'imageset': '2007_train', 'img_long_side': 1000, 'img_pixel_means': (0.0, 0.0, 0.0), 'img_pixel_stds': (1.0, 1.0, 1.0), 'img_short_side': 600, 'log_interval': 100, 'lr': 0.001, 'lr_decay_epoch': '7', 'net_fixed_params': ['conv0', 'stage1', 'gamma', 'beta'], 'network': 'resnet101', 'pretrained': 'model/resnet-101-0000.params', 'rcnn_batch_rois': 128, 'rcnn_batch_size': 1, 'rcnn_bbox_stds': (0.1, 0.1, 0.2, 0.2), 'rcnn_feat_stride': 16, 'rcnn_fg_fraction': 0.25, 'rcnn_fg_overlap': 0.5, 'rcnn_num_classes': 2, 'rcnn_pooled_size': (14, 14), 'resume': '', 'rpn_allowed_border': 0, 'rpn_anchor_ratios': (0.5, 1, 2), 'rpn_anchor_scales': (8, 16, 32), 'rpn_batch_rois': 256, 'rpn_bg_overlap': 0.3, 'rpn_feat_stride': 16, 'rpn_fg_fraction': 0.5, 'rpn_fg_overlap': 0.7, 'rpn_min_size': 16, 'rpn_nms_thresh': 0.7, 'rpn_post_nms_topk': 2000, 'rpn_pre_nms_topk': 12000, 'save_prefix': 'model/resnet101', 'start_epoch': 0}
INFO:root:max input shape {'bbox_target': (4, 36, 63, 63), 'bbox_weight': (4, 36, 63, 63), 'data': (4, 3, 1000, 1000), 'gt_boxes': (4, 100, 5), 'im_info': (4, 3), 'label': (4, 1, 567, 63)}
INFO:root:max output shape {'bbox_loss_reshape_output': (1, 128, 8), 'blockgrad0_output': (1, 128), 'cls_prob_reshape_output': (1, 128, 2), 'rpn_bbox_loss_output': (4, 36, 63, 63), 'rpn_cls_prob_output': (4, 2, 567, 63)}
INFO:root:locking params ['bn_data_gamma', 'bn_data_beta', 'conv0_weight', 'bn0_gamma', 'bn0_beta', 'stage1_unit1_bn1_gamma', 'stage1_unit1_bn1_gamma',
'stage4_unit3_bn2_beta', 'stage4_unit3_bn3_gamma', 'stage4_unit3_bn3_beta', 'bn1_gamma', 'bn1_beta']
INFO:root:lr 0.001000 lr_epoch_diff [7] lr_iters [4742881]
Traceback (most recent call last):
  File "/mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/symbol/symbol.py", line 1513, in simple_bind
    ctypes.byref(exe_handle)))
  File "/mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [03:44:22] src/executor/graph_executor.cc:456: InferShape pass cannot decide shapes for the following arguments (0s means unknown dimensions). Please consider providing them as inputs: gt_boxes: [1,0,5],
Stack trace returned 10 entries:
[bt] (0) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x31a9ea) [0x7f29664cf9ea]
[bt] (1) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x31b011) [0x7f29664d0011]
[bt] (2) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x249f7d0) [0x7f29686547d0]
[bt] (3) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x24c17b9) [0x7f29686767b9]
[bt] (4) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x24c21e4) [0x7f29686771e4]
[bt] (5) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(MXExecutorSimpleBind+0x2378) [0x7f29685d61c8]
[bt] (6) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7f29d6c6ae20]
[bt] (7) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call+0x2eb) [0x7f29d6c6a88b]
[bt] (8) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(_ctypes_callproc+0x49a) [0x7f29d6c6501a]
[bt] (9) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(+0x9fcb) [0x7f29d6c58fcb]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 286, in
Stack trace returned 10 entries:
[bt] (0) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x31a9ea) [0x7f29664cf9ea]
[bt] (1) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x31b011) [0x7f29664d0011]
[bt] (2) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x249f7d0) [0x7f29686547d0]
[bt] (3) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x24c17b9) [0x7f29686767b9]
[bt] (4) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x24c21e4) [0x7f29686771e4]
[bt] (5) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/site-packages/mxnet/libmxnet.so(MXExecutorSimpleBind+0x2378) [0x7f29685d61c8]
[bt] (6) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7f29d6c6ae20]
[bt] (7) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call+0x2eb) [0x7f29d6c6a88b]
[bt] (8) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(_ctypes_callproc+0x49a) [0x7f29d6c6501a]
[bt] (9) /mnt/workspace/yangshuai/venv-mxnet-rcnn/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(+0x9fcb) [0x7f29d6c58fcb]
```
I don't know why the dimension of gt_boxes is [1,0,5]. What should I do to fix this error? Thank you for your help, and please forgive my terrible English...
Iterate through your custom dataset and make sure every image has at least one ground-truth box (gt_boxes > 0).
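A quick sanity check over the cached roidb could look like this (a sketch only; the 'boxes' and 'image' keys follow the usual pascal_voc roidb convention and may not match your custom dataset class exactly):

```python
# Sketch of a roidb sanity check: list every image that has no ground-truth box.
# The 'boxes' and 'image' keys are assumptions based on the usual roidb layout.
def check_gt_boxes(roidb):
    bad = [rec.get('image', idx) for idx, rec in enumerate(roidb)
           if len(rec.get('boxes', [])) == 0]
    if bad:
        print('%d images have zero gt_boxes:' % len(bad), bad[:10])
    else:
        print('all %d images have at least one gt box' % len(roidb))
    return bad
```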
When I train the model on my own data, I get this error. I don't know how to solve it. Can anyone help me? Thanks.
```
(venv)mycomputer$ python3 train.py --pretrained model/resnet-101-0000.params --network resnet101 --dataset voc --gpus 0,1,2,3
INFO:root:loading cache data/cache/voc_2007_trainval_roidb.pkl
INFO:root:voc_2007_trainval num_images 0
INFO:root:voc_2007_trainval append flipped images to roidb
INFO:root:called with args {'dataset': 'voc', 'epochs': 100000, 'gpus': '0,1,2,3', 'imageset': '2007_trainval', 'img_long_side': 1000, 'img_pixel_means': (0.0, 0.0, 0.0), 'img_pixel_stds': (1.0, 1.0, 1.0), 'img_short_side': 600, 'log_interval': 100, 'lr': 0.001, 'lr_decay_epoch': '10', 'net_fixed_params': ['conv0', 'stage1', 'gamma', 'beta'], 'network': 'resnet101', 'pretrained': 'model/resnet-101-0000.params', 'rcnn_batch_rois': 128, 'rcnn_batch_size': 1, 'rcnn_bbox_stds': (0.1, 0.1, 0.2, 0.2), 'rcnn_feat_stride': 16, 'rcnn_fg_fraction': 0.25, 'rcnn_fg_overlap': 0.5, 'rcnn_num_classes': 602, 'rcnn_pooled_size': (14, 14), 'resume': '', 'rpn_allowed_border': 0, 'rpn_anchor_ratios': (0.5, 1, 2), 'rpn_anchor_scales': (8, 16, 32), 'rpn_batch_rois': 256, 'rpn_bg_overlap': 0.3, 'rpn_feat_stride': 16, 'rpn_fg_fraction': 0.5, 'rpn_fg_overlap': 0.7, 'rpn_min_size': 16, 'rpn_nms_thresh': 0.7, 'rpn_post_nms_topk': 2000, 'rpn_pre_nms_topk': 12000, 'save_prefix': 'resnet101-openimages', 'start_epoch': 0}
Traceback (most recent call last):
  File "train.py", line 286, in <module>
    main()
  File "train.py", line 282, in main
    train_net(sym, roidb, args)
  File "train.py", line 27, in train_net
    args.img_pixel_means, args.img_pixel_stds, feat_sym, ag, asp, shuffle=True)
  File "/data/loader.py", line 145, in __init__
    self.next()
  File "/data/loader.py", line 172, in next
    raise StopIteration
StopIteration
```