Run demo.py with no result

lc8631058 commented 7 years ago

I ran demo.py on an Amazon EC2 instance (a remote machine), and the terminal showed the following:

('use mxnet at', '/home/carnd/mxnet/python/mxnet/__init__.pyc')
{'BINARY_THRESH': 0.4,
 'CLASS_AGNOSTIC': True,
 'MASK_SIZE': 21,
 'MXNET_VERSION': 'mxnet',
 'SCALES': [(600, 1000)],
 'TEST': {'BATCH_IMAGES': 1,
          'CXX_PROPOSAL': False,
          'HAS_RPN': True,
          'ITER': 2,
          'MASK_MERGE_THRESH': 0.5,
          'MIN_DROP_SIZE': 2,
          'NMS': 0.3,
          'PROPOSAL_MIN_SIZE': 2,
          'PROPOSAL_NMS_THRESH': 0.7,
          'PROPOSAL_POST_NMS_TOP_N': 2000,
          'PROPOSAL_PRE_NMS_TOP_N': 20000,
          'RPN_MIN_SIZE': 2,
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'USE_GPU_MASK_MERGE': True,
          'USE_MASK_MERGE': True,
          'test_epoch': 8},
 'TRAIN': {'ASPECT_GROUPING': True,
           'BATCH_IMAGES': 1,
           'BATCH_ROIS': -1,
           'BATCH_ROIS_OHEM': 128,
           'BBOX_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZATION_PRECOMPUTED': True,
           'BBOX_REGRESSION_THRESH': 0.5,
           'BBOX_STDS': [0.2, 0.2, 0.5, 0.5],
           'BBOX_WEIGHTS': array([ 1.,  1.,  1.,  1.]),
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0,
           'BINARY_THRESH': 0.4,
           'CONVNEW3': True,
           'CXX_PROPOSAL': False,
           'ENABLE_OHEM': True,
           'END2END': True,
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'FLIP': True,
           'GAP_SELECT_FROM_ALL': False,
           'IGNORE_GAP': False,
           'LOSS_WEIGHT': [1.0, 10.0, 1.0],
           'RESUME': False,
           'RPN_ALLOWED_BORDER': 0,
           'RPN_BATCH_SIZE': 256,
           'RPN_BBOX_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_MIN_SIZE': 2,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 300,
           'RPN_PRE_NMS_TOP_N': 6000,
           'SHUFFLE': True,
           'begin_epoch': 0,
           'end_epoch': 8,
           'lr': 0.0005,
           'lr_step': '5.33',
           'model_prefix': 'e2e',
           'momentum': 0.9,
           'warmup': True,
           'warmup_lr': 5e-05,
           'warmup_step': 250,
           'wd': 0.0005},
 'dataset': {'NUM_CLASSES': 81,
             'dataset': 'coco',
             'dataset_path': './data/coco',
             'image_set': 'train2014+valminusminival2014',
             'proposal': 'rpn',
             'root_path': './data',
             'test_image_set': 'test-dev2015'},
 'default': {'frequent': 20, 'kvstore': 'device'},
 'gpus': '0',
 'network': {'ANCHOR_RATIOS': [0.5, 1, 2],
             'ANCHOR_SCALES': [4, 8, 16, 32],
             'FIXED_PARAMS': ['conv1',
                              'bn_conv1',
                              'res2',
                              'bn2',
                              'gamma',
                              'beta'],
             'FIXED_PARAMS_SHARED': ['conv1',
                                     'bn_conv1',
                                     'res2',
                                     'bn2',
                                     'res3',
                                     'bn3',
                                     'res4',
                                     'bn4',
                                     'gamma',
                                     'beta'],
             'IMAGE_STRIDE': 0,
             'NUM_ANCHORS': 12,
             'PIXEL_MEANS': array([ 103.06,  115.9 ,  123.15]),
             'RCNN_FEAT_STRIDE': 16,
             'RPN_FEAT_STRIDE': 16,
             'pretrained': './model/pretrained_model/resnet_v1_101',
             'pretrained_epoch': 0},
 'output_path': '../output/fcis',
 'symbol': 'resnet_v1_101_fcis'}
[14:51:26] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[14:51:26] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[14:51:26] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[14:51:26] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[14:51:26] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[14:51:26] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
(426, 640)
invalid device function
invalid device function
testing COCO_test2015_000000000275.jpg 0.5945s
[14:51:28] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[14:51:28] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[14:51:28] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
(427, 640)
invalid device function
invalid device function
testing COCO_test2015_000000001412.jpg 0.6275s
(427, 640)
invalid device function
invalid device function
testing COCO_test2015_000000073428.jpg 0.5950s
[14:51:29] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[14:51:29] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[14:51:29] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
(428, 640)
invalid device function
invalid device function
testing COCO_test2015_000000393281.jpg 0.6167s
done

I don't know what "invalid device function" and "[14:51:29] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied." mean, but the script seems to run correctly. However, there are no files in the FCIS/output folder.

To see what demo.py outputs, I added the following code to demo.py in order to save the output pictures, but there were no changes at all:

    im = cv2.imread(cur_path + '/../demo/' + im_name)
    im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
    pre_img = show_masks(im, dets, masks, classes, config, show=True)
    pre_img = pre_img.astype(np.uint8)
    pre_img = Image.fromarray(pre_img)
    pre_img.save("{}".format(im_name))
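If the goal is only to get a saved picture with a mask drawn on it, a minimal sketch that does not depend on show_masks follows. It assumes you already have a boolean mask of shape (H, W) covering the full RGB image; overlay_mask and save_overlay are hypothetical helpers, not part of FCIS. Blending uses NumPy, and saving uses Pillow.

```python
import numpy as np


def overlay_mask(image_rgb, mask, color=(255, 0, 0), alpha=0.5):
    """Alpha-blend `color` into `image_rgb` wherever `mask` is True.

    image_rgb: uint8 array of shape (H, W, 3) in RGB order (assumed input).
    mask: boolean array of shape (H, W), already sized to the full image.
    """
    if image_rgb.shape[:2] != mask.shape:
        raise ValueError("mask must have the same height and width as the image")
    out = image_rgb.astype(np.float32)
    tint = np.asarray(color, dtype=np.float32)
    # Only masked pixels are blended; unmasked pixels keep their original value.
    out[mask] = (1.0 - alpha) * out[mask] + alpha * tint
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def save_overlay(path, image_rgb):
    """Write an RGB uint8 array to disk with Pillow."""
    from PIL import Image  # imported here so overlay_mask works without Pillow installed
    Image.fromarray(image_rgb).save(path)
```

Because the blend is computed directly on the array, an unchanged saved image would mean the mask itself was empty, which separates "nothing was detected" from "drawing or saving failed".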

lc8631058 commented 7 years ago

Now demo.py runs correctly in a Jupyter notebook, but the show_masks function just outputs the original demo images without any changes. I wonder why?

lc8631058 commented 7 years ago

Just found the answer in #21.

fangxu622 commented 7 years ago

I get this error:

 'default': {'frequent': 20, 'kvstore': 'device'},
 'gpus': '0',
 'network': {'ANCHOR_RATIOS': [0.5, 1, 2],
             'ANCHOR_SCALES': [4, 8, 16, 32],
             'FIXED_PARAMS': ['conv1',
                              'bn_conv1',
                              'res2',
                              'bn2',
                              'gamma',
                              'beta'],
             'FIXED_PARAMS_SHARED': ['conv1',
                                     'bn_conv1',
                                     'res2',
                                     'bn2',
                                     'res3',
                                     'bn3',
                                     'res4',
                                     'bn4',
                                     'gamma',
                                     'beta'],
             'IMAGE_STRIDE': 0,
             'NUM_ANCHORS': 12,
             'PIXEL_MEANS': array([ 103.06,  115.9 ,  123.15]),
             'RCNN_FEAT_STRIDE': 16,
             'RPN_FEAT_STRIDE': 16,
             'pretrained': './model/pretrained_model/resnet_v1_101',
             'pretrained_epoch': 0},
 'output_path': '../output/fcis',
 'symbol': 'resnet_v1_101_fcis'}

[15:55:18] /home/sensetime/mxnet/dmlc-core/include/dmlc/./logging.h:300: [15:55:18] src/c_api/c_api_ndarray.cc:390: Operator _zeros cannot be run; requires at least one of FCompute<xpu>, NDArrayFunction, FCreateOperator be registered

Stack trace returned 10 entries:
[bt] (0) /usr/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7f86b7c0e129]
[bt] (1) /usr/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x640) [0x7f86b885f8a0]
[bt] (2) /lib64/libffi.so.6(ffi_call_unix64+0x4c) [0x7f86bfec0dcc]
[bt] (3) /lib64/libffi.so.6(ffi_call+0x1f5) [0x7f86bfec06f5]
[bt] (4) /usr/lib64/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x30b) [0x7f8694d33c8b]
[bt] (5) /usr/lib64/python2.7/lib-dynload/_ctypes.so(+0xaa85) [0x7f8694d2da85]
[bt] (6) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f86d1cf78e3]
[bt] (7) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x2336) [0x7f86d1d8c036]
[bt] (8) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f86d1d92e3d]
[bt] (9) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x663c) [0x7f86d1d9033c]

Traceback (most recent call last):
  File "./fcis/demo.py", line 153, in <module>
    main()
  File "./fcis/demo.py", line 84, in main
    arg_params=arg_params, aux_params=aux_params)
  File "/home/sensetime/FCIS/fcis/core/tester.py", line 30, in __init__
    self._mod.bind(provide_data, provide_label, for_training=False)
  File "/home/sensetime/FCIS/fcis/core/module.py", line 840, in bind
    for_training, inputs_need_grad, force_rebind=False, shared_module=None)
  File "/home/sensetime/FCIS/fcis/core/module.py", line 397, in bind
    state_names=self._state_names)
  File "/home/sensetime/FCIS/fcis/core/DataParallelExecutorGroup.py", line 178, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/home/sensetime/FCIS/fcis/core/DataParallelExecutorGroup.py", line 278, in bind_exec
    shared_group))
  File "/home/sensetime/FCIS/fcis/core/DataParallelExecutorGroup.py", line 592, in _bind_ith_exec
    context, self.logger)
  File "/home/sensetime/FCIS/fcis/core/DataParallelExecutorGroup.py", line 570, in _get_or_reshape
    arg_arr = nd.zeros(arg_shape, context, dtype=arg_type)
  File "/usr/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/ndarray.py", line 946, in zeros
    return _internal._zeros(shape=shape, ctx=ctx, dtype=dtype)
  File "/usr/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/_ctypes/ndarray.py", line 164, in generic_ndarray_function
    c_array(ctypes.c_char_p, [c_str(val) for val in vals])))
  File "/usr/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/base.py", line 78, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [15:55:18] src/c_api/c_api_ndarray.cc:390: Operator _zeros cannot be run; requires at least one of FCompute<xpu>, NDArrayFunction, FCreateOperator be registered

Stack trace returned 10 entries:
[bt] (0) /usr/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7f86b7c0e129]
[bt] (1) /usr/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x640) [0x7f86b885f8a0]
[bt] (2) /lib64/libffi.so.6(ffi_call_unix64+0x4c) [0x7f86bfec0dcc]
[bt] (3) /lib64/libffi.so.6(ffi_call+0x1f5) [0x7f86bfec06f5]
[bt] (4) /usr/lib64/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x30b) [0x7f8694d33c8b]
[bt] (5) /usr/lib64/python2.7/lib-dynload/_ctypes.so(+0xaa85) [0x7f8694d2da85]
[bt] (6) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f86d1cf78e3]
[bt] (7) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x2336) [0x7f86d1d8c036]
[bt] (8) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f86d1d92e3d]
[bt] (9) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x663c) [0x7f86d1d9033c]

liyi14 commented 7 years ago

Hi @fangxu622, did you run the following steps before building MXNet?

git checkout 62ecb60
git submodule update

liketheflower commented 7 years ago

use mxnet at /home/jimmy/mxnet/python/mxnet/__init__.pyc
{'BINARY_THRESH': 0.4,
 'CLASS_AGNOSTIC': True,
 'MASK_SIZE': 21,
 'MXNET_VERSION': 'mxnet',
 'SCALES': [(600, 1000)],
 'TEST': {'BATCH_IMAGES': 1,
          'CXX_PROPOSAL': False,
          'HAS_RPN': True,
          'ITER': 2,
          'MASK_MERGE_THRESH': 0.5,
          'MIN_DROP_SIZE': 2,
          'NMS': 0.3,
          'PROPOSAL_MIN_SIZE': 2,
          'PROPOSAL_NMS_THRESH': 0.7,
          'PROPOSAL_POST_NMS_TOP_N': 2000,
          'PROPOSAL_PRE_NMS_TOP_N': 20000,
          'RPN_MIN_SIZE': 2,
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'USE_GPU_MASK_MERGE': False,
          'USE_MASK_MERGE': True,
          'test_epoch': 8},
 'TRAIN': {'ASPECT_GROUPING': True,
           'BATCH_IMAGES': 1,
           'BATCH_ROIS': -1,
           'BATCH_ROIS_OHEM': 128,
           'BBOX_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZATION_PRECOMPUTED': True,
           'BBOX_REGRESSION_THRESH': 0.5,
           'BBOX_STDS': [0.2, 0.2, 0.5, 0.5],
           'BBOX_WEIGHTS': array([ 1.,  1.,  1.,  1.]),
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0,
           'BINARY_THRESH': 0.4,
           'CONVNEW3': True,
           'CXX_PROPOSAL': False,
           'ENABLE_OHEM': True,
           'END2END': True,
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'FLIP': True,
           'GAP_SELECT_FROM_ALL': False,
           'IGNORE_GAP': False,
           'LOSS_WEIGHT': [1.0, 10.0, 1.0],
           'RESUME': False,
           'RPN_ALLOWED_BORDER': 0,
           'RPN_BATCH_SIZE': 256,
           'RPN_BBOX_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_MIN_SIZE': 2,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 300,
           'RPN_PRE_NMS_TOP_N': 6000,
           'SHUFFLE': True,
           'begin_epoch': 0,
           'end_epoch': 8,
           'lr': 0.0005,
           'lr_step': '5.33',
           'model_prefix': 'e2e',
           'momentum': 0.9,
           'warmup': True,
           'warmup_lr': 5e-05,
           'warmup_step': 250,
           'wd': 0.0005},
 'dataset': {'NUM_CLASSES': 81,
             'dataset': 'coco',
             'dataset_path': './data/coco',
             'image_set': 'train2014+valminusminival2014',
             'proposal': 'rpn',
             'root_path': './data',
             'test_image_set': 'test-dev2015'},
 'default': {'frequent': 20, 'kvstore': 'device'},
 'gpus': '0',
 'network': {'ANCHOR_RATIOS': [0.5, 1, 2],
             'ANCHOR_SCALES': [4, 8, 16, 32],
             'FIXED_PARAMS': ['conv1',
                              'bn_conv1',
                              'res2',
                              'bn2',
                              'gamma',
                              'beta'],
             'FIXED_PARAMS_SHARED': ['conv1',
                                     'bn_conv1',
                                     'res2',
                                     'bn2',
                                     'res3',
                                     'bn3',
                                     'res4',
                                     'bn4',
                                     'gamma',
                                     'beta'],
             'IMAGE_STRIDE': 0,
             'NUM_ANCHORS': 12,
             'PIXEL_MEANS': array([ 103.06,  115.9 ,  123.15]),
             'RCNN_FEAT_STRIDE': 16,
             'RPN_FEAT_STRIDE': 16,
             'pretrained': './model/pretrained_model/resnet_v1_101',
             'pretrained_epoch': 0},
 'output_path': '../output/fcis',
 'symbol': 'resnet_v1_101_fcis'}
[12:00:36] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[12:00:36] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[12:00:36] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[12:00:37] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[12:00:37] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[12:00:37] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
(426, 640)
invalid device function
invalid device function
testing COCO_test2015_000000000275.jpg 2.1574s
[12:00:54] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[12:00:54] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[12:00:54] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
(427, 640)
invalid device function
invalid device function
testing COCO_test2015_000000001412.jpg 2.2110s
(427, 640)
invalid device function
invalid device function
testing COCO_test2015_000000073428.jpg 2.1906s
[12:01:05] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[12:01:05] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
[12:01:05] src/operator/convolution.cu:87: This convolution is not supported by cudnn, MXNET convolution is applied.
(428, 640)
invalid device function
invalid device function
testing COCO_test2015_000000393281.jpg 2.1758s
done

Similar output on my HP laptop: the masks are not shown correctly. On an AWS p2 instance there are no issues.

liketheflower commented 7 years ago

The key issue here might be the "invalid device function" error. I switched the mask voting to the CPU version, but it doesn't work either.
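On the "invalid device function" message itself: the CUDA runtime reports it (cudaErrorInvalidDeviceFunction) when a kernel has no compiled image that can run on the GPU it is launched on, typically because it was not built for that device's architecture. Whether that is the cause in this thread is an assumption. The sketch below only encodes CUDA's documented compatibility rules: SASS built for compute capability X.y runs on X.z with z >= y, while PTX for X.y can be JIT-compiled for any device at X.y or newer. The function names and the example architectures are hypothetical.

```python
from typing import Iterable, Tuple

Arch = Tuple[int, int]  # (major, minor) compute capability, e.g. (3, 5) for sm_35


def sass_runs_on(compiled: Arch, device: Arch) -> bool:
    """Binary code for X.y runs only on devices X.z with z >= y (same major)."""
    return compiled[0] == device[0] and device[1] >= compiled[1]


def ptx_runs_on(compiled: Arch, device: Arch) -> bool:
    """PTX for X.y can be JIT-compiled by the driver for any device >= X.y."""
    return device >= compiled  # lexicographic tuple comparison


def kernel_loadable(sass: Iterable[Arch], ptx: Iterable[Arch], device: Arch) -> bool:
    """True if at least one embedded image can run on `device`.

    When this is False, a kernel launch fails with "invalid device function".
    """
    return any(sass_runs_on(a, device) for a in sass) or any(
        ptx_runs_on(a, device) for a in ptx
    )
```

For example, a kernel built only as SASS for (3, 5) would load on a Tesla K80 (compute capability 3.7, the GPU in AWS p2 instances) but not on a 3.0 or 5.x device. Which architectures the FCIS kernels were actually built for has to be checked against your own build flags.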

liketheflower commented 7 years ago

The problem is resolved: in the fcis_coco_demo.yaml file, set both USE_MASK_MERGE and USE_GPU_MASK_MERGE to false, and then it works.

  # ITER 2 & mask merge
  ITER: 2
  MIN_DROP_SIZE: 2
  USE_MASK_MERGE: false
  USE_GPU_MASK_MERGE: false

tfzhou commented 7 years ago

@liketheflower Thank you for the comments; they helped me a lot in resolving my problems. However, setting both flags to false is not the optimal solution. Rather, we should set either one to true depending on whether we use CPU or GPU mode. This way we obtain more accurate segmentation results.
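Read together, the comments in this thread suggest that USE_GPU_MASK_MERGE only matters when USE_MASK_MERGE is on. The hypothetical helper below (not FCIS code) maps each flag combination to the post-processing path it is assumed to select in the demo; the returned labels are illustrative names only.

```python
def mask_postprocess_mode(use_mask_merge: bool, use_gpu_mask_merge: bool) -> str:
    """Return the post-processing path selected by the two TEST flags (sketch).

    Assumption: USE_GPU_MASK_MERGE is consulted only when USE_MASK_MERGE is True.
    """
    if not use_mask_merge:
        # Both flags false: no mask merging at all (the workaround above).
        return "no_merge"
    if use_gpu_mask_merge:
        # Default demo setting: mask merging runs as a GPU kernel.
        return "gpu_merge"
    # Mask merging kept on, but done on the CPU.
    return "cpu_merge"
```

Under this reading, setting only USE_GPU_MASK_MERGE to false keeps mask merging but moves it to the CPU, while setting USE_MASK_MERGE to false disables merging regardless of the GPU flag.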

liketheflower commented 7 years ago

@tfzhou I think you are right. I found that the results with the default mode are more accurate.

niuhaoyu16 commented 6 years ago

@liyi14 I got a fatal error when I ran git submodule update:

fatal: reference is not a tree: 89de7ab20167909bc2c4f8acd397671c47cf3c0d
Submodule path 'dmlc-core': checked out 'b5bec5481df86e8e6728d8bd80a61d87ef3b2cd5'
Submodule path 'mshadow': checked out '23210f3939428e42bc34553469ed9ce8c63001ed'
Submodule path 'nnvm': checked out 'ddf3c17e3455db9cd10f5b18bc9753a146971819'
Unable to checkout '89de7ab20167909bc2c4f8acd397671c47cf3c0d' in submodule path 'cub'

msracver/FCIS issue #38: Run demo.py with no result