[Bug] YOLOv3 ncnn mAP is lower than the PyTorch mAP

Checklist

[x] 1. I have searched related issues but cannot get the expected help.
[x] 2. I have read the FAQ documentation but cannot get the expected help.
[x] 3. The bug has not been fixed in the latest version.

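Describe the bug

The YOLOv3 model converted to ncnn reaches a lower COCO bbox mAP than the original PyTorch checkpoint on the same evaluation data: 0.466 (ncnn) versus 0.483 (PyTorch). The two commands and their full logs follow.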

# PyTorch baseline, run from /openmmlab/mmdetection
python tools/test.py \
    /openmmlab/mmdetection/configs/yolo/yolov3_d53_8xb8-320-273e_coco.py \
    /openmmlab/mmdeploy_checkpoints/mmdet/yolov3/yolov3_d53_320_273e_coco-421362b6.pth

# ncnn backend, run from /openmmlab/mmdeploy
python tools/test.py \
    /openmmlab/mmdeploy/configs/mmdet/detection/single-stage_ncnn_static-800x1344.py \
    /openmmlab/mmdetection/configs/yolo/yolov3_d53_8xb8-320-273e_coco.py \
    --model "../mmdeploy_regression_working_dir/mmdet/yolov3/ncnn/static/fp32/yolov3_d53_320_273e_coco-421362b6/end2end.param" "../mmdeploy_regression_working_dir/mmdet/yolov3/ncnn/static/fp32/yolov3_d53_320_273e_coco-421362b6/end2end.bin" \
    --speed-test

root@d53dfc0364a7:/openmmlab/mmdetection# python tools/test.py \
>     /openmmlab/mmdetection/configs/yolo/yolov3_d53_8xb8-320-273e_coco.py \
>     /openmmlab/mmdeploy_checkpoints/mmdet/yolov3/yolov3_d53_320_273e_coco-421362b6.pth
NOTE! Installing ujson may make loading annotations faster.
11/12 08:09:02 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51) [GCC 9.4.0]
    CUDA available: True
    numpy_random_seed: 1413352037
    GPU 0: NVIDIA GeForce GTX 1660 Ti
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 11.6, V11.6.55
    GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
    PyTorch: 1.11.0a0+17540c5
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.3.3 (Git Hash N/A)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.6
  - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS=-fno-gnu-unique -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

    TorchVision: 0.12.0a0
    OpenCV: 4.5.5
    MMEngine: 0.3.1

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: None
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

11/12 08:09:03 - mmengine - INFO - Config:
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=273, val_interval=7)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
param_scheduler = [
    dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=2000),
    dict(type='MultiStepLR', by_epoch=True, milestones=[218, 246], gamma=0.1)
]
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0005),
    clip_grad=dict(max_norm=35, norm_type=2))
auto_scale_lr = dict(enable=False, base_batch_size=64)
default_scope = 'mmdet'
default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=50),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', interval=7),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(type='DetVisualizationHook'))
env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl'))
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='DetLocalVisualizer',
    vis_backends=[dict(type='LocalVisBackend')],
    name='visualizer')
log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True)
log_level = 'INFO'
load_from = '/openmmlab/mmdeploy_checkpoints/mmdet/yolov3/yolov3_d53_320_273e_coco-421362b6.pth'
resume = False
data_preprocessor = dict(
    type='DetDataPreprocessor',
    mean=[0, 0, 0],
    std=[255.0, 255.0, 255.0],
    bgr_to_rgb=True,
    pad_size_divisor=32)
model = dict(
    type='YOLOV3',
    data_preprocessor=dict(
        type='DetDataPreprocessor',
        mean=[0, 0, 0],
        std=[255.0, 255.0, 255.0],
        bgr_to_rgb=True,
        pad_size_divisor=32),
    backbone=dict(
        type='Darknet',
        depth=53,
        out_indices=(3, 4, 5),
        init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://darknet53')),
    neck=dict(
        type='YOLOV3Neck',
        num_scales=3,
        in_channels=[1024, 512, 256],
        out_channels=[512, 256, 128]),
    bbox_head=dict(
        type='YOLOV3Head',
        num_classes=80,
        in_channels=[512, 256, 128],
        out_channels=[1024, 512, 256],
        anchor_generator=dict(
            type='YOLOAnchorGenerator',
            base_sizes=[[(116, 90), (156, 198), (373, 326)],
                        [(30, 61), (62, 45), (59, 119)],
                        [(10, 13), (16, 30), (33, 23)]],
            strides=[32, 16, 8]),
        bbox_coder=dict(type='YOLOBBoxCoder'),
        featmap_strides=[32, 16, 8],
        loss_cls=dict(
            type='CrossEntropyLoss',
            use_sigmoid=True,
            loss_weight=1.0,
            reduction='sum'),
        loss_conf=dict(
            type='CrossEntropyLoss',
            use_sigmoid=True,
            loss_weight=1.0,
            reduction='sum'),
        loss_xy=dict(
            type='CrossEntropyLoss',
            use_sigmoid=True,
            loss_weight=2.0,
            reduction='sum'),
        loss_wh=dict(type='MSELoss', loss_weight=2.0, reduction='sum')),
    train_cfg=dict(
        assigner=dict(
            type='GridAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0)),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        conf_thr=0.005,
        nms=dict(type='nms', iou_threshold=0.45),
        max_per_img=100))
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
file_client_args = dict(backend='disk')
train_pipeline = [
    dict(type='LoadImageFromFile', file_client_args=dict(backend='disk')),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Expand', mean=[0, 0, 0], to_rgb=True, ratio_range=(1, 2)),
    dict(
        type='MinIoURandomCrop',
        min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9),
        min_crop_size=0.3),
    dict(type='Resize', scale=(320, 320), keep_ratio=True),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackDetInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile', file_client_args=dict(backend='disk')),
    dict(type='Resize', scale=(320, 320), keep_ratio=True),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]
train_dataloader = dict(
    batch_size=8,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    batch_sampler=dict(type='AspectRatioBatchSampler'),
    dataset=dict(
        type='CocoDataset',
        data_root='data/coco/',
        ann_file='annotations/instances_train2017.json',
        data_prefix=dict(img='train2017/'),
        filter_cfg=dict(filter_empty_gt=True, min_size=32),
        pipeline=[
            dict(
                type='LoadImageFromFile',
                file_client_args=dict(backend='disk')),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(
                type='Expand', mean=[0, 0, 0], to_rgb=True,
                ratio_range=(1, 2)),
            dict(
                type='MinIoURandomCrop',
                min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9),
                min_crop_size=0.3),
            dict(type='Resize', scale=(320, 320), keep_ratio=True),
            dict(type='RandomFlip', prob=0.5),
            dict(type='PhotoMetricDistortion'),
            dict(type='PackDetInputs')
        ]))
val_dataloader = dict(
    batch_size=1,
    num_workers=2,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type='CocoDataset',
        data_root='data/coco/',
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='val2017/'),
        test_mode=True,
        pipeline=[
            dict(
                type='LoadImageFromFile',
                file_client_args=dict(backend='disk')),
            dict(type='Resize', scale=(320, 320), keep_ratio=True),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(
                type='PackDetInputs',
                meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                           'scale_factor'))
        ]))
test_dataloader = dict(
    batch_size=1,
    num_workers=2,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type='CocoDataset',
        data_root='data/coco/',
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='val2017/'),
        test_mode=True,
        pipeline=[
            dict(
                type='LoadImageFromFile',
                file_client_args=dict(backend='disk')),
            dict(type='Resize', scale=(320, 320), keep_ratio=True),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(
                type='PackDetInputs',
                meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                           'scale_factor'))
        ]))
val_evaluator = dict(
    type='CocoMetric',
    ann_file='data/coco/annotations/instances_val2017.json',
    metric='bbox')
test_evaluator = dict(
    type='CocoMetric',
    ann_file='data/coco/annotations/instances_val2017.json',
    metric='bbox')
input_size = (320, 320)
launcher = 'none'
work_dir = './work_dirs/yolov3_d53_8xb8-320-273e_coco'

11/12 08:09:03 - mmengine - INFO - Result has been saved to /openmmlab/mmdetection/work_dirs/yolov3_d53_8xb8-320-273e_coco/modules_statistic_results.json
11/12 08:09:05 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
local loads checkpoint from path: /openmmlab/mmdeploy_checkpoints/mmdet/yolov3/yolov3_d53_320_273e_coco-421362b6.pth
11/12 08:09:07 - mmengine - INFO - Load checkpoint from /openmmlab/mmdeploy_checkpoints/mmdet/yolov3/yolov3_d53_320_273e_coco-421362b6.pth
11/12 08:09:09 - mmengine - INFO - Epoch(test) [50/126]    eta: 0:00:02  time: 0.0349  data_time: 0.0019  memory: 292  
11/12 08:09:09 - mmengine - INFO - Epoch(test) [100/126]    eta: 0:00:00  time: 0.0169  data_time: 0.0011  memory: 292  
11/12 08:09:10 - mmengine - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.53s).
Accumulating evaluation results...
DONE (t=0.28s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.48303
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.78549
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.49975
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.15602
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.50076
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.67213
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.53381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.53381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.53381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.21963
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.54981
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.69946
11/12 08:09:11 - mmengine - INFO - bbox_mAP_copypaste: 0.483 0.785 0.500 0.156 0.501 0.672
11/12 08:09:11 - mmengine - INFO - Epoch(test) [126/126]  coco/bbox_mAP: 0.4830  coco/bbox_mAP_50: 0.7850  coco/bbox_mAP_75: 0.5000  coco/bbox_mAP_s: 0.1560  coco/bbox_mAP_m: 0.5010  coco/bbox_mAP_l: 0.6720


root@d53dfc0364a7:/openmmlab/mmdeploy# python tools/test.py \
> /openmmlab/mmdeploy/configs/mmdet/detection/single-stage_ncnn_static-800x1344.py \
> /openmmlab/mmdetection/configs/yolo/yolov3_d53_8xb8-320-273e_coco.py \
> --model "../mmdeploy_regression_working_dir/mmdet/yolov3/ncnn/static/fp32/yolov3_d53_320_273e_coco-421362b6/end2end.param" "../mmdeploy_regression_working_dir/mmdet/yolov3/ncnn/static/fp32/yolov3_d53_320_273e_coco-421362b6/end2end.bin" \
> --speed-test
NOTE! Installing ujson may make loading annotations faster.
11/12 08:10:13 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
11/12 08:10:13 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
11/12 08:10:13 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
11/12 08:10:13 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "backend_detectors" registry tree. As a workaround, the current "backend_detectors" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
11/12 08:10:14 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51) [GCC 9.4.0]
    CUDA available: True
    numpy_random_seed: 322176394
    GPU 0: NVIDIA GeForce GTX 1660 Ti
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 11.6, V11.6.55
    GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
    PyTorch: 1.11.0a0+17540c5
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.3.3 (Git Hash N/A)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.6
  - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS=-fno-gnu-unique -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

    TorchVision: 0.12.0a0
    OpenCV: 4.5.5
    MMEngine: 0.3.1

Runtime environment:
    dist_cfg: {'backend': 'nccl'}
    seed: None
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

11/12 08:10:14 - mmengine - INFO - Config:

11/12 08:10:14 - mmengine - INFO - Result has been saved to /openmmlab/mmdeploy/work_dir/modules_statistic_results.json
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
/openmmlab/mmdeploy/mmdeploy/codebase/mmdet/deploy/object_detection_model.py:595: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  return dets, torch.tensor(labels, dtype=torch.int32)
11/12 08:10:18 - mmengine - INFO - Epoch(test) [50/126]    eta: 0:00:06  time: 0.0834  data_time: 0.0034  memory: 0  
11/12 08:10:22 - mmengine - INFO - Epoch(test) [100/126]    eta: 0:00:02  time: 0.0868  data_time: 0.0012  memory: 0  
11/12 08:10:23 - mmengine - INFO - [ncnn]-110 times per count: 77.10 ms, 12.97 FPS
11/12 08:10:24 - mmengine - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.50s).
Accumulating evaluation results...
DONE (t=0.29s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.46570
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.75770
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.48069
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.15513
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.49595
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.64449
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.51132
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.51132
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.51132
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.21246
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.54197
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.66830
11/12 08:10:25 - mmengine - INFO - bbox_mAP_copypaste: 0.466 0.758 0.481 0.155 0.496 0.644
11/12 08:10:25 - mmengine - INFO - Epoch(test) [126/126]  coco/bbox_mAP: 0.4660  coco/bbox_mAP_50: 0.7580  coco/bbox_mAP_75: 0.4810  coco/bbox_mAP_s: 0.1550  coco/bbox_mAP_m: 0.4960  coco/bbox_mAP_l: 0.6440
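
To make the gap between the two runs easier to read, the two `bbox_mAP_copypaste` lines above can be compared metric by metric. The following is only a minimal sketch: the helper names `parse_copypaste` and `metric_deltas` and the metric labels are hypothetical and are not part of MMDeploy or MMDetection. The two input strings are copied from the logs above.

```python
import re

# Labels for the six values printed in a bbox_mAP_copypaste line.
# These names are illustrative; the order follows the coco/bbox_mAP* keys in the log.
METRIC_NAMES = ("mAP", "mAP_50", "mAP_75", "mAP_s", "mAP_m", "mAP_l")


def parse_copypaste(log_line):
    """Extract the six bbox metrics from a `bbox_mAP_copypaste` log line."""
    match = re.search(r"bbox_mAP_copypaste:\s*(.+)$", log_line.strip())
    if match is None:
        raise ValueError("not a bbox_mAP_copypaste line")
    values = [float(token) for token in match.group(1).split()]
    if len(values) != len(METRIC_NAMES):
        raise ValueError(f"expected {len(METRIC_NAMES)} values, got {len(values)}")
    return dict(zip(METRIC_NAMES, values))


def metric_deltas(baseline, backend):
    """Return backend minus baseline for every metric, rounded to 3 decimals."""
    return {name: round(backend[name] - baseline[name], 3) for name in METRIC_NAMES}


# Lines copied verbatim from the PyTorch and ncnn logs in this issue.
PYTORCH_LINE = "11/12 08:09:11 - mmengine - INFO - bbox_mAP_copypaste: 0.483 0.785 0.500 0.156 0.501 0.672"
NCNN_LINE = "11/12 08:10:25 - mmengine - INFO - bbox_mAP_copypaste: 0.466 0.758 0.481 0.155 0.496 0.644"

if __name__ == "__main__":
    deltas = metric_deltas(parse_copypaste(PYTORCH_LINE), parse_copypaste(NCNN_LINE))
    for name, delta in deltas.items():
        print(f"{name:>7}: {delta:+.3f}")
```

With these inputs the small-object metric barely changes, while the overall, IoU=0.50, IoU=0.75, and large-object metrics drop by roughly 0.02 to 0.03.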

Reproduction

See the two commands under "Describe the bug".

Environment

root@d53dfc0364a7:/openmmlab/mmdeploy# python tools/check_env.py 
11/12 08:16:56 - mmengine - INFO - 

11/12 08:16:56 - mmengine - INFO - **********Environmental information**********
11/12 08:16:57 - mmengine - INFO - sys.platform: linux
11/12 08:16:57 - mmengine - INFO - Python: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51) [GCC 9.4.0]
11/12 08:16:57 - mmengine - INFO - CUDA available: True
11/12 08:16:57 - mmengine - INFO - numpy_random_seed: 2147483648
11/12 08:16:57 - mmengine - INFO - GPU 0: NVIDIA GeForce GTX 1660 Ti
11/12 08:16:57 - mmengine - INFO - CUDA_HOME: /usr/local/cuda
11/12 08:16:57 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.6, V11.6.55
11/12 08:16:57 - mmengine - INFO - GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
11/12 08:16:57 - mmengine - INFO - PyTorch: 1.11.0a0+17540c5
11/12 08:16:57 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.3.3 (Git Hash N/A)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.6
  - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS=-fno-gnu-unique -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

11/12 08:16:57 - mmengine - INFO - TorchVision: 0.12.0a0
11/12 08:16:57 - mmengine - INFO - OpenCV: 4.5.5
11/12 08:16:57 - mmengine - INFO - MMEngine: 0.3.1
11/12 08:16:57 - mmengine - INFO - MMCV: 2.0.0rc2
11/12 08:16:57 - mmengine - INFO - MMCV Compiler: GCC 9.3
11/12 08:16:57 - mmengine - INFO - MMCV CUDA Compiler: 11.6
11/12 08:16:57 - mmengine - INFO - MMDeploy: 0.10.0+ed2d768
11/12 08:16:57 - mmengine - INFO - 

11/12 08:16:57 - mmengine - INFO - **********Backend information**********
11/12 08:16:57 - mmengine - INFO - onnxruntime: 1.8.1   ops_is_avaliable : True
11/12 08:16:57 - mmengine - INFO - tensorrt: 8.2.3.0    ops_is_avaliable : True
11/12 08:16:57 - mmengine - INFO - ncnn: 1.0.20221112   ops_is_avaliable : True
11/12 08:16:57 - mmengine - INFO - pplnn_is_avaliable: False
11/12 08:16:57 - mmengine - INFO - openvino_is_avaliable: False
11/12 08:16:57 - mmengine - INFO - snpe_is_available: False
11/12 08:16:57 - mmengine - INFO - ascend_is_available: False
11/12 08:16:57 - mmengine - INFO - coreml_is_available: False
11/12 08:16:57 - mmengine - INFO - 

11/12 08:16:57 - mmengine - INFO - **********Codebase information**********
11/12 08:16:57 - mmengine - INFO - mmdet:       3.0.0rc3
11/12 08:16:57 - mmengine - INFO - mmseg:       None
11/12 08:16:57 - mmengine - INFO - mmcls:       None
11/12 08:16:57 - mmengine - INFO - mmocr:       None
11/12 08:16:57 - mmengine - INFO - mmedit:      None
11/12 08:16:57 - mmengine - INFO - mmdet3d:     None
11/12 08:16:57 - mmengine - INFO - mmpose:      None
11/12 08:16:57 - mmengine - INFO - mmrotate:    None
11/12 08:16:57 - mmengine - INFO - mmaction:    None

Error traceback

No response

Can you test the onnxruntime precision?

The onnxruntime mAP is equal to the PyTorch result, 0.48303.

root@d53dfc0364a7:/openmmlab/mmdeploy# python tools/test.py /openmmlab/mmdeploy/configs/mmdet/detection/detection_onnxruntime_dynamic.py /openmmlab/mmdetection/configs/yolo/yolov3_d53_8xb8-320-273e_coco.py --model "../mmdeploy_regression_working_dir/mmdet/yolov3/onnxruntime/dynamic/fp32/yolov3_d53_320_273e_coco-421362b6/end2end.onnx"  --speed-test
NOTE! Installing ujson may make loading annotations faster.
11/15 08:41:03 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
11/15 08:41:03 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
11/15 08:41:03 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
11/15 08:41:03 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "backend_detectors" registry tree. As a workaround, the current "backend_detectors" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
11/15 08:41:03 - mmengine - INFO - Successfully loaded onnxruntime custom ops from             /openmmlab/mmdeploy/mmdeploy/lib/libmmdeploy_onnxruntime_ops.so
2022-11-15 08:41:03.929254863 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1695'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929272333 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1692'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929275115 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1691'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929277659 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1690'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929279881 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1687'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929282146 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1685'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929284342 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1683'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929286556 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1682'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929288761 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1681'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929291083 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1680'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929293229 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1679'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929295457 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1678'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929297612 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1688'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929299809 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1677'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929302252 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1668'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929304494 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1674'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929307560 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1667'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929309923 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1664'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929312919 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1696'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929315817 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1663'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929318654 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1662'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929321426 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1661'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929324240 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1659'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929332313 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1656'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929334564 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1653'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929337481 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1665'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929340882 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1670'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929344102 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1666'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929347001 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1660'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929349958 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1671'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929354888 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1657'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929358117 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1669'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929361131 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1684'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929364271 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1686'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929367585 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1676'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929370877 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1693'. It is not used by any node and should be removed from the model.
2022-11-15 08:41:03.929373799 [W:onnxruntime:, graph.cc:3211 CleanUnusedInitializers] Removing initializer '1689'. It is not used by any node and should be removed from the model.
11/15 08:41:04 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51) [GCC 9.4.0]
    CUDA available: True
    numpy_random_seed: 85133499
    GPU 0: NVIDIA GeForce GTX 1660 Ti
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 11.6, V11.6.55
    GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
    PyTorch: 1.11.0a0+17540c5
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.3.3 (Git Hash N/A)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.6
  - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS=-fno-gnu-unique -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

    TorchVision: 0.12.0a0
    OpenCV: 4.5.5
    MMEngine: 0.3.1

Runtime environment:
    dist_cfg: {'backend': 'nccl'}
    seed: None
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

11/15 08:41:04 - mmengine - INFO - Config:

11/15 08:41:04 - mmengine - INFO - Result has been saved to /openmmlab/mmdeploy/work_dir/modules_statistic_results.json
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
11/15 08:41:07 - mmengine - INFO - Epoch(test) [50/126]    eta: 0:00:04  time: 0.0665  data_time: 0.0013  memory: 0  
11/15 08:41:10 - mmengine - INFO - Epoch(test) [100/126]    eta: 0:00:01  time: 0.0715  data_time: 0.0037  memory: 0  
11/15 08:41:11 - mmengine - INFO - [onnxruntime]-110 times per count: 60.93 ms, 16.41 FPS
11/15 08:41:12 - mmengine - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.51s).
Accumulating evaluation results...
DONE (t=0.27s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.48303
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.78549
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.49975
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.15602
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.50076
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.67213
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.53381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.53381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.53381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.21963
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.54981
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.69946
11/15 08:41:13 - mmengine - INFO - bbox_mAP_copypaste: 0.483 0.785 0.500 0.156 0.501 0.672
11/15 08:41:13 - mmengine - INFO - Epoch(test) [126/126]  coco/bbox_mAP: 0.4830  coco/bbox_mAP_50: 0.7850  coco/bbox_mAP_75: 0.5000  coco/bbox_mAP_s: 0.1560  coco/bbox_mAP_m: 0.5010  coco/bbox_mAP_l: 0.6720
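
For comparing this ONNX Runtime run with the PyTorch and ncnn runs, it can help to pull the six numbers out of each `bbox_mAP_copypaste` line rather than comparing them by eye. The following is only a minimal sketch: `parse_copypaste` and `metric_gaps` are hypothetical helper names, not part of mmdeploy or mmdetection, and the metric order is assumed to match the summary line above (mAP, mAP_50, mAP_75, mAP_s, mAP_m, mAP_l).

```python
import re

# Hypothetical helpers for this issue; not an mmdeploy/mmdetection API.
COPYPASTE = re.compile(r"bbox_mAP_copypaste:\s*((?:\d+\.\d+\s*)+)")
# Assumed order, taken from the "coco/bbox_mAP ..." summary line in the log.
NAMES = ("mAP", "mAP_50", "mAP_75", "mAP_s", "mAP_m", "mAP_l")


def parse_copypaste(log_text):
    """Return the metrics from the last copypaste line in log_text, or None."""
    matches = COPYPASTE.findall(log_text)
    if not matches:
        return None
    values = [float(v) for v in matches[-1].split()]
    return dict(zip(NAMES, values))


def metric_gaps(reference, candidate):
    """Per-metric difference (candidate - reference), rounded to 3 decimals."""
    return {k: round(candidate[k] - reference[k], 3) for k in NAMES}


# The line below is copied from the ONNX Runtime log in this issue.
onnx_log = (
    "11/15 08:41:13 - mmengine - INFO - bbox_mAP_copypaste: "
    "0.483 0.785 0.500 0.156 0.501 0.672"
)
onnx_metrics = parse_copypaste(onnx_log)
print(onnx_metrics)
```

Passing the metrics parsed from the PyTorch log as `reference` and those from the ncnn log as `candidate` to `metric_gaps` would show which metrics drop after conversion.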

open-mmlab / mmdeploy

[Bug] yolov3 ncnn map error #1360
