open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

Failed to convert an MMRotate s2anet model [Bug] #2470

Closed: galaxyGGG closed this issue 1 year ago

galaxyGGG commented 1 year ago

Describe the bug

I followed the guide to:
1. create a conda env
2. install mmcv/mmdet/mmrotate/mmdeploy
3. convert an s2anet model to ONNX (CPU)

The output got stuck and the conversion finally stopped with an error. I'm not sure whether this is because my PC is not powerful enough: I only have a GTX 1060 with 3 GB of memory.

Reproduction

python tools/deploy.py configs/mmrotate/rotated-detection_onnxruntime_dynamic.py s2anet-le135_r50_fpn_amp-1x_dota.py s2anet_r50_fpn_fp16_1x_dota_le135-5cac515c.pth dota_demo.jpg --work-dir mmdeploy_models/mmrotate/redet/ort --device cpu --show --dump-info

Environment

09/28 10:55:53 - mmengine - INFO - **********Environmental information**********
09/28 10:55:54 - mmengine - INFO - sys.platform: linux
09/28 10:55:54 - mmengine - INFO - Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
09/28 10:55:54 - mmengine - INFO - CUDA available: True
09/28 10:55:54 - mmengine - INFO - numpy_random_seed: 2147483648
09/28 10:55:54 - mmengine - INFO - GPU 0: NVIDIA GeForce GTX 1060 3GB
09/28 10:55:54 - mmengine - INFO - CUDA_HOME: /usr/local/cuda-10.1
09/28 10:55:54 - mmengine - INFO - NVCC: Cuda compilation tools, release 10.1, V10.1.24
09/28 10:55:54 - mmengine - INFO - GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
09/28 10:55:54 - mmengine - INFO - PyTorch: 1.8.0
09/28 10:55:54 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.1, CUDNN_VERSION=7.6.3, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

09/28 10:55:54 - mmengine - INFO - TorchVision: 0.9.0
09/28 10:55:54 - mmengine - INFO - OpenCV: 4.8.0
09/28 10:55:54 - mmengine - INFO - MMEngine: 0.8.5
09/28 10:55:54 - mmengine - INFO - MMCV: 2.0.1
09/28 10:55:54 - mmengine - INFO - MMCV Compiler: GCC 7.3
09/28 10:55:54 - mmengine - INFO - MMCV CUDA Compiler: 10.1
09/28 10:55:54 - mmengine - INFO - MMDeploy: 1.3.0+1132e82
09/28 10:55:54 - mmengine - INFO - 

09/28 10:55:54 - mmengine - INFO - **********Backend information**********
09/28 10:55:54 - mmengine - INFO - tensorrt:    None
09/28 10:55:54 - mmengine - INFO - ONNXRuntime: 1.16.0
09/28 10:55:54 - mmengine - INFO - ONNXRuntime-gpu: None
09/28 10:55:54 - mmengine - INFO - ONNXRuntime custom ops:  Available
09/28 10:55:54 - mmengine - INFO - pplnn:   None
09/28 10:55:54 - mmengine - INFO - ncnn:    None
09/28 10:55:54 - mmengine - INFO - snpe:    None
09/28 10:55:54 - mmengine - INFO - openvino:    None
09/28 10:55:54 - mmengine - INFO - torchscript: 1.8.0
09/28 10:55:54 - mmengine - INFO - torchscript custom ops:  NotAvailable
09/28 10:55:54 - mmengine - INFO - rknn-toolkit:    None
09/28 10:55:54 - mmengine - INFO - rknn-toolkit2:   None
09/28 10:55:54 - mmengine - INFO - ascend:  None
09/28 10:55:54 - mmengine - INFO - coreml:  None
09/28 10:55:54 - mmengine - INFO - tvm: None
09/28 10:55:54 - mmengine - INFO - vacc:    None
09/28 10:55:54 - mmengine - INFO - 

09/28 10:55:54 - mmengine - INFO - **********Codebase information**********
09/28 10:55:54 - mmengine - INFO - mmdet:   3.1.0
09/28 10:55:54 - mmengine - INFO - mmseg:   None
09/28 10:55:54 - mmengine - INFO - mmpretrain:  None
09/28 10:55:54 - mmengine - INFO - mmocr:   None
09/28 10:55:54 - mmengine - INFO - mmagic:  None
09/28 10:55:54 - mmengine - INFO - mmdet3d: None
09/28 10:55:54 - mmengine - INFO - mmpose:  None
09/28 10:55:54 - mmengine - INFO - mmrotate:    1.0.0rc1
09/28 10:55:54 - mmengine - INFO - mmaction:    None
09/28 10:55:54 - mmengine - INFO - mmrazor: None
09/28 10:55:54 - mmengine - INFO - mmyolo:  None

Error traceback

09/28 09:51:51 - mmengine - WARNING - Failed to search registry with scope "mmrotate" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmrotate" is a correct scope, or whether the registry is initialized.
09/28 09:51:51 - mmengine - WARNING - Failed to search registry with scope "mmrotate" in the "mmrotate_tasks" registry tree. As a workaround, the current "mmrotate_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmrotate" is a correct scope, or whether the registry is initialized.
09/28 09:51:52 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
09/28 09:51:52 - mmengine - WARNING - Failed to search registry with scope "mmrotate" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmrotate" is a correct scope, or whether the registry is initialized.
09/28 09:51:52 - mmengine - WARNING - Failed to search registry with scope "mmrotate" in the "mmrotate_tasks" registry tree. As a workaround, the current "mmrotate_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmrotate" is a correct scope, or whether the registry is initialized.
Loads checkpoint by local backend from path: s2anet_r50_fpn_fp16_1x_dota_le135-5cac515c.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: fam_head.cls_convs.0.conv.weight, fam_head.cls_convs.0.conv.bias, fam_head.cls_convs.1.conv.weight, fam_head.cls_convs.1.conv.bias, fam_head.reg_convs.0.conv.weight, fam_head.reg_convs.0.conv.bias, fam_head.reg_convs.1.conv.weight, fam_head.reg_convs.1.conv.bias, fam_head.retina_cls.weight, fam_head.retina_cls.bias, fam_head.retina_reg.weight, fam_head.retina_reg.bias, align_conv.ac.0.deform_conv.weight, align_conv.ac.1.deform_conv.weight, align_conv.ac.2.deform_conv.weight, align_conv.ac.3.deform_conv.weight, align_conv.ac.4.deform_conv.weight, odm_head.or_conv.weight, odm_head.or_conv.bias, odm_head.or_conv.indices, odm_head.cls_convs.0.conv.weight, odm_head.cls_convs.0.conv.bias, odm_head.cls_convs.1.conv.weight, odm_head.cls_convs.1.conv.bias, odm_head.reg_convs.0.conv.weight, odm_head.reg_convs.0.conv.bias, odm_head.reg_convs.1.conv.weight, odm_head.reg_convs.1.conv.bias, odm_head.odm_cls.weight, odm_head.odm_cls.bias, odm_head.odm_reg.weight, odm_head.odm_reg.bias

missing keys in source state_dict: bbox_head_init.cls_convs.0.conv.weight, bbox_head_init.cls_convs.0.conv.bias, bbox_head_init.cls_convs.1.conv.weight, bbox_head_init.cls_convs.1.conv.bias, bbox_head_init.reg_convs.0.conv.weight, bbox_head_init.reg_convs.0.conv.bias, bbox_head_init.reg_convs.1.conv.weight, bbox_head_init.reg_convs.1.conv.bias, bbox_head_init.retina_cls.weight, bbox_head_init.retina_cls.bias, bbox_head_init.retina_reg.weight, bbox_head_init.retina_reg.bias, bbox_head_refine.0.or_conv.weight, bbox_head_refine.0.or_conv.bias, bbox_head_refine.0.or_conv.indices, bbox_head_refine.0.cls_convs.0.conv.weight, bbox_head_refine.0.cls_convs.0.conv.bias, bbox_head_refine.0.cls_convs.1.conv.weight, bbox_head_refine.0.cls_convs.1.conv.bias, bbox_head_refine.0.reg_convs.0.conv.weight, bbox_head_refine.0.reg_convs.0.conv.bias, bbox_head_refine.0.reg_convs.1.conv.weight, bbox_head_refine.0.reg_convs.1.conv.bias, bbox_head_refine.0.retina_cls.weight, bbox_head_refine.0.retina_cls.bias, bbox_head_refine.0.retina_reg.weight, bbox_head_refine.0.retina_reg.bias, bbox_head_refine.0.feat_refine_module.deform_conv.weight

09/28 09:51:53 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future. 
09/28 09:51:53 - mmengine - INFO - Export PyTorch model to ONNX: mmdeploy_models/mmrotate/redet/ort/end2end.onnx.
09/28 09:51:53 - mmengine - WARNING - Can not find torch.nn.functional.scaled_dot_product_attention, function rewrite will not be applied
09/28 09:51:53 - mmengine - WARNING - Can not find torch._C._jit_pass_onnx_autograd_function_process, function rewrite will not be applied
/home/server/PycharmProjects/det_screen/mmrotate/mmrotate/models/dense_heads/s2a_head.py:41: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert num_imgs == cls_scores[i].size(0) == bbox_preds[i].size(0)
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmdet/models/dense_heads/anchor_head.py:115: UserWarning: DeprecationWarning: anchor_generator is deprecated, please use "prior_generator" instead
  warnings.warn('DeprecationWarning: anchor_generator is deprecated, '
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmdet/structures/bbox/base_boxes.py:62: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  data = torch.as_tensor(data)
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmdet/structures/bbox/base_boxes.py:76: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert data.dim() >= 2 and data.size(-1) == self.box_dim, \
/home/server/PycharmProjects/det_screen/mmrotate/mmrotate/models/task_modules/coders/delta_xywht_rbbox_coder.py:112: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert pred_bboxes.size(0) == bboxes.size(0)
/home/server/PycharmProjects/det_screen/mmrotate/mmrotate/models/task_modules/coders/delta_xywht_rbbox_coder.py:249: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if num_bboxes == 0:
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/deform_conv.py:334: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  input_pad = (x.size(2) < self.kernel_size[0]) or (x.size(3) <
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/deform_conv.py:336: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_pad:
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/deform_conv.py:217: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not all(map(lambda s: s > 0, output_size)):
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/deform_conv.py:113: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  int(i)
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/deform_conv.py:119: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  cur_im2col_step = min(ctx.im2col_step, input.size(0))
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/deform_conv.py:120: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert (input.size(0) % cur_im2col_step
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/deform_conv.py:345: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_pad:
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmdet/models/dense_heads/base_dense_head.py:363: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert cls_score.size()[-2:] == bbox_pred.size()[-2:]
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmdet/models/utils/misc.py:336: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  num_topk = min(topk, valid_idxs.size(0))
/home/server/PycharmProjects/det_screen/mmrotate/mmrotate/models/task_modules/coders/delta_xywht_rbbox_coder.py:120: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.use_box_type and decoded_bboxes.size(-1) == 5:
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmdet/models/dense_heads/base_dense_head.py:473: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not valid_mask.all():
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/nms.py:276: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if boxes.size(-1) == 5:
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/nms.py:286: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  max_coordinate + torch.tensor(1).to(boxes))
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/nms.py:302: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if boxes_for_nms.shape[0] < split_thr:
/home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/nms.py:401: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if dets.shape[0] == 0:
09/28 10:08:31 - mmengine - ERROR - /home/server/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmdeploy/apis/core/pipeline_manager.py - pop_mp_output - 80 - `mmdeploy.apis.pytorch2onnx.torch2onnx` with Call id: 0 failed. exit.
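
Editorial note: the log above reports checkpoint keys under fam_head, odm_head and align_conv, while the built model expects bbox_head_init and bbox_head_refine. That pattern suggests the checkpoint and the model config use different parameter naming layouts, although the thread does not confirm the cause. A minimal sketch of how such a mismatch could be summarized by top-level module is shown below. The helper names group_by_prefix and diff_state_dict_keys are hypothetical, and the key lists are a small sample copied from the log; with real files the keys would presumably come from the loaded checkpoint's state dict and from model.state_dict().

```python
from collections import defaultdict


def group_by_prefix(keys):
    """Group parameter names by their top-level module name."""
    groups = defaultdict(list)
    for key in keys:
        groups[key.split(".", 1)[0]].append(key)
    return dict(groups)


def diff_state_dict_keys(checkpoint_keys, model_keys):
    """Return keys found only in the checkpoint and keys found only in the model."""
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return {
        "unexpected": group_by_prefix(sorted(ckpt - model)),
        "missing": group_by_prefix(sorted(model - ckpt)),
    }


# A few key names copied from the log above (not the full lists).
checkpoint_keys = [
    "fam_head.retina_cls.weight",
    "odm_head.odm_cls.weight",
    "align_conv.ac.0.deform_conv.weight",
]
model_keys = [
    "bbox_head_init.retina_cls.weight",
    "bbox_head_refine.0.retina_cls.weight",
    "bbox_head_refine.0.feat_refine_module.deform_conv.weight",
]

report = diff_state_dict_keys(checkpoint_keys, model_keys)
for kind, groups in report.items():
    for prefix, names in groups.items():
        print(f"{kind}: {prefix} ({len(names)} keys)")
```

If every head parameter lands in one of the two groups, as it does in the log, the head weights are not being loaded from the checkpoint at all, regardless of whether the export itself is supported.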
galaxyGGG commented 1 year ago

In addition, I tried converting a rotated-faster-rcnn model and it succeeded, so I assume the environment is set up correctly.

I also tried to convert a redet model, which failed with this error:

RuntimeError: riroi_align_rotated_forward_impl: implementation for device cpu not found.

Does this mean I can't convert a redet model on CPU?

RunningLeon commented 1 year ago

Hi, we do not support s2anet from mmrotate. See the list of supported models: https://mmdeploy.readthedocs.io/en/latest/04-supported-codebases/mmrotate.html#supported-models

RunningLeon commented 1 year ago

Re: the redet question above (RuntimeError: riroi_align_rotated_forward_impl: implementation for device cpu not found):

You can convert to ONNX with CUDA. The exported ONNX model is device-independent.
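
Editorial note: the suggestion above amounts to rerunning the reproduction command with a CUDA device instead of cpu. A minimal sketch of assembling that command follows. The helper build_deploy_command is hypothetical, the redet config and checkpoint names are placeholders because the thread does not give them, and the device string cuda:0 assumes the first GPU is the one to use. Whether the export fits in the available GPU memory is not something the thread establishes.

```python
import shlex


def build_deploy_command(deploy_cfg, model_cfg, checkpoint, image,
                         work_dir, device="cuda:0"):
    """Assemble the tools/deploy.py invocation used in this thread."""
    return [
        "python", "tools/deploy.py",
        deploy_cfg, model_cfg, checkpoint, image,
        "--work-dir", work_dir,
        "--device", device,  # the original report used "cpu" here
        "--dump-info",
    ]


cmd = build_deploy_command(
    deploy_cfg="configs/mmrotate/rotated-detection_onnxruntime_dynamic.py",
    model_cfg="REDET_MODEL_CONFIG.py",   # placeholder, not given in the thread
    checkpoint="REDET_CHECKPOINT.pth",   # placeholder, not given in the thread
    image="dota_demo.jpg",
    work_dir="mmdeploy_models/mmrotate/redet/ort",
)

# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(cmd))
```

The sketch leaves out the --show flag from the original command, since visualization is not needed to test whether the export succeeds.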

galaxyGGG commented 1 year ago

Sorry for not reading the whole guide, and thanks for your patience. Looking forward to seeing support for redet!