open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Error occurs when run convert_rtmdet.py (convert rtmdet pth to onnx) #2424

Closed: Haroldhy closed this issue 1 year ago

Haroldhy commented 1 year ago

Checklist

Describe the bug

Error:

Traceback (most recent call last):
  File ".\convert_rtmdet.py", line 104, in <module>
    model = build_model_from_cfg(args.config, args.checkpoint, args.device)
  File ".\convert_rtmdet.py", line 10, in build_model_from_cfg
    model = init_detector(config_path, checkpoint_path, device=device)
  File "E:\anaconda3\envs\mmpose\lib\site-packages\mmdet\apis\inference.py", line 53, in init_detector
    config = Config.fromfile(config)
  File "E:\anaconda3\envs\mmpose\lib\site-packages\mmengine\config\config.py", line 456, in fromfile
    cfg_dict, cfg_text, env_variables = Config._file2dict(
  File "E:\anaconda3\envs\mmpose\lib\site-packages\mmengine\config\config.py", line 940, in _file2dict
    raise e
  File "E:\anaconda3\envs\mmpose\lib\site-packages\mmengine\config\config.py", line 882, in _file2dict
    _cfg_dict, _cfg_text, _env_variables = Config._file2dict(
  File "E:\anaconda3\envs\mmpose\lib\site-packages\mmengine\config\config.py", line 838, in _file2dict
    if lazy_import is None and Config._is_lazy_import(filename):
  File "E:\anaconda3\envs\mmpose\lib\site-packages\mmengine\config\config.py", line 1651, in _is_lazy_import
    with open(filename, encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'E:\hy\algorithm\workspace\mmpose\projects\rtmpose\examples\RTMPose-Deploy\Windows\TensorRT\python\config\../_base_/base_static.py'

Reproduction

My command (run from E:\hy\algorithm\workspace\mmpose\projects\rtmpose\examples\RTMPose-Deploy\Windows\TensorRT\python):

E:\hy\algorithm\workspace\mmpose\projects\rtmpose\examples\RTMPose-Deploy\Windows\TensorRT\python> python .\convert_rtmdet.py --config .\config\detection_tensorrt_static-320x320.py --checkpoint .\checkpoint\rtmdet_nano_8xb32-100e_coco-obj365-person-05d8511e.pth --output .\rtmdet\

Environment

09/12 17:02:11 - mmengine - INFO - **********Environmental information**********
09/12 17:02:16 - mmengine - INFO - sys.platform: win32
09/12 17:02:16 - mmengine - INFO - Python: 3.8.17 (default, Jul  5 2023, 20:44:21) [MSC v.1916 64 bit (AMD64)]
09/12 17:02:16 - mmengine - INFO - CUDA available: True
09/12 17:02:16 - mmengine - INFO - numpy_random_seed: 2147483648
09/12 17:02:16 - mmengine - INFO - GPU 0: NVIDIA GeForce GTX 1080 Ti
09/12 17:02:16 - mmengine - INFO - GPU 1: NVIDIA GeForce GTX 1070 Ti
09/12 17:02:16 - mmengine - INFO - CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0
09/12 17:02:16 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.0, V11.0.221
09/12 17:02:16 - mmengine - INFO - MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.36.32537 for x64
09/12 17:02:16 - mmengine - INFO - GCC: n/a
09/12 17:02:16 - mmengine - INFO - PyTorch: 1.8.1+cu111
09/12 17:02:16 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 192829913
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 2019
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=C:/w/b/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -DNDEBUG -DUSE_FBGEMM -DUSE_XNNPACK, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON,

09/12 17:02:16 - mmengine - INFO - TorchVision: 0.9.1+cu111
09/12 17:02:16 - mmengine - INFO - OpenCV: 4.8.0
09/12 17:02:16 - mmengine - INFO - MMEngine: 0.8.4
09/12 17:02:16 - mmengine - INFO - MMCV: 2.0.1
09/12 17:02:16 - mmengine - INFO - MMCV Compiler: MSVC 192930148
09/12 17:02:16 - mmengine - INFO - MMCV CUDA Compiler: 11.1
09/12 17:02:16 - mmengine - INFO - MMDeploy: 1.2.0+6e60cae
09/12 17:02:16 - mmengine - INFO -

09/12 17:02:16 - mmengine - INFO - **********Backend information**********
09/12 17:02:16 - mmengine - INFO - tensorrt:    8.5.2.2
09/12 17:02:16 - mmengine - INFO - tensorrt custom ops: Available
09/12 17:02:17 - mmengine - INFO - ONNXRuntime: 1.8.1
09/12 17:02:17 - mmengine - INFO - ONNXRuntime-gpu:     None
09/12 17:02:17 - mmengine - INFO - ONNXRuntime custom ops:      Available
09/12 17:02:17 - mmengine - INFO - pplnn:       None
09/12 17:02:17 - mmengine - INFO - ncnn:        None
09/12 17:02:17 - mmengine - INFO - snpe:        None
09/12 17:02:17 - mmengine - INFO - openvino:    None
09/12 17:02:17 - mmengine - INFO - torchscript: 1.8.1+cu111
09/12 17:02:17 - mmengine - INFO - torchscript custom ops:      NotAvailable
09/12 17:02:17 - mmengine - INFO - rknn-toolkit:        None
09/12 17:02:17 - mmengine - INFO - rknn-toolkit2:       None
09/12 17:02:17 - mmengine - INFO - ascend:      None
09/12 17:02:17 - mmengine - INFO - coreml:      None
09/12 17:02:17 - mmengine - INFO - tvm: None
09/12 17:02:17 - mmengine - INFO - vacc:        None
09/12 17:02:17 - mmengine - INFO -

09/12 17:02:17 - mmengine - INFO - **********Codebase information**********
09/12 17:02:17 - mmengine - INFO - mmdet:       3.1.0
09/12 17:02:17 - mmengine - INFO - mmseg:       None
09/12 17:02:17 - mmengine - INFO - mmpretrain:  1.0.2
09/12 17:02:17 - mmengine - INFO - mmocr:       None
09/12 17:02:17 - mmengine - INFO - mmagic:      None
09/12 17:02:17 - mmengine - INFO - mmdet3d:     None
09/12 17:02:17 - mmengine - INFO - mmpose:      1.1.0
09/12 17:02:17 - mmengine - INFO - mmrotate:    None
09/12 17:02:17 - mmengine - INFO - mmaction:    None
09/12 17:02:17 - mmengine - INFO - mmrazor:     None
09/12 17:02:17 - mmengine - INFO - mmyolo:      None

Error traceback

No response

irexyc commented 1 year ago

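I assume you copied detection_tensorrt_static-320x320.py out of the mmdeploy/configs folder. Copying it is not recommended, because it depends on other configs; if you don't copy all of them, the directory structure it relies on is broken.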

https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/config.html#id9

https://github.com/open-mmlab/mmdeploy/blob/main/configs/mmdet/detection/detection_tensorrt_static-320x320.py#L1
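To make the dependency concrete, here is a minimal sketch that lists the `_base_` files a config refers to but that are missing on disk. It assumes, as the FileNotFoundError path suggests, that relative `_base_` entries are resolved against the folder containing the config file itself; `missing_base_configs` is a hypothetical helper, not an mmengine or mmdeploy API, and it only understands a plain string or list of strings assigned to `_base_`.

```python
import ast
from pathlib import Path


def missing_base_configs(config_path):
    """Return the `_base_` files of `config_path` that do not exist on disk.

    Hypothetical helper: relative `_base_` entries are joined onto the folder
    that contains the config file, so a config copied out of
    mmdeploy/configs/... looks for its bases next to the copy, not next to the
    original. Package-scoped entries such as 'mmdet::...' are skipped.
    """
    config_path = Path(config_path)
    tree = ast.parse(config_path.read_text(encoding="utf-8"))
    bases = []
    for node in tree.body:
        is_base = isinstance(node, ast.Assign) and any(
            isinstance(target, ast.Name) and target.id == "_base_"
            for target in node.targets)
        if is_base:
            value = ast.literal_eval(node.value)  # a str or a list/tuple of str
            bases = [value] if isinstance(value, str) else list(value)
    missing = []
    for base in bases:
        if "::" in base:  # resolved from an installed package, not a folder
            continue
        candidate = config_path.parent / base
        if not candidate.is_file():
            missing.append(str(candidate))
    return missing


if __name__ == "__main__":
    import sys

    for path in missing_base_configs(sys.argv[1]):
        print("missing _base_ config:", path)
```

Run against a lone copy of the deploy config, it would report the same `../_base_/...` files the traceback complains about; run inside an intact mmdeploy checkout it should report nothing.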

Haroldhy commented 1 year ago

Thanks for the reply! But I've run into a new error. I put convert_rtmdet.py in the mmdeploy folder and ran:

python .\convert_rtmdet.py --config .\configs\mmdet\detection\detection_onnxruntime_dynamic.py --checkpoint .\weights\rtmdet_nano_8xb32-100e_coco-obj365-person-05d8511e.pth --output ..\mmpose\projects\rtmpose\examples\RTMPose-Deploy\Windows\TensorRT\python\rtmdet\

which produced this error:

Traceback (most recent call last):
  File "E:\anaconda3\envs\mmpose\lib\site-packages\mmengine\config\config.py", line 106, in __getattr__
    value = super().__getattr__(name)
  File "E:\anaconda3\envs\mmpose\lib\site-packages\addict\addict.py", line 67, in __getattr__
    return self.__getitem__(item)
  File "E:\anaconda3\envs\mmpose\lib\site-packages\mmengine\config\config.py", line 135, in __getitem__
    return self.build_lazy(super().__getitem__(key))
  File "E:\anaconda3\envs\mmpose\lib\site-packages\mmengine\config\config.py", line 102, in __missing__
    raise KeyError(name)
KeyError: 'model'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".\convert_rtmdet.py", line 104, in <module>
    model = build_model_from_cfg(args.config, args.checkpoint, args.device)
  File ".\convert_rtmdet.py", line 10, in build_model_from_cfg
    model = init_detector(config_path, checkpoint_path, device=device)
  File "E:\anaconda3\envs\mmpose\lib\site-packages\mmdet\apis\inference.py", line 59, in init_detector
    elif 'init_cfg' in config.model.backbone:
  File "E:\anaconda3\envs\mmpose\lib\site-packages\mmengine\config\config.py", line 1489, in __getattr__
    return getattr(self._cfg_dict, name)
  File "E:\anaconda3\envs\mmpose\lib\site-packages\mmengine\config\config.py", line 110, in __getattr__
    raise AttributeError(f"'{self.__class__.__name__}' object has no "
AttributeError: 'ConfigDict' object has no attribute 'model'

irexyc commented 1 year ago

I took a look: this project is meant to be used independently of mmdeploy.

https://github.com/Dominic23331/rtmpose_tensorrt

The model cfg it asks for should be the PyTorch-side config.
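A small pre-flight check could turn the `KeyError: 'model'` / `AttributeError` pair into a readable message. This is a sketch under assumptions: `check_model_config` and `DEPLOY_KEYS` are hypothetical names, the three deploy-only keys reflect how mmdeploy 1.x deploy configs are usually laid out, and the check relies only on `init_detector` dereferencing `config.model.backbone`, as the traceback shows.

```python
from typing import Any, Mapping

# Top-level keys that mmdeploy deploy configs typically define (an assumption
# based on mmdeploy 1.x configs); a training/model config defines `model`.
DEPLOY_KEYS = ("onnx_config", "codebase_config", "backend_config")


def check_model_config(cfg: Mapping[str, Any], path: str = "<config>") -> None:
    """Fail early with a readable message if `cfg` has no `model` section.

    init_detector dereferences cfg.model (e.g. cfg.model.backbone), so passing
    a deploy config ends in "'ConfigDict' object has no attribute 'model'".
    """
    if "model" in cfg:
        return
    found = [key for key in DEPLOY_KEYS if key in cfg]
    hint = ""
    if found:
        hint = (f" It defines {', '.join(found)}, so it looks like an mmdeploy deploy"
                " config; pass the config the checkpoint was trained with instead.")
    raise ValueError(f"{path} has no 'model' section.{hint}")
```

With mmengine installed, it might be called as `check_model_config(Config.fromfile(args.config), args.config)` before `init_detector`, assuming the loaded `Config` supports `in` lookups like a dict.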

Haroldhy commented 1 year ago

Sorry, I don't quite understand what "the PyTorch-side config" means. Could you give me an example? Thanks!

irexyc commented 1 year ago


I mean the config this model (the ckpt) was trained with.

For example, here: https://github.com/open-mmlab/mmpose/blob/main/configs/body_2d_keypoint/rtmpose/coco/rtmpose_coco.md

Haroldhy commented 1 year ago

You're right, thank you so much! I've changed the cfg to mmpose\projects\rtmpose\rtmdet\person/rtmdet_nano_320-8xb32_coco-person.py. But running it still produced a new bug, and I'd like to ask about it too. The error output is:

(mmpose) PS E:\hy\algorithm\workspace\mmdeploy> python .\convert_rtmdet.py --config ..\mmpose\projects\rtmpose\rtmdet\person\rtmdet_nano_320-8xb32_coco-person.py --checkpoint .\weights\rtmdet_nano_8xb32-100e_coco-obj365-person-05d8511e.pth --output ..\mmpose\projects\rtmpose\examples\RTMPose-Deploy\Windows\TensorRT\python\rtmdet\
Loads checkpoint by local backend from path: .\weights\rtmdet_nano_8xb32-100e_coco-obj365-person-05d8511e.pth
E:\anaconda3\envs\mmpose\lib\site-packages\torch\nn\functional.py:1709: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
  File ".\convert_rtmdet.py", line 109, in <module>
    torch.onnx.export(
  File "E:\anaconda3\envs\mmpose\lib\site-packages\torch\onnx\__init__.py", line 271, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "E:\anaconda3\envs\mmpose\lib\site-packages\torch\onnx\utils.py", line 88, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "E:\anaconda3\envs\mmpose\lib\site-packages\torch\onnx\utils.py", line 694, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "E:\anaconda3\envs\mmpose\lib\site-packages\torch\onnx\utils.py", line 463, in _model_to_graph
    graph = _optimize_graph(graph, operator_export_type,
  File "E:\anaconda3\envs\mmpose\lib\site-packages\torch\onnx\utils.py", line 206, in _optimize_graph
    graph = torch._C._jit_pass_onnx(graph, operator_export_type)
  File "E:\anaconda3\envs\mmpose\lib\site-packages\torch\onnx\__init__.py", line 309, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "E:\anaconda3\envs\mmpose\lib\site-packages\torch\onnx\utils.py", line 993, in _run_symbolic_function
    symbolic_fn = _find_symbolic_in_registry(domain, op_name, opset_version, operator_export_type)
  File "E:\anaconda3\envs\mmpose\lib\site-packages\torch\onnx\utils.py", line 950, in _find_symbolic_in_registry
    return sym_registry.get_registered_op(op_name, domain, opset_version)
  File "E:\anaconda3\envs\mmpose\lib\site-packages\torch\onnx\symbolic_registry.py", line 116, in get_registered_op
    raise RuntimeError(msg)
RuntimeError: Exporting the operator hardsigmoid to ONNX opset version 11 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.

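However, I can't see where hardsigmoid is defined anywhere in the config file. Where can I find the definition of this layer? Or do you have any suggestions for how to fix this?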

irexyc commented 1 year ago

Your PyTorch or mmdet version may differ from the one they used. I suggest opening an issue at https://github.com/open-mmlab/mmpose/issues to ask.

You can also try modifying it as described here: https://github.com/pytorch/vision/issues/3463 https://blog.csdn.net/jacke121/article/details/125358683
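One workaround (assumed here, not necessarily the exact change in the linked pages) relies on the fact that PyTorch documents Hardsigmoid as 0 for x <= -3, 1 for x >= 3 and x/6 + 1/2 otherwise, which equals relu6(x + 3) / 6. A module computing `F.relu6(x + 3) / 6` should then need only ops the opset-11 exporter already handles (Add, Clip, Div). The plain-Python sketch below just checks that equivalence; the function names are illustrative.

```python
def hardsigmoid(x: float) -> float:
    """Piecewise definition documented for torch.nn.Hardsigmoid."""
    if x <= -3.0:
        return 0.0
    if x >= 3.0:
        return 1.0
    return x / 6.0 + 0.5


def relu6(x: float) -> float:
    """min(max(x, 0), 6); exported to ONNX as a Clip."""
    return min(max(x, 0.0), 6.0)


def hardsigmoid_via_relu6(x: float) -> float:
    """Same function rewritten with ops that opset 11 supports."""
    return relu6(x + 3.0) / 6.0


if __name__ == "__main__":
    xs = [i / 4 for i in range(-24, 25)]  # -6.0 to 6.0 in steps of 0.25
    worst = max(abs(hardsigmoid(x) - hardsigmoid_via_relu6(x)) for x in xs)
    print(f"max abs difference on [-6, 6]: {worst}")
```

Because the two forms agree everywhere (up to float rounding), swapping the module only for export should not change the model's outputs.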

Haroldhy commented 1 year ago

I think it's exactly the problem described in the two articles you listed, but I couldn't find where this hardsigmoid layer is defined. Thanks a lot anyway!

irexyc commented 1 year ago

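If you're familiar with the MM series, you should know that models are built from the config, so check whether any of the models in the config use this function.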

A quick search shows two places in mmdet: channel_attention should be one of them; check whether the other one is used.
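The quick search described above can be reproduced with a few lines of standard-library Python. `find_usages` is a hypothetical helper; it only finds textual matches, so whether each hit is actually instantiated still depends on which modules the config builds.

```python
import re
from pathlib import Path


def find_usages(root, pattern=r"[Hh]ardsigmoid", suffix=".py"):
    """Return (relative path, line number, stripped line) for each match.

    Walks every file ending in `suffix` under `root` and searches each line
    with the regular expression `pattern`.
    """
    regex = re.compile(pattern)
    root = Path(root)
    hits = []
    for path in sorted(root.rglob(f"*{suffix}")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits
```

For example, `find_usages(os.path.dirname(mmdet.__file__))` would list candidate lines in the installed mmdet package, which can then be cross-checked against the module types named in the config.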

Haroldhy commented 1 year ago

Thanks, I solved it by directly modifying torch's nn.Hardsigmoid()! But then I ran into an error very similar to the earlier one. Details:

(mmpose) PS E:\hy\algorithm\workspace\mmdeploy> python .\convert_rtmdet.py --config ..\mmpose\projects\rtmpose\rtmdet\person\rtmdet_nano_320-8xb32_coco-person.py --checkpoint .\weights\rtmdet_nano_8xb32-100e_coco-obj365-person-05d8511e.pth --output ..\mmpose\projects\rtmpose\examples\RTMPose-Deploy\Windows\TensorRT\python\rtmdet\
Loads checkpoint by local backend from path: .\weights\rtmdet_nano_8xb32-100e_coco-obj365-person-05d8511e.pth
E:\anaconda3\envs\mmpose\lib\site-packages\torch\nn\functional.py:1709: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
  File ".\convert_rtmdet.py", line 108, in <module>
    torch.onnx.export(
  File "E:\anaconda3\envs\mmpose\lib\site-packages\torch\onnx\__init__.py", line 271, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "E:\anaconda3\envs\mmpose\lib\site-packages\torch\onnx\utils.py", line 88, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "E:\anaconda3\envs\mmpose\lib\site-packages\torch\onnx\utils.py", line 728, in _export
    with torch.serialization._open_file_like(f, 'wb') as opened_file:
  File "E:\anaconda3\envs\mmpose\lib\site-packages\torch\serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "E:\anaconda3\envs\mmpose\lib\site-packages\torch\serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
OSError: [Errno 22] Invalid argument: '..\mmpose\projects\rtmpose\examples\RTMPose-Deploy\Windows\TensorRT\python\rtmdet\'

I'm sure I haven't moved any config files. Could it be because I moved this convert Python script to a different path?

irexyc commented 1 year ago

https://github.com/Dominic23331/rtmpose_tensorrt/blob/master/rtmpose_tensorrt/python/convert_rtmdet.py#L112

`output` should be a file path, shouldn't it? You passed a folder.

I think you could have debugged this one yourself; the error message makes it pretty clear:

  File ".\convert_rtmdet.py", line 108, in <module>
    torch.onnx.export(
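Since `torch.onnx.export` opens its `f` argument for writing, a path ending in a separator cannot be opened as a file, which presumably explains the `[Errno 22]` on Windows. A minimal sketch of normalizing the `--output` value first; `resolve_onnx_output` and the default file name `rtmdet.onnx` are hypothetical, not taken from the linked script.

```python
import os

# Hypothetical default file name; the original script may use something else.
DEFAULT_ONNX_NAME = "rtmdet.onnx"


def resolve_onnx_output(output, default_name=DEFAULT_ONNX_NAME):
    """Return a file path for torch.onnx.export from a CLI --output value.

    A value that ends with a path separator, or names an existing folder, is
    treated as a folder and `default_name` is appended. Parent folders are
    created so that opening the file for writing does not fail.
    """
    is_folder = output.endswith(("/", os.sep)) or os.path.isdir(output)
    path = os.path.join(output, default_name) if is_folder else output
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return path
```

In the script this would wrap the value before it reaches the exporter, e.g. `torch.onnx.export(model, dummy_input, resolve_onnx_output(args.output), ...)`; `dummy_input` and the remaining arguments are placeholders. The simpler fix, as in this thread, is to pass a full file name such as `...\rtmdet\rtmdet.onnx` on the command line.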
Haroldhy commented 1 year ago

Sorry, I found that myself too when I checked just now. It's solved!