open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Error when converting a model for the Rockchip RK3588 #2740

Closed vicnoah closed 5 months ago

vicnoah commented 5 months ago

Checklist

Describe the bug

Following the example, the Rockchip model conversion fails after the environment is installed. Example link: Rockchip NPU deployment

Reproduction

Command executed:

cd mmdeploy &&
python3 tools/deploy.py \
    configs/mmpretrain/classification_rknn-fp16_static-224x224.py \
    ../mmpretrain/configs/resnet/resnet50_8xb32_in1k.py \
    https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_batch256_imagenet_20200708-cfb998bf.pth \
    ../mmpretrain/demo/demo.JPEG \
    --work-dir mmdeploy_models/mmpretrain/resnet50 \
    --device cpu \
    --dump-info

Modified the config file:

configs/_base_/backends/rknn.py

backend_config = dict(
    type='rknn',
    common_config=dict(
        mean_values=None,
        std_values=None,
        target_platform='rk3588',  # 'rk3588'
        optimization_level=3),
    quantization_config=dict(
        do_quantization=True,
        dataset=None,
        pre_compile=False,
        rknn_batch_size=1),
    input_size_list=[[3, 224, 224]],
    )
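Since the `load_onnx` warnings in the traceback report that rknn-toolkit2 1.6.0 expects opset 19 while the exported model uses opset 11, one possible (untested) workaround is to raise the export opset in the deploy config, which mmdeploy forwards to `torch.onnx.export`:

```python
# Untested workaround sketch: override the export opset in the deploy config
# (e.g. in configs/mmpretrain/classification_rknn-fp16_static-224x224.py).
# mmdeploy passes onnx_config.opset_version to torch.onnx.export, and
# rknn-toolkit2 1.6.0 recommends opset 19 when loading the ONNX model.
onnx_config = dict(opset_version=19)
```

Whether the rest of the RKNN conversion pipeline accepts an opset-19 model is not verified here.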

Environment

root@b93bb3568fd5:~/workspace/mmdeploy# python3 tools/check_env.py 
04/19 06:38:43 - mmengine - INFO - 

04/19 06:38:43 - mmengine - INFO - **********Environmental information**********
04/19 06:38:44 - mmengine - INFO - sys.platform: linux
04/19 06:38:44 - mmengine - INFO - Python: 3.8.10 (default, May 26 2023, 14:05:08) [GCC 9.4.0]
04/19 06:38:44 - mmengine - INFO - CUDA available: False
04/19 06:38:44 - mmengine - INFO - MUSA available: False
04/19 06:38:44 - mmengine - INFO - numpy_random_seed: 2147483648
04/19 06:38:44 - mmengine - INFO - GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
04/19 06:38:44 - mmengine - INFO - PyTorch: 2.0.0+cu118
04/19 06:38:44 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

04/19 06:38:44 - mmengine - INFO - TorchVision: 0.15.0+cu118
04/19 06:38:44 - mmengine - INFO - OpenCV: 4.5.4
04/19 06:38:44 - mmengine - INFO - MMEngine: 0.10.3
04/19 06:38:44 - mmengine - INFO - MMCV: 2.1.0
04/19 06:38:44 - mmengine - INFO - MMCV Compiler: GCC 9.3
04/19 06:38:44 - mmengine - INFO - MMCV CUDA Compiler: 11.8
04/19 06:38:44 - mmengine - INFO - MMDeploy: 1.3.1+bc75c9d
04/19 06:38:44 - mmengine - INFO - 

04/19 06:38:44 - mmengine - INFO - **********Backend information**********
04/19 06:38:44 - mmengine - INFO - tensorrt:    8.6.1
04/19 06:38:44 - mmengine - INFO - tensorrt custom ops: Available
04/19 06:38:44 - mmengine - INFO - ONNXRuntime: 1.15.1
04/19 06:38:44 - mmengine - INFO - ONNXRuntime-gpu:     1.15.1
04/19 06:38:44 - mmengine - INFO - ONNXRuntime custom ops:      Available
04/19 06:38:44 - mmengine - INFO - pplnn:       0.8.1
04/19 06:38:44 - mmengine - INFO - ncnn:        1.0.20230905
04/19 06:38:44 - mmengine - INFO - ncnn custom ops:     Available
04/19 06:38:44 - mmengine - INFO - snpe:        None
04/19 06:38:44 - mmengine - INFO - openvino:    2023.0.2
04/19 06:38:44 - mmengine - INFO - torchscript: 2.0.0+cu118
04/19 06:38:44 - mmengine - INFO - torchscript custom ops:      Available
04/19 06:38:44 - mmengine - INFO - rknn-toolkit:        None
04/19 06:38:44 - mmengine - INFO - rknn-toolkit2:       1.6.0+81f21f4d
04/19 06:38:44 - mmengine - INFO - ascend:      None
04/19 06:38:44 - mmengine - INFO - coreml:      None
04/19 06:38:44 - mmengine - INFO - tvm: None
04/19 06:38:44 - mmengine - INFO - vacc:        None
04/19 06:38:44 - mmengine - INFO - 

04/19 06:38:44 - mmengine - INFO - **********Codebase information**********
04/19 06:38:44 - mmengine - INFO - mmdet:       3.3.0
04/19 06:38:44 - mmengine - INFO - mmseg:       None
04/19 06:38:44 - mmengine - INFO - mmpretrain:  1.2.0
04/19 06:38:44 - mmengine - INFO - mmocr:       None
04/19 06:38:44 - mmengine - INFO - mmagic:      None
04/19 06:38:44 - mmengine - INFO - mmdet3d:     None
04/19 06:38:44 - mmengine - INFO - mmpose:      1.3.1
04/19 06:38:44 - mmengine - INFO - mmrotate:    None
04/19 06:38:44 - mmengine - INFO - mmaction:    None
04/19 06:38:44 - mmengine - INFO - mmrazor:     None
04/19 06:38:44 - mmengine - INFO - mmyolo:      None

Error traceback

04/19 06:52:12 - mmengine - INFO - Export PyTorch model to ONNX: mmdeploy_models/mmpretrain/resnet50/end2end.onnx.
04/19 06:52:13 - mmengine - INFO - Execute onnx optimize passes.
============= Diagnostic Run torch.onnx.export version 2.0.0+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

04/19 06:52:14 - mmengine - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
04/19 06:52:15 - mmengine - INFO - Start pipeline mmdeploy.apis.utils.utils.to_backend in main process
04/19 06:52:15 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future. 
W __init__: rknn-toolkit2 version: 1.6.0+81f21f4d
W load_onnx: It is recommended onnx opset 19, but your onnx model opset is 11!
W load_onnx: Model converted from pytorch, 'opset_version' should be set 19 in torch.onnx.export for successful convert!
Loading : 100%|█████████████████████████████████████████████████| 108/108 [00:00<00:00, 2409.39it/s]
W load_onnx: The config.mean_values is None, zeros will be set for input 0!
W load_onnx: The config.std_values is None, ones will be set for input 0!
W build: The dataset='/tmp/tmp6m0kh14o.txt' is ignored because do_quantization = False!
I base_optimize ...
I base_optimize done.
I 
I fold_constant ...
I fold_constant done.
I 
I correct_ops ...
I correct_ops done.
I 
I fuse_ops ...
I fuse_ops results:
I     convert_flatten_to_reshape: remove node = ['/neck/Flatten'], add node = ['/neck/Flatten_2reshape']
I     convert_global_avgpool_to_conv: remove node = ['/neck/gap/GlobalAveragePool'], add node = ['/neck/gap/GlobalAveragePool_output_0']
I     convert_gemm_by_conv: remove node = ['/head/fc/Gemm'], add node = ['/head/fc/Gemm_2conv_reshape1', '/head/fc/Gemm_2conv', '/head/fc/Gemm_2conv_reshape2']
I     convert_softmax_to_exsoftmax13: remove node = ['/Softmax'], add node = ['/Softmax']
I     fuse_two_reshape: remove node = ['/neck/Flatten_2reshape']
I     unsqueeze_to_4d_exsoftmax13: remove node = [], add node = ['/Softmax_0_unsqueeze0', '/Softmax_0_unsqueeze1']
I     remove_invalid_reshape: remove node = ['/head/fc/Gemm_2conv_reshape1']
I     fuse_two_reshape: remove node = ['/head/fc/Gemm_2conv_reshape2']
I     remove_invalid_reshape: remove node = ['/Softmax_0_unsqueeze0']
I     fold_constant ...
E build: Catch exception when building RKNN model!
E build: Traceback (most recent call last):
E build:   File "rknn/api/rknn_base.py", line 1987, in rknn.api.rknn_base.RKNNBase.build
E build:   File "rknn/api/graph_optimizer.py", line 1849, in rknn.api.graph_optimizer.GraphOptimizer.fuse_ops
E build:   File "rknn/api/graph_optimizer.py", line 824, in rknn.api.graph_optimizer.GraphOptimizer.fold_constant
E build:   File "rknn/api/session.py", line 34, in rknn.api.session.Session.__init__
E build:   File "rknn/api/session.py", line 130, in rknn.api.session.Session.sess_build
E build:   File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 383, in __init__
E build:     self._create_inference_session(providers, provider_options, disabled_optimizers)
E build:   File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 435, in _create_inference_session
E build:     sess.initialize_session(providers, provider_options, disabled_optimizers)
E build: onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for Reshape(19) node with name '/Softmax_0_unsqueeze1'
W If you can't handle this error, please try updating to the latest version of the toolkit2 and runtime from:
  https://console.zbox.filez.com/l/I00fc3 (Pwd: rknn)  Path: RKNPU2_SDK / 1.X.X / develop /
  If the error still exists in the latest version, please collect the corresponding error logs and the model,
  convert script, and input data that can reproduce the problem, and then submit an issue on:
  https://redmine.rock-chips.com (Please consult our sales or FAE for the redmine account)
04/19 06:52:18 - mmengine - ERROR - /root/workspace/mmdeploy/mmdeploy/backend/rknn/onnx2rknn.py - onnx2rknn - 99 - Build model failed!
anzisheng commented 5 months ago

I met a similar error:

--> Building model
D fold_constant ...
E build: Catch exception when building RKNN model!
E build: Traceback (most recent call last):
E build:   File "rknn/api/rknn_base.py", line 1977, in rknn.api.rknn_base.RKNNBase.build
E build:   File "rknn/api/graph_optimizer.py", line 859, in rknn.api.graph_optimizer.GraphOptimizer.fold_constant
E build:   File "rknn/api/session.py", line 34, in rknn.api.session.Session.__init__
E build:   File "rknn/api/session.py", line 130, in rknn.api.session.Session.sess_build
E build:   File "/home/an/miniconda3/envs/toolkit2_1.6/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
E build:     self._create_inference_session(providers, provider_options, disabled_optimizers)
E build:   File "/home/an/miniconda3/envs/toolkit2_1.6/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 462, in _create_inference_session
E build:     sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
E build: onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. In Node, ("/Mark", Mark, "mmdeploy", -1) : ("/Concat_12_output_0": tensor(float),) -> ("/Mark_output_0": tensor(float),) , Error No opset import for domain 'mmdeploy'
W If you can't handle this error, please try updating to the latest version of the toolkit2 and runtime from:
  https://console.zbox.filez.com/l/I00fc3 (Pwd: rknn)  Path: RKNPU2_SDK / 2.X.X / develop /
  If the error still exists in the latest version, please collect the corresponding error logs and the model,
  convert script, and input data that can reproduce the problem, and then submit an issue on:
  https://redmine.rock-chips.com (Please consult our sales or FAE for the redmine account)
Build model failed.

Can anybody solve this?