open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] RuntimeError: Exporting the operator einsum to ONNX opset version 11 is not supported. Support for this operator was added in version 12, try exporting with this version. #1624

Closed soltkreig closed 1 year ago

soltkreig commented 1 year ago


Describe the bug

Hi, I'm using the dev-1.x branch. I'm trying to convert a model but got the error below:

RuntimeError: Exporting the operator einsum to ONNX opset version 11 is not supported. Support for this operator was added in version 12, try exporting with this version.

I found that I can change the opset version from 11 to 12 in mmdeploy/mmdeploy/apis/onnx/export.py, but it didn't help.
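For reference, the opset can also be set in the deploy config rather than by patching export.py; a minimal sketch, following mmdeploy's standard onnx_config layout:

    # Deploy-config sketch: bump the opset so torch can export einsum.
    # Field names follow mmdeploy's usual onnx_config convention.
    onnx_config = dict(
        type='onnx',
        export_params=True,
        opset_version=12,  # einsum export was added in ONNX opset 12
        save_file='end2end.onnx')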

Reproduction

python tools/deploy.py configs/mmaction/video-recognition/video-recognition_onnxruntime_static.py  mmaction2/work_dirs/mvit-small-p244_64x1x1_kinetics400-rgb/mvit-small-p244_64x1x1_kinetics400-rgb.py mmaction2/work_dirs/mvit-small-p244_64x1x1_kinetics400-rgb/best_acc/top1_epoch_180.pth mmaction2/demo/demo.mp4 --work-dir mmdeploy_model/mvit64f --dump-info

Environment

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
01/09 13:10:02 - mmengine - INFO - 

01/09 13:10:02 - mmengine - INFO - **********Environmental information**********
01/09 13:10:03 - mmengine - INFO - sys.platform: linux
01/09 13:10:03 - mmengine - INFO - Python: 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 15:55:03) [GCC 10.4.0]
01/09 13:10:03 - mmengine - INFO - CUDA available: False
01/09 13:10:03 - mmengine - INFO - numpy_random_seed: 2147483648
01/09 13:10:03 - mmengine - INFO - GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
01/09 13:10:03 - mmengine - INFO - PyTorch: 1.10.1+cu111
01/09 13:10:03 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

01/09 13:10:03 - mmengine - INFO - TorchVision: 0.11.2+cu111
01/09 13:10:03 - mmengine - INFO - OpenCV: 4.7.0
01/09 13:10:03 - mmengine - INFO - MMEngine: 0.4.0
01/09 13:10:03 - mmengine - INFO - MMCV: 2.0.0rc3
01/09 13:10:03 - mmengine - INFO - MMCV Compiler: GCC 7.3
01/09 13:10:03 - mmengine - INFO - MMCV CUDA Compiler: 11.1
01/09 13:10:03 - mmengine - INFO - MMDeploy: 1.0.0rc1+71fc8e3
01/09 13:10:03 - mmengine - INFO - 

01/09 13:10:03 - mmengine - INFO - **********Backend information**********
01/09 13:10:03 - mmengine - INFO - tensorrt:    None
01/09 13:10:03 - mmengine - INFO - ONNXRuntime: 1.13.1
01/09 13:10:03 - mmengine - INFO - ONNXRuntime-gpu:     None
01/09 13:10:03 - mmengine - INFO - ONNXRuntime custom ops:      NotAvailable
01/09 13:10:03 - mmengine - INFO - pplnn:       None
01/09 13:10:03 - mmengine - INFO - ncnn:        None
01/09 13:10:03 - mmengine - INFO - snpe:        None
01/09 13:10:03 - mmengine - INFO - openvino:    None
01/09 13:10:03 - mmengine - INFO - torchscript: 1.10.1+cu111
01/09 13:10:03 - mmengine - INFO - torchscript custom ops:      NotAvailable
01/09 13:10:03 - mmengine - INFO - rknn-toolkit:        None
01/09 13:10:03 - mmengine - INFO - rknn-toolkit2:       None
01/09 13:10:03 - mmengine - INFO - ascend:      None
01/09 13:10:03 - mmengine - INFO - coreml:      None
01/09 13:10:03 - mmengine - INFO - tvm: None
01/09 13:10:03 - mmengine - INFO - 

01/09 13:10:03 - mmengine - INFO - **********Codebase information**********
01/09 13:10:03 - mmengine - INFO - mmdet:       None
01/09 13:10:03 - mmengine - INFO - mmseg:       None
01/09 13:10:03 - mmengine - INFO - mmcls:       None
01/09 13:10:03 - mmengine - INFO - mmocr:       None
01/09 13:10:03 - mmengine - INFO - mmedit:      None
01/09 13:10:03 - mmengine - INFO - mmdet3d:     None
01/09 13:10:03 - mmengine - INFO - mmpose:      None
01/09 13:10:03 - mmengine - INFO - mmrotate:    None
01/09 13:10:03 - mmengine - INFO - mmaction:    1.0.0rc1

Error traceback

Traceback (most recent call last):
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/pytorch2onnx.py", line 98, in torch2onnx
    export(
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap
    return self.call_function(func_name_, *args, **kwargs)
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/onnx/export.py", line 131, in export
    torch.onnx.export(
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/onnx/__init__.py", line 316, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/onnx/utils.py", line 107, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/onnx/utils.py", line 724, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/onnx/optimizer.py", line 11, in model_to_graph__custom_optimizer
    graph, params_dict, torch_out = ctx.origin_func(*args, **kwargs)
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/onnx/utils.py", line 497, in _model_to_graph
    graph = _optimize_graph(graph, operator_export_type,
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/onnx/utils.py", line 216, in _optimize_graph
    graph = torch._C._jit_pass_onnx(graph, operator_export_type)
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/onnx/__init__.py", line 373, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/onnx/utils.py", line 1028, in _run_symbolic_function
    symbolic_fn = _find_symbolic_in_registry(domain, op_name, opset_version, operator_export_type)
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/onnx/utils.py", line 982, in _find_symbolic_in_registry
    return sym_registry.get_registered_op(op_name, domain, opset_version)
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/onnx/symbolic_registry.py", line 125, in get_registered_op
    raise RuntimeError(msg)
RuntimeError: Exporting the operator einsum to ONNX opset version 11 is not supported. Support for this operator was added in version 12, try exporting with this version.
01/09 13:00:55 - mmengine - ERROR - /home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/core/pipeline_manager.py - pop_mp_output - 80 - `mmdeploy.apis.pytorch2onnx.torch2onnx` with Call id: 0 failed. exit.
soltkreig commented 1 year ago

UPD: I set opset_version=12 in the deploy config and got a different error: IndexError: index_select(): Index is supposed to be a vector

frame #1: at::native::index_select_out_cpu_(at::Tensor const&, long, at::Tensor const&, at::Tensor&) + 0x3a9 (0x7f4bb95e0739 in /home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #2: at::native::index_select_cpu_(at::Tensor const&, long, at::Tensor const&) + 0xe6 (0x7f4bb95e26f6 in /home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x1d3a4c2 (0x7f4bb9cda4c2 in /home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #4: at::_ops::index_select::redispatch(c10::DispatchKeySet, at::Tensor const&, long, at::Tensor const&) + 0xb9 (0x7f4bb9875649 in /home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x3253be3 (0x7f4bbb1f3be3 in /home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x3254215 (0x7f4bbb1f4215 in /home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #7: at::_ops::index_select::call(at::Tensor const&, long, at::Tensor const&) + 0x166 (0x7f4bb98f5296 in /home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::jit::onnx_constant_fold::runTorchBackendForOnnx(torch::jit::Node const*, std::vector<at::Tensor, std::allocator<at::Tensor> >&, int) + 0x1b5f (0x7f4c62fa41cf in /home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0xbbd6f2 (0x7f4c62feb6f2 in /home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #10: torch::jit::ONNXShapeTypeInference(torch::jit::Node*, std::map<std::string, c10::IValue, std::less<std::string>, std::allocator<std::pair<std::string const, c10::IValue> > > const&, int) + 0xa8e (0x7f4c62ff0f3e in /home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #11: <unknown function> + 0xbc4a44 (0x7f4c62ff2a44 in /home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #12: <unknown function> + 0xb35200 (0x7f4c62f63200 in /home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #13: <unknown function> + 0x2a585b (0x7f4c626d385b in /home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
 (function ComputeConstantFolding)
Traceback (most recent call last):
  File "/home/jovyan/people/Murtazin/mmdeploy/tools/torch2onnx.py", line 85, in <module>
    main()
  File "/home/jovyan/people/Murtazin/mmdeploy/tools/torch2onnx.py", line 47, in main
    torch2onnx(
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap
    return self.call_function(func_name_, *args, **kwargs)
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/pytorch2onnx.py", line 98, in torch2onnx
    export(
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap
    return self.call_function(func_name_, *args, **kwargs)
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/onnx/export.py", line 131, in export
    torch.onnx.export(
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/onnx/__init__.py", line 316, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/onnx/utils.py", line 107, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/onnx/utils.py", line 724, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/home/jovyan/people/Murtazin/mmdeploy/mmdeploy/apis/onnx/optimizer.py", line 11, in model_to_graph__custom_optimizer
    graph, params_dict, torch_out = ctx.origin_func(*args, **kwargs)
  File "/home/user/conda/envs/mmaction2dev/lib/python3.9/site-packages/torch/onnx/utils.py", line 544, in _model_to_graph
    params_dict = torch._C._jit_pass_onnx_constant_fold(graph, params_dict,
IndexError: index_select(): Index is supposed to be a vector
soltkreig commented 1 year ago

UPD2: I successfully converted the Swin Transformer but still cannot convert MViT.

RunningLeon commented 1 year ago

@soltkreig Hi, thanks for your feedback.

  1. Sadly, we do not support the MobileViT series of networks yet, as per the documentation.
  2. As for the torch.einsum operator, we strongly suggest replacing it with ordinary operators in your PyTorch code. We have met the same problem and used mmdeploy's rewriting feature to implicitly rewrite it (see the sketch below). You can refer to the example in mmseg: https://github.com/open-mmlab/mmdeploy/blob/92efd9cb7b768924bc3868e7fff81eb332e75f08/mmdeploy/codebase/mmseg/models/decode_heads/ema_head.py#L30-L45
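A minimal sketch of such a rewrite, assuming a hypothetical method path my_model.Attention.forward (replace it with the real method that calls einsum); the ctx-first signature follows the rc-era FUNCTION_REWRITER API used in the linked ema_head example:

    import torch
    from mmdeploy.core import FUNCTION_REWRITER

    # 'my_model.Attention.forward' is a hypothetical target for illustration;
    # point func_name at the method that actually calls torch.einsum.
    @FUNCTION_REWRITER.register_rewriter(
        func_name='my_model.Attention.forward')
    def attention__forward(ctx, self, q, k, v):
        # Original: attn = torch.einsum('bnc,bmc->bnm', q, k)
        # The matmul below is numerically equivalent and exports at opset 11.
        attn = torch.matmul(q, k.transpose(-1, -2))
        attn = attn.softmax(dim=-1)
        # Original: out = torch.einsum('bnm,bmc->bnc', attn, v)
        return torch.matmul(attn, v)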
soltkreig commented 1 year ago

@RunningLeon Hi! Thanks for your answer. 1) I'd like to note that I'm using MViT from the 1.x branch of MMAction2, which is Multiscale ViT, not MobileViT. 2) I found a solution by simply adding do_constant_folding=False to the torch.onnx.export call in mmdeploy/apis/onnx/export.py. It works for video_recognition_static.py:


        torch.onnx.export(
            patched_model,
            args,
            output_path,
            export_params=True,
            input_names=input_names,
            output_names=output_names,
            opset_version=opset_version,
            dynamic_axes=dynamic_axes,
            keep_initializers_as_inputs=keep_initializers_as_inputs,
            # disabling constant folding avoids the index_select() IndexError
            do_constant_folding=False,
            verbose=verbose)

        if input_metas is not None:
            patched_model.forward = model_forward
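As a follow-up, the exported model can be sanity-checked with ONNX Runtime; a minimal sketch, assuming the default save_file name end2end.onnx under the --work-dir from the reproduction command:

    import numpy as np
    import onnxruntime as ort

    # Path is an assumption: mmdeploy saves the model as end2end.onnx
    # inside --work-dir by default.
    sess = ort.InferenceSession('mmdeploy_model/mvit64f/end2end.onnx',
                                providers=['CPUExecutionProvider'])
    inp = sess.get_inputs()[0]
    # The config is static, so the input shape should be fully known;
    # substitute 1 for any symbolic dimension just in case.
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    dummy = np.random.randn(*shape).astype(np.float32)
    outputs = sess.run(None, {inp.name: dummy})
    print([o.shape for o in outputs])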
RunningLeon commented 1 year ago

@soltkreig Hi, sorry for the misunderstanding. Constant folding is done inside PyTorch, and there might be bugs in certain versions of PyTorch.
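To confirm the bug sits in PyTorch's constant-folding pass rather than in mmdeploy, one can export a bare einsum module with plain torch.onnx.export and toggle the flag; a minimal sketch with a purely illustrative toy module:

    import torch

    class EinsumToy(torch.nn.Module):
        """Toy stand-in: a two-operand einsum like an attention-score map."""
        def forward(self, q, k):
            return torch.einsum('bnc,bmc->bnm', q, k)

    q = torch.randn(1, 4, 8)
    k = torch.randn(1, 6, 8)
    # opset >= 12 is required for einsum; flipping do_constant_folding
    # isolates the pass that raised the index_select() IndexError.
    torch.onnx.export(EinsumToy(), (q, k), 'einsum_toy.onnx',
                      opset_version=12, do_constant_folding=False)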

soltkreig commented 1 year ago

@RunningLeon Thank you, I guess you may close this issue.

RunningLeon commented 1 year ago

> @RunningLeon Thank you, I guess you may close this issue.

@soltkreig Hi, good to know. If possible, could you give our project a star? That means a lot to the maintainers. Thanks in advance.