open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.
https://mmsegmentation.readthedocs.io/en/main/
Apache License 2.0

pytorch2onnx script failed #1962

Open jyang68sh opened 2 years ago

jyang68sh commented 2 years ago

Describe the bug

Running pytorch2onnx on the standard STDC2 512x1024 config fails.

Error message: TypeError: forward() got multiple values for argument 'img_metas'

Reproduction

  1. What command or script did you run?

python tools/pytorch2onnx.py /home/unj1szh/xspace/work_dirs/stdc2_4x4_pretrain_80k/stdc2_in1k-pre_512x1024_80k_cityscapes.py --shape 600 600

  2. Did you make any modifications to the code or config? Did you understand what you have modified?

No

  3. What dataset did you use?

Cityscapes

  4. Please run python mmseg/utils/collect_env.py to collect necessary environment information and paste it here.
sys.platform: linux
Python: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA GeForce RTX 3080
CUDA_HOME: /usr/local/cuda-11.1
NVCC: Cuda compilation tools, release 11.1, V11.1.105
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.12.1
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.13.1
OpenCV: 4.5.5
MMCV: 1.6.1
MMCV Compiler: GCC 9.4
MMCV CUDA Compiler: 11.1
MMSegmentation: 0.27.0+dd42fa8
  5. You may add additional information that may be helpful for locating the problem, such as

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

Error traceback

If applicable, paste the error traceback here.

Traceback (most recent call last):
  File "tools/pytorch2onnx.py", line 386, in <module>
    pytorch2onnx(
  File "tools/pytorch2onnx.py", line 195, in pytorch2onnx
    torch.onnx.export(
  File "/home/unj1szh/.conda/envs/xcbase/lib/python3.8/site-packages/torch/onnx/__init__.py", line 350, in export
    return utils.export(
  File "/home/unj1szh/.conda/envs/xcbase/lib/python3.8/site-packages/torch/onnx/utils.py", line 163, in export
    _export(
  File "/home/unj1szh/.conda/envs/xcbase/lib/python3.8/site-packages/torch/onnx/utils.py", line 1074, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/unj1szh/.conda/envs/xcbase/lib/python3.8/site-packages/torch/onnx/utils.py", line 727, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/home/unj1szh/.conda/envs/xcbase/lib/python3.8/site-packages/torch/onnx/utils.py", line 602, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/home/unj1szh/.conda/envs/xcbase/lib/python3.8/site-packages/torch/onnx/utils.py", line 517, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/home/unj1szh/.conda/envs/xcbase/lib/python3.8/site-packages/torch/jit/_trace.py", line 1175, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/home/unj1szh/.conda/envs/xcbase/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/unj1szh/.conda/envs/xcbase/lib/python3.8/site-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/home/unj1szh/.conda/envs/xcbase/lib/python3.8/site-packages/torch/jit/_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/home/unj1szh/.conda/envs/xcbase/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/unj1szh/.conda/envs/xcbase/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1118, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/unj1szh/xspace/4_labs/mmcv/mmcv/runner/fp16_utils.py", line 116, in new_func
    return old_func(*args, **kwargs)
TypeError: forward() got multiple values for argument 'img_metas'
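
For readers unfamiliar with this class of TypeError, here is a minimal, self-contained sketch of how Python produces it. It is not a diagnosis of this issue: `ToyModel` and its arguments are hypothetical stand-ins, and the assumption (not confirmed in this thread) is that `img_metas` has been bound by keyword before the call, after which an extra positional argument also lands on the same parameter.

```python
from functools import partial


class ToyModel:
    """Hypothetical stand-in for a segmentor; NOT the mmseg class."""

    def forward(self, img, img_metas, return_loss=True):
        return {"img": img, "img_metas": img_metas, "return_loss": return_loss}


model = ToyModel()

# Bind img_metas by keyword ahead of time (assumed pattern for illustration).
bound_forward = partial(
    model.forward,
    img_metas=[{"ori_shape": (600, 600, 3)}],
    return_loss=False,
)

# One positional argument is fine: it fills `img`.
ok = bound_forward("fake_image")

# A second positional argument is mapped onto `img_metas`, which is already
# supplied by keyword, so Python raises a TypeError.
try:
    bound_forward("fake_image", "unexpected_extra_arg")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If the real failure follows the same pattern, the thing to look for is which caller passes more positional arguments to `forward` than the script expects.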

Bug fix

I have not identified the cause.

jyang68sh commented 2 years ago

Hi @MeowZheng, please have a look. I tried to debug this, but the error message was unclear to me.

Thanks!

MeowZheng commented 2 years ago

Based on the error log, this may be a backward-compatibility break (BC-break) introduced by an mmcv update. I suggest opening an issue in the mmcv repository.
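
When filing such a cross-repository report, it can help to include the exact installed versions of the packages involved. The following is a small sketch using only the standard library; the helper name `collect_versions` is hypothetical, and the distribution names listed are assumptions about how the packages were installed (for example, mmcv may be installed as either `mmcv-full` or `mmcv`).

```python
from importlib import metadata


def collect_versions(distributions):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed under this name.
            versions[name] = None
    return versions


# Assumed distribution names; adjust to match the local environment.
report = collect_versions(["torch", "mmcv-full", "mmcv", "mmsegmentation"])
for name, version in report.items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

The output of `python mmseg/utils/collect_env.py` already covers most of this; the sketch is only a lightweight alternative when that script is not available.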

jyang68sh commented 2 years ago

@MeowZheng Hi, I have opened an issue in mmcv (https://github.com/open-mmlab/mmcv/issues/2226). I will let you know if there is an answer.