open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

Segformer export to ONNX doesn't work - aten::unflatten is not supported #1970

Closed fschvart closed 1 year ago

fschvart commented 1 year ago

Describe the bug

I'm using mmdeploy 1.0.0 with a SegFormer model trained with mmsegmentation 1.0.0, and the conversion to ONNX fails. The export hits an operator (aten::unflatten) that the ONNX exporter does not support at opset version 11.

Segformer does appear in the list of supported models for mmsegmentation, and if I recall correctly, I was able to convert it to ONNX in the past.

I trained the model on the latest Nvidia PyTorch docker (CUDA 12.0, PyTorch 2.0.0)

I'd really appreciate your help!

Reproduction

python deploy.py ../configs/mmseg/segmentation_onnxruntime_dynamic.py .....

Environment

Nvidia latest docker (23.03), CUDA 12.0, PyTorch 2.0.0 on WSL

Error traceback

root@9e43eb3286a2:/workspace/mmdeploy# python tools/deploy.py configs/mmseg/segmentation_onnx_fhd.py /data/vids/segformer_mit-b5_8xb1-160k_cityscapes-1024x1024.py /data/vids/iter_90000.pth /data/led/imgs/train/2.png --work-dir ./output/ --device cuda --dump-info
04/07 22:52:01 - mmengine - WARNING - Failed to search registry with scope "mmseg" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmseg" is a correct scope, or whether the registry is initialized.
04/07 22:52:01 - mmengine - WARNING - Failed to search registry with scope "mmseg" in the "mmseg_tasks" registry tree. As a workaround, the current "mmseg_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmseg" is a correct scope, or whether the registry is initialized.
04/07 22:52:03 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
04/07 22:52:04 - mmengine - WARNING - Failed to search registry with scope "mmseg" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmseg" is a correct scope, or whether the registry is initialized.
04/07 22:52:04 - mmengine - WARNING - Failed to search registry with scope "mmseg" in the "mmseg_tasks" registry tree. As a workaround, the current "mmseg_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmseg" is a correct scope, or whether the registry is initialized.
/usr/local/lib/python3.8/dist-packages/mmseg/models/decode_heads/decode_head.py:120: UserWarning: For binary segmentation, we suggest using`out_channels = 1` to define the outputchannels of segmentor, and use `threshold`to convert `seg_logits` into a predictionapplying a threshold
  warnings.warn('For binary segmentation, we suggest using'
/usr/local/lib/python3.8/dist-packages/mmseg/models/builder.py:36: UserWarning: ``build_loss`` would be deprecated soon, please use ``mmseg.registry.MODELS.build()``
  warnings.warn('``build_loss`` would be deprecated soon, please use '
/usr/local/lib/python3.8/dist-packages/mmseg/models/losses/cross_entropy_loss.py:235: UserWarning: Default ``avg_non_ignore`` is False, if you would like to ignore the certain label and average loss over non-ignore labels, which is the same with PyTorch official cross_entropy, set ``avg_non_ignore=True``.
  warnings.warn(
Loads checkpoint by local backend from path: /data/vids/iter_90000.pth
04/07 22:52:10 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future.
04/07 22:52:10 - mmengine - INFO - Export PyTorch model to ONNX: ./output/end2end.onnx.
/usr/local/lib/python3.8/dist-packages/mmdeploy/core/optimizers/function_marker.py:160: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ys_shape = tuple(int(s) for s in ys.shape)
/usr/local/lib/python3.8/dist-packages/mmdeploy/codebase/mmseg/models/segmentors/base.py:46: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  img_shape = [int(val) for val in img_shape]
/usr/local/lib/python3.8/dist-packages/mmseg/models/utils/shape_convert.py:15: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert L == H * W, 'The seq_len doesn\'t match H, W'
/usr/local/lib/python3.8/dist-packages/mmcv/cnn/bricks/wrappers.py:44: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if x.numel() == 0 and obsolete_torch_version(TORCH_VERSION, (1, 4)):
=========== Diagnostic Run torch.onnx.export version 2.0.0a0+1767026 ===========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 1 ERROR ========================
ERROR: missing-standard-symbolic-function
=========================================
Exporting the operator 'aten::unflatten' to ONNX opset version 11 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues.
<Set verbose=True to see more details>

Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmdeploy/apis/pytorch2onnx.py", line 98, in torch2onnx
    export(
  File "/usr/local/lib/python3.8/dist-packages/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap
    return self.call_function(func_name_, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmdeploy/apis/onnx/export.py", line 131, in export
    torch.onnx.export(
  File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 506, in export
    _export(
  File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 1533, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/usr/local/lib/python3.8/dist-packages/mmdeploy/apis/onnx/optimizer.py", line 11, in model_to_graph__custom_optimizer
    graph, params_dict, torch_out = ctx.origin_func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 1117, in _model_to_graph
    graph = _optimize_graph(
  File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 665, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 1878, in _run_symbolic_function
    raise errors.UnsupportedOperatorError(
torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::unflatten' to ONNX opset version 11 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues.
04/07 22:52:15 - mmengine - ERROR - /usr/local/lib/python3.8/dist-packages/mmdeploy/apis/core/pipeline_manager.py - pop_mp_output - 80 - `mmdeploy.apis.pytorch2onnx.torch2onnx` with Call id: 0 failed. exit.
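
For context on the operator named in the error: `Tensor.unflatten(dim, sizes)` splits one dimension into several, so it amounts to a reshape. The sketch below works on plain shape tuples rather than tensors, and `unflatten_shape` is a hypothetical helper name, not part of PyTorch or mmdeploy.

```python
from math import prod
from typing import Sequence, Tuple


def unflatten_shape(shape: Sequence[int], dim: int, sizes: Sequence[int]) -> Tuple[int, ...]:
    """Return the shape obtained by splitting dimension `dim` of `shape` into `sizes`.

    Hypothetical helper for illustration; it mirrors the shape rule of
    Tensor.unflatten, including a single inferred -1 entry in `sizes`.
    """
    shape = tuple(shape)
    sizes = list(sizes)
    if not -len(shape) <= dim < len(shape):
        raise IndexError(f"dim {dim} is out of range for shape {shape}")
    dim %= len(shape)  # normalise negative dims such as -1

    if sizes.count(-1) > 1:
        raise ValueError("at most one entry of sizes may be -1")
    if -1 in sizes:
        # Infer the missing size from the remaining ones.
        known = prod(s for s in sizes if s != -1)
        if known == 0 or shape[dim] % known != 0:
            raise ValueError(f"cannot infer -1 in {sizes} for dimension of size {shape[dim]}")
        sizes[sizes.index(-1)] = shape[dim] // known

    if prod(sizes) != shape[dim]:
        raise ValueError(f"sizes {sizes} do not multiply to {shape[dim]}")
    return shape[:dim] + tuple(sizes) + shape[dim + 1:]


print(unflatten_shape((2, 12), -1, (3, 4)))
```

In tensor terms, `x.unflatten(-1, (3, 4))` yields the same result as `x.reshape(*x.shape[:-1], 3, 4)`. Whether rewriting the call this way lets the export succeed was not tested in this thread.
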
irexyc commented 1 year ago

PyTorch 2.0 isn't supported yet; please use an earlier version of PyTorch.
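
A minimal sketch of checking for this before running the export, assuming only what the reply above says (versions below 2.0 are the supported ones). `parse_version` and `is_pre_torch2` are hypothetical names, not mmdeploy APIs.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '2.0.0+cu118'.

    Pre-release and local suffixes ('a0', '+1767026', '+cu118') are ignored,
    so '2.0.0a0+1767026' is treated as 2.0.0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_pre_torch2(version: str) -> bool:
    """True if the version is older than 2.0 (assumed threshold, taken from the reply above)."""
    return parse_version(version) < (2, 0, 0)


# Version strings that appear in this thread, plus one 1.x release for contrast.
for v in ("2.0.0a0+1767026", "2.0.0+cu118", "1.13.1"):
    print(v, "-> older than 2.0:", is_pre_torch2(v))
```

In a real environment the argument would be `torch.__version__`.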

gnscc commented 3 months ago

> PyTorch 2.0 isn't supported yet; please use an earlier version of PyTorch.

Why does the mmdeploy 1.3 Docker image ship PyTorch 2.0, then? I can't deploy a SegFormer model because of this.

kelvinwang139 commented 3 months ago

> > PyTorch 2.0 isn't supported yet; please use an earlier version of PyTorch.

> Why does the mmdeploy 1.3 Docker image ship PyTorch 2.0, then? I can't deploy a SegFormer model because of this.

Hi, which Dockerfile are you using? Is it the prebuilt Docker image?

gnscc commented 3 months ago

> Is it the prebuilt Docker image?

Yes, it is the image from Docker Hub.

To reproduce

Run a container with the image: docker run -it openmmlab/mmdeploy:ubuntu20.04-cuda11.8-mmdeploy1.3.0 /bin/bash

Show torch info: pip3 show torch

Name: torch
Version: 2.0.0+cu118
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /usr/local/lib/python3.8/dist-packages
Requires: filelock, jinja2, networkx, sympy, triton, typing-extensions
Required-by: torchvision, triton
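
The same check can be scripted. The sketch below parses `pip3 show` output into a dictionary and reads the `Version` field. `parse_pip_show` is a hypothetical helper, and the sample text is copied from the output above.

```python
from typing import Dict

# Output of `pip3 show torch` as posted above.
SAMPLE = """\
Name: torch
Version: 2.0.0+cu118
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /usr/local/lib/python3.8/dist-packages
Requires: filelock, jinja2, networkx, sympy, triton, typing-extensions
Required-by: torchvision, triton
"""


def parse_pip_show(text: str) -> Dict[str, str]:
    """Turn 'Key: value' lines into a dict, splitting on the first colon only."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


info = parse_pip_show(SAMPLE)
print(info["Name"], info["Version"])
```

Splitting on the first colon only keeps values such as the `Home-page` URL intact.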