litehrnet pytorch2onnx fails

PeiqiWang commented 3 years ago

Describe the bug

I downloaded the checkpoint from the mmpose model zoo and tried to convert it with tools/deployment/pytorch2onnx.py, but the export failed.

Reproduction
Use command:

python tools/deployment/pytorch2onnx.py configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/litehrnet_30_coco_256x192.py checkpoints/litehrnet30_coco_256x192-4176555b_20210626.pth --verify --shape 1 3 256 192

Environment

python 3.7
torch 1.9.0
torchvision 0.10.0
mmpose 0.16.0
mmcv-full 1.3.9

Error traceback

Traceback (most recent call last):
  File "../tools/deployment/pytorch2onnx.py", line 158, in <module>
    verify=args.verify)
  File "../tools/deployment/pytorch2onnx.py", line 74, in pytorch2onnx
    opset_version=opset_version)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/__init__.py", line 280, in export
    custom_opsets, enable_onnx_checker, use_external_data_format)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/utils.py", line 94, in export
    use_external_data_format=use_external_data_format)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/utils.py", line 695, in _export
    dynamic_axes=dynamic_axes)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/utils.py", line 467, in _model_to_graph
    module=module)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/utils.py", line 200, in _optimize_graph
    graph = torch._C._jit_pass_onnx(graph, operator_export_type)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/__init__.py", line 313, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/utils.py", line 994, in _run_symbolic_function
    return symbolic_fn(g, *inputs, **attrs)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/symbolic_helper.py", line 167, in wrapper
    for arg, arg_desc, arg_name in zip(args, arg_descriptors, arg_names)]
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/symbolic_helper.py", line 167, in <listcomp>
    for arg, arg_desc, arg_name in zip(args, arg_descriptors, arg_names)]
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/symbolic_helper.py", line 84, in _parse_arg
    "', since it's not constant, please try to make "
RuntimeError: Failed to export an ONNX attribute 'onnx::Gather', since it's not constant, please try to make things (e.g., kernel size) static if possible

jin-s13 commented 3 years ago

Thanks for reporting this issue. We will check this problem.

jin-s13 commented 3 years ago

Please check https://github.com/pytorch/pytorch/issues/34780. Maybe it is helpful.

PeiqiWang commented 3 years ago

Thanks! I followed that answer and found the same pattern in https://github.com/open-mmlab/mmpose/blob/77d78f055f116d660e26aa82892b7aa408909f05/mmpose/models/backbones/litehrnet.py#L118

I changed it to an int constant, which led to another error: F.adaptive_avg_pool2d now raises a NotImplementedError, reported by mmcv as follows:

Traceback (most recent call last):
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/pdb.py", line 1699, in main
    pdb._runscript(mainpyfile)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/pdb.py", line 1568, in _runscript
    self.run(statement)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/bdb.py", line 578, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/mnt/hpc/wangpeiqi/openmmlab/mmpose/tools/deployment/pytorch2onnx.py", line 1, in <module>
    import argparse
  File "/mnt/hpc/wangpeiqi/openmmlab/mmpose/tools/deployment/pytorch2onnx.py", line 74, in pytorch2onnx
    opset_version=opset_version)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/__init__.py", line 280, in export
    custom_opsets, enable_onnx_checker, use_external_data_format)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/utils.py", line 94, in export
    use_external_data_format=use_external_data_format)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/utils.py", line 695, in _export
    dynamic_axes=dynamic_axes)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/utils.py", line 467, in _model_to_graph
    module=module)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/utils.py", line 200, in _optimize_graph
    graph = torch._C._jit_pass_onnx(graph, operator_export_type)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/__init__.py", line 313, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/utils.py", line 994, in _run_symbolic_function
    return symbolic_fn(g, *inputs, **attrs)
  File "/mnt/hpc/wangpeiqi/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/onnx/symbolic_helper.py", line 172, in wrapper
    return fn(g, *args, **kwargs)
  File "/mnt/hpc/wangpeiqi/openmmlab/mmcv/mmcv/onnx/symbolic.py", line 333, in symbolic_fn
    '[Adaptive pool]:input size not accessible')
NotImplementedError: [Adaptive pool]:input size not accessible
Uncaught exception. Entering post mortem debugging

How can I fix this problem? Thanks!
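
A note on why the input size matters here: ONNX has no adaptive pooling operator with an arbitrary output size, so an exporter generally has to rewrite adaptive pooling as a fixed-size average pool, and that requires the input height and width to be known at export time. The sketch below only illustrates that idea. It is not mmcv's actual symbolic function, `fixed_pool_params` is a hypothetical name, and the shapes in the usage line are illustrative values.

```python
def fixed_pool_params(in_hw, out_hw):
    """Kernel and stride of a plain average pool equivalent to adaptive pooling.

    Hypothetical helper for illustration. It only covers the case where every
    input dimension is known and is a multiple of the requested output size.
    """
    # Without a concrete input size there is nothing to compute a kernel from.
    if in_hw is None or any(d is None for d in in_hw):
        raise NotImplementedError("input size not accessible")
    if any(i % o != 0 for i, o in zip(in_hw, out_hw)):
        raise ValueError("input size must be a multiple of output size")
    kernel = [i // o for i, o in zip(in_hw, out_hw)]
    # In the divisible case the windows do not overlap, so stride == kernel.
    return kernel, list(kernel)


# Illustrative shapes: pooling a 64x48 feature map down to 8x6.
print(fixed_pool_params((64, 48), (8, 6)))
```

If the traced graph cannot report the input's height and width, there is no way to pick the kernel, which is consistent with the "input size not accessible" message above.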

RunningLeon commented 3 years ago

@PeiqiWang Hi, add mini_size = [int(_) for _ in mini_size] after that line. With PyTorch==1.8.0 and onnx==1.8.0, the model exports to ONNX without errors on my machine.
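
For readers who want the cast spelled out, here is a minimal sketch. It assumes the variable in litehrnet.py is named mini_size, as in the forward quoted elsewhere in this thread; `static_hw` is a hypothetical helper, not part of mmpose. The idea is that during ONNX tracing the entries of x.size() may be traced values rather than plain numbers, and int() turns each one into a constant.

```python
def static_hw(size):
    """Return the last two entries of `size` as plain Python ints.

    Hypothetical helper, not part of mmpose. Works on a torch.Size or on any
    sequence whose entries support int().
    """
    return [int(d) for d in size[-2:]]


# Intended use inside the forward in litehrnet.py (shown as comments because
# it needs torch and the surrounding module):
#
#     mini_size = static_hw(x[-1].size())
#     out = [F.adaptive_avg_pool2d(s, mini_size) for s in x[:-1]] + [x[-1]]

print(static_hw((1, 40, 8, 6)))  # prints [8, 6]
```

Note that this bakes the spatial size into the exported graph, so the ONNX model would only be valid for the input shape used at export time.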

bobby20180331 commented 2 years ago

I don't fully understand your solution. I hit the same problem with lite-hrnet-18 (config: litehrnet_18_pet32kp20k_256x192.py). Could you give a more specific suggestion? Thanks! The pooling code in mmpose/models/backbones/litehrnet.py is as follows:

47 self.global_avgpool = nn.AdaptiveAvgPool2d(1)

117 def forward(self, x):
118     mini_size = x[-1].size()[-2:]
119     out = [F.adaptive_avg_pool2d(s, mini_size) for s in x[:-1]] + [x[-1]]

RunningLeon commented 2 years ago

@bobby20180331 Hi, exporting LiteHRNet to ONNX is supported in mmdeploy. You could give it a try.

bobby20180331 commented 2 years ago

@RunningLeon Thanks. I tried mmdeploy and got the following error:

  File "/data/****/mmdeploy/mmdeploy/pytorch/ops/adaptive_avg_pool.py", line 21, in symbolic_fn
    raise NotImplementedError(
NotImplementedError: [Adaptive pool]:input size not accessible
2022-03-02 06:06:39,760 - mmdeploy - ERROR - torch2onnx failed.

It seems the main problem with converting lite-hrnet is caused by line 47, "self.global_avgpool = nn.AdaptiveAvgPool2d(1)".

RunningLeon commented 2 years ago

@bobby20180331 Hi, thanks for the feedback. Could you open an issue in the mmdeploy repo and post the script you ran there? Also, which file is self.global_avgpool = nn.AdaptiveAvgPool2d(1) from?

open-mmlab/mmpose, issue #820