xiuqhou / Relation-DETR

[ECCV2024 Oral] Official implementation of the paper "Relation DETR: Exploring Explicit Position Relation Prior for Object Detection"
Apache License 2.0

[Bug]: Errors during inference and ONNX export #24

Open m00nLi opened 3 days ago

m00nLi commented 3 days ago

Bug

Training runs fine, but running inference.py or exporting to ONNX raises an error:

Traceback (most recent call last):
  File "/home/code/Relation-DETR/inference.py", line 165, in <module>
    inference()
  File "/home/user/code/Relation-DETR/inference.py", line 99, in inference
    model = Config(args.model_config).model.eval()
  File "/home/code/Relation-DETR/util/lazy_load.py", line 35, in __init__
    mod = importlib.import_module(module_name)
  File "/home/user/miniconda3/envs/detr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/code/Relation-DETR/configs/train_config.py", line 45, in <module>
    optimizer = optim.AdamW(lr=learning_rate, weight_decay=1e-4, betas=(0.9, 0.999))
TypeError: AdamW.__init__() missing 1 required positional argument: 'params'

Environment

-------------------------------  ---------------------------------------------------------------------------------------
sys.platform                     linux
Python                           3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
numpy                            1.24.4
PyTorch                          2.4.0+cu121 @/home/user/miniconda3/envs/detr/lib/python3.10/site-packages/torch
PyTorch debug build              False
torch._C._GLIBCXX_USE_CXX11_ABI  False
GPU available                    Yes
GPU 0,1,2,3,4,5,6,7              NVIDIA H800 (arch=9.0)
Driver version                   560.35.03
CUDA_HOME                        /usr/local/cuda-12.4
Pillow                           10.4.0
torchvision                      0.19.0+cu121 @/home/user/miniconda3/envs/detr/lib/python3.10/site-packages/torchvision
torchvision arch flags           5.0, 6.0, 7.0, 7.5, 8.0, 8.6, 9.0
fvcore                           0.1.5.post20221221
iopath                           0.1.10
cv2                              4.10.0
-------------------------------  ---------------------------------------------------------------------------------------

Additional information

No response

xiuqhou commented 3 days ago

model_config should point to a model config, not a training config, e.g.: configs/relation_detr/relation_detr_resnet50_800_1333.py
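The TypeError above occurs because torch.optim.AdamW requires the parameters to optimize as its first positional argument, while the training config constructs the optimizer with keyword arguments only (the repo's lazy-load machinery normally fills in `params` later). A minimal sketch of the failure mode, using a pure-Python stand-in that mirrors AdamW's signature so it runs without torch installed:

```python
# Stand-in mirroring torch.optim.AdamW's signature (`params` is required);
# a stub is used here only so the sketch runs without torch.
class AdamW:
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), weight_decay=1e-2):
        self.param_groups = [{"params": list(params), "lr": lr}]

try:
    # Same call shape as in train_config.py: no `params` supplied.
    AdamW(lr=1e-4, weight_decay=1e-4, betas=(0.9, 0.999))
except TypeError as err:
    print(err)  # missing 1 required positional argument: 'params'
```

Importing the training config executes this line at module import time, which is why inference.py crashes before the model is even built; the model configs avoid this.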

m00nLi commented 3 days ago

model_config should point to a model config, not a training config, e.g.: configs/relation_detr/relation_detr_resnet50_800_1333.py

Got it, switching to the model config fixed it.

m00nLi commented 3 days ago

Error when exporting to ONNX:

torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::_upsample_bilinear2d_aa' to ONNX opset version 17 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues.

ONNX versions

onnx                              1.16.2
onnxruntime                       1.16.0
onnxsim                           0.4.36
rapidocr-onnxruntime              1.3.24

xiuqhou commented 3 days ago

Hi @m00nLi This is because PyTorch does not support exporting anti-aliased resize to ONNX. Please set antialias to False at line 75 of models/detectors/base_detector.py, then export again.

SEU-ZWW commented 3 days ago

Hi @m00nLi This is because PyTorch does not support exporting anti-aliased resize to ONNX. Please set antialias to False at line 75 of models/detectors/base_detector.py, then export again.

After setting it to False I still get the same error; opset 11 does not work either.

xiuqhou commented 1 day ago

Please check whether the reported operator is aten::_upsample_bilinear2d_aa or aten::_upsample_bilinear2d. PyTorch currently does not support exporting any of the _aa operators, whereas setting antialias to False uses the non-_aa ONNX operator. That operator may be unsupported at opset 11, but it is supported at opset 17 — could you try opset 17?
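The distinction above can be summarized in a small stand-in (pure Python, no torch; the operator names are the ones quoted in this thread, and the mapping is a simplification of what the exporter traces):

```python
def traced_resize_op(antialias: bool) -> str:
    """Which aten operator a bilinear resize traces to during ONNX export.
    Names taken from the error messages in this thread; the _aa
    (anti-aliased) variant currently has no ONNX symbolic in PyTorch."""
    return "aten::_upsample_bilinear2d_aa" if antialias else "aten::_upsample_bilinear2d"

print(traced_resize_op(True))   # export fails regardless of opset
print(traced_resize_op(False))  # exportable at opset 17
```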

SEU-ZWW commented 1 day ago

Please check whether the reported operator is aten::_upsample_bilinear2d_aa or aten::_upsample_bilinear2d. PyTorch currently does not support exporting any of the _aa operators, whereas setting antialias to False uses the non-_aa ONNX operator. That operator may be unsupported at opset 11, but it is supported at opset 17 — could you try opset 17?

The reported operator is the _aa variant. Is there currently any way to convert the .pt model to ONNX? Please share a solution, thank you.

xiuqhou commented 4 hours ago

Hi @SEU-ZWW If the reported operator is the _aa variant, then antialias is still True. Please double-check the parameter setting, and debug to see why setting antialias to False is not taking effect.
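One way to debug this is to wrap the resize function and record the antialias value each call actually receives before export. A minimal sketch; the `interpolate` stub below stands in for torch.nn.functional.interpolate so the example runs without torch, but in the real codebase you would wrap F.interpolate the same way and then run the export:

```python
import functools

def record_kwargs(fn, log):
    """Wrap fn so every call's keyword arguments are appended to `log`.
    Hypothetical debugging aid: wrap F.interpolate like this before export
    to see the antialias value each resize call actually receives."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        log.append(dict(kwargs))
        return fn(*args, **kwargs)
    return wrapper

# Stand-in for F.interpolate, used only to keep the sketch runnable without torch:
def interpolate(x, antialias=True):
    return x

calls = []
interpolate = record_kwargs(interpolate, calls)
interpolate([1, 2, 3], antialias=False)
print(calls[0]["antialias"])  # False
```

If the log still shows antialias=True during export, the resize being traced is a different call site than the one edited, which would explain why the _aa operator keeps appearing.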