zhiqwang / yolort

yolort is a runtime stack for yolov5 on specialized accelerators such as tensorrt, libtorch, onnxruntime, tvm and ncnn.
https://zhiqwang.com/yolort
GNU General Public License v3.0

Can not export to ONNX model. AttributeError: 'NoneType' object has no attribute 'shape' #485

Open willjoy opened 1 year ago

willjoy commented 1 year ago

🐛 Describe the bug

I followed the notebook to export a pre-trained YOLOv5 model to ONNX. Everything works up to inference in yolov5-rt: on a custom image it gives correct boxes, labels, and scores.

However, exporting to an ONNX model with NMS fails. This is the call:

export_onnx(model=model, onnx_path=onnx_path, opset_version=opset_version)

Running the code above raises the error below. Any help is much appreciated.

`---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_5588\1593340319.py in <module>
----> 1 export_onnx(model=model, onnx_path=onnx_path, opset_version=opset_version)

~\Anaconda3\lib\site-packages\yolort\runtime\ort_helper.py in export_onnx(onnx_path, checkpoint_path, model, size, size_divisible, score_thresh, nms_thresh, version, skip_preprocess, opset_version, batch_size, vanilla, simplify)
     84         )
     85 
---> 86     onnx_builder.to_onnx(onnx_path, simplify)
     87 
     88 

~\Anaconda3\lib\site-packages\torch\autograd\grad_mode.py in decorate_context(*args, **kwargs)
     25         def decorate_context(*args, **kwargs):
     26             with self.clone():
---> 27                 return func(*args, **kwargs)
     28         return cast(F, decorate_context)
     29 

~\Anaconda3\lib\site-packages\yolort\runtime\ort_helper.py in to_onnx(self, onnx_path, simplify, **kwargs)
    232         """
    233         with BytesIO() as f:
--> 234             torch.onnx.export(
    235                 self.model,
    236                 self.input_sample,

~\Anaconda3\lib\site-packages\torch\onnx\utils.py in export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, custom_opsets, export_modules_as_functions)
    502     """
    503 
--> 504     _export(
    505         model,
    506         args,

~\Anaconda3\lib\site-packages\torch\onnx\utils.py in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, onnx_shape_inference, export_modules_as_functions)
   1527             _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
   1528 
-> 1529             graph, params_dict, torch_out = _model_to_graph(
   1530                 model,
   1531                 args,

~\Anaconda3\lib\site-packages\torch\onnx\utils.py in _model_to_graph(model, args, verbose, input_names, output_names, operator_export_type, do_constant_folding, _disable_torch_constant_prop, fixed_batch_size, training, dynamic_axes)
   1109 
   1110     model = _pre_trace_quant_model(model, args)
-> 1111     graph, params, torch_out, module = _create_jit_graph(model, args)
   1112     params_dict = _get_named_param_dict(graph, params)
   1113 

~\Anaconda3\lib\site-packages\torch\onnx\utils.py in _create_jit_graph(model, args)
    985         return graph, params, torch_out, None
    986 
--> 987     graph, torch_out = _trace_and_get_graph_from_model(model, args)
    988     _C._jit_pass_onnx_lint(graph)
    989     state_dict = torch.jit._unique_state_dict(model)

~\Anaconda3\lib\site-packages\torch\onnx\utils.py in _trace_and_get_graph_from_model(model, args)
    889     prev_autocast_cache_enabled = torch.is_autocast_cache_enabled()
    890     torch.set_autocast_cache_enabled(False)
--> 891     trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
    892         model,
    893         args,

~\Anaconda3\lib\site-packages\torch\jit\_trace.py in _get_trace_graph(f, args, kwargs, strict, _force_outplace, return_inputs, _return_inputs_states)
   1182     if not isinstance(args, tuple):
   1183         args = (args,)
-> 1184     outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
   1185     return outs

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

~\Anaconda3\lib\site-packages\torch\jit\_trace.py in forward(self, *args)
    125                 return tuple(out_vars)
    126 
--> 127         graph, out = torch._C._create_graph_by_tracing(
    128             wrapper,
    129             in_vars + module_state,

~\Anaconda3\lib\site-packages\torch\jit\_trace.py in wrapper(*args)
    116             if self._return_inputs_states:
    117                 inputs_states.append(_unflatten(in_args, in_desc))
--> 118             outs.append(self.inner(*trace_inputs))
    119             if self._return_inputs_states:
    120                 inputs_states[0] = (inputs_states[0], trace_inputs)

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _slow_forward(self, *input, **kwargs)
   1180                 recording_scopes = False
   1181         try:
-> 1182             result = self.forward(*input, **kwargs)
   1183         finally:
   1184             if recording_scopes:

~\Anaconda3\lib\site-packages\yolort\models\yolov5.py in forward(self, inputs, targets)
    144         if not self.training:
    145             for img in inputs:
--> 146                 val = img.shape[-2:]
    147                 assert len(val) == 2
    148                 original_image_sizes.append((val[0], val[1]))

AttributeError: 'NoneType' object has no attribute 'shape'`

Versions

Collecting environment information...
PyTorch version: 1.13.1
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Enterprise
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 11.7.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA T1200 Laptop GPU
Nvidia driver version: 517.13
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2496
DeviceID=CPU0
Family=198
L2CacheSize=10240
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2496
Name=11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.24.2
[pip3] numpydoc==1.5.0
[pip3] torch==1.13.1
[pip3] torchaudio==0.13.1
[pip3] torchvision==0.14.1
[conda] blas 1.0 mkl
[conda] mkl 2021.4.0 haa95532_640
[conda] mkl-service 2.4.0 py39h2bbff1b_0
[conda] mkl_fft 1.3.1 py39h277e83a_0
[conda] mkl_random 1.2.2 py39hf11a4ad_0
[conda] numpy 1.24.2 pypi_0 pypi
[conda] numpydoc 1.5.0 py39haa95532_0
[conda] pytorch 1.13.1 py3.9_cuda11.7_cudnn8_0 pytorch
[conda] pytorch-cuda 11.7 h67b0de4_1 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 0.13.1 pypi_0 pypi
[conda] torchvision 0.14.1 pypi_0 pypi

Wikidepia commented 1 year ago

I got the exact same error. It works with batch_size != 1.

willjoy commented 1 year ago

> I got the exact same error. It works with batch_size != 1.

You are right. I just tried with batch_size != 1 and it worked. I still wonder whether the bug can be fixed later on.

zhiqwang commented 1 year ago

Sorry @willjoy and @Wikidepia, I missed this ticket. I can reproduce the problem locally. It seems to be caused by newer versions of PyTorch (1.11-1.13); I only tested this function with PyTorch 1.9-1.10. We need to fix this.

SharynHu commented 1 year ago

I reproduced this problem on Ubuntu with torch 2.0 (cu117). Everything seems fine until torch.onnx.export internally calls _decide_input_format(model, args). This function changes the passed args parameter, appending a None to its end. I think it intends to match the (inputs, targets) signature of the YOLOv5 forward method: since targets is not provided explicitly, it adds a None to the arguments. However, when forwarding, that added None is interpreted not as the second parameter, targets, but as the second element of inputs, which produces this bug.
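The failure mode described above can be reproduced without torch. The following is only a toy sketch: `FakeImage` is a hypothetical stand-in for an image tensor, and `forward` copies the loop from the bottom of the traceback rather than the real yolort model. It assumes, as the comment above suggests, that the appended None ends up inside the inputs list instead of being bound to targets.

```python
class FakeImage:
    """Hypothetical stand-in for an image tensor; only .shape matters here."""

    def __init__(self, height, width):
        self.shape = (3, height, width)


def forward(inputs, targets=None):
    # Same loop as the failing lines shown at the bottom of the traceback.
    original_image_sizes = []
    for img in inputs:
        val = img.shape[-2:]
        assert len(val) == 2
        original_image_sizes.append((val[0], val[1]))
    return original_image_sizes


img = FakeImage(480, 640)

# Intended call: one image in the batch, targets passed as its own argument.
ok_sizes = forward([img], None)

# Assumed failure mode: the extra None lands inside the batch itself,
# so the loop reaches it and tries to read None.shape.
try:
    forward([img, None])
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

print(ok_sizes)
print(error_message)
```

In the first call the loop only ever sees image objects; in the second it reaches the None and raises the same kind of AttributeError as in the report.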

SangbumChoi commented 10 months ago

@zhiqwang, @SharynHu is right. Instead of using https://github.com/zhiqwang/yolort/blob/672ae82b08d31cfb107bfd29f75b4af7b47f5122/yolort/runtime/ort_helper.py#L236

try passing a tuple with an explicit None as the empty target:

        with BytesIO() as f:
            torch.onnx.export(
                self.model,
                (self.input_sample, None),
                f,
                do_constant_folding=True,
                opset_version=self._opset_version,
                input_names=self.input_names,
                output_names=self.output_names,
                dynamic_axes=self.dynamic_axes,
                **kwargs,
            )
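
Why the explicit tuple should help can be sketched with the standard library alone. `pad_with_defaults` below is a hypothetical helper, not the torch implementation; it only mimics the kind of signature-based padding described earlier in the thread. The toy `forward` assumes that targets defaults to None. When both arguments are supplied, the helper has nothing left to append.

```python
import inspect


def pad_with_defaults(fn, args):
    """Hypothetical helper: append defaults for parameters args does not cover."""
    params = list(inspect.signature(fn).parameters.values())
    padded = list(args)
    for param in params[len(padded):]:
        if param.default is not inspect.Parameter.empty:
            padded.append(param.default)
    return tuple(padded)


def forward(inputs, targets=None):
    # Toy stand-in with the same (inputs, targets) parameters as in the traceback.
    return len(inputs), targets


# Only `inputs` supplied: the helper has to add one value for `targets`.
implicit_args = (["img"],)
implicit_added = len(pad_with_defaults(forward, implicit_args)) - len(implicit_args)

# Both supplied explicitly, as in the suggested fix: nothing is added.
explicit_args = (["img"], None)
explicit_added = len(pad_with_defaults(forward, explicit_args)) - len(explicit_args)

print(implicit_added, explicit_added)
```

With the explicit (inputs, None) tuple every parameter is already covered, so no argument has to be guessed and none can end up in the wrong place.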

mr-mainak commented 9 months ago

However, the following warnings appear during the conversion:

/home/mainak/ms/python/onnx_env/lib/python3.8/site-packages/yolort/models/transform.py:282: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  img_h, img_w = _get_shape_onnx(img)
/home/mainak/ms/python/onnx_env/lib/python3.8/site-packages/yolort/models/anchor_utils.py:46: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  anchors = torch.as_tensor(self.anchor_grids, dtype=torch.float32, device=device).to(dtype=dtype)
/home/mainak/ms/python/onnx_env/lib/python3.8/site-packages/yolort/models/anchor_utils.py:47: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  strides = torch.as_tensor(self.strides, dtype=torch.float32, device=device).to(dtype=dtype)
/home/mainak/ms/python/onnx_env/lib/python3.8/site-packages/yolort/models/box_head.py:406: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  strides = torch.as_tensor(self.strides, dtype=torch.float32, device=device).to(dtype=dtype)
/home/mainak/ms/python/onnx_env/lib/python3.8/site-packages/yolort/models/box_head.py:337: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  for head_output, grid, shift, stride in zip(head_outputs, grids, shifts, strides):

Is this normal? Then, when I load the model with OpenCV's dnn readNetFromONNX, I get the following error:

[ERROR:0@0.522] global onnx_importer.cpp:2588 parseShape DNN/ONNX(Shape): dynamic 'zero' shapes are not supported, input image [ 3 0 0 ]
[ERROR:0@0.524] global onnx_importer.cpp:1061 handleNode DNN/ONNX: ERROR during processing node with 1 inputs and 1 outputs: [Shape]:(onnx_node!/Shape) from domain='ai.onnx'

I have a best.pt obtained from training (best mAP). To convert these weights, I change path_ultralytics_yolov5 in helper.py. Is this the correct way? When I open the ONNX file in Netron, its architecture is different.

zhiqwang commented 9 months ago

Sorry for the delay in replying, @mr-mainak. The whole workflow looks fine. The ONNX model we export now is too dynamic to be used by opencv.dnn; perhaps we need to trade off some of that dynamism to guarantee better generalizability.