zhiqwang / yolort

yolort is a runtime stack for YOLOv5 on specialized accelerators such as TensorRT, LibTorch, ONNX Runtime, TVM and ncnn.
https://zhiqwang.com/yolort
GNU General Public License v3.0

Exporting fp16 model to onnx produces invalid onnx model #107

Open dkloving opened 3 years ago

dkloving commented 3 years ago

🐛 Bug

When exporting a half-precision (fp16) model to ONNX, it creates an invalid ONNX file. This appears to be because of a node that remains in fp32 as a result of this line in torch.nn.functional.interpolate.

To Reproduce (REQUIRED)

Steps to reproduce the behavior:

  1. Open the tutorial "export-onnx-inference-onnxruntime" notebook.
  2. In the third code box, after model = model.to(device), add the line model = model.half() (a minimal sketch of the modified cell is shown after this list).
  3. Continue running the notebook code. The warning below will occur at torch.onnx.export(...). The error will occur at onnx_model = onnx.load(export_onnx_name).
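
For concreteness, here is a minimal sketch of the modified cell from steps 1–2 (the yolov5s constructor and its arguments are assumptions based on the tutorial notebook, not copied from it):

import torch
from yolort.models import yolov5s

device = torch.device('cuda')

model = yolov5s(pretrained=True)  # constructed as in the tutorial notebook
model = model.eval()
model = model.to(device)
model = model.half()  # the extra line added in step 2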

The relevant warning on export appears to be:

/home/david/.conda/envs/pytorch/lib/python3.7/site-packages/torch/nn/functional.py:3123: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  dtype=torch.float32)).float())) for i in range(dim)]

The error on loading the ONNX model is:

Fail                                      Traceback (most recent call last)
<ipython-input-23-0c5db6c3f5a7> in <module>
      6     onnx_model,
      7     input_shapes={"images_tensors": [3, 640, 640]},
----> 8     dynamic_input_shape=True,
      9 )
     10 

~/.conda/envs/pytorch/lib/python3.7/site-packages/onnxsim/onnx_simplifier.py in simplify(model, check_n, perform_optimization, skip_fuse_bn, input_shapes, skipped_optimizers, skip_shape_inference, input_data, dynamic_input_shape, custom_lib)
    478         return model
    479 
--> 480     model = fixed_point(model, infer_shapes_and_optimize, constant_folding)
    481 
    482     # Overwrite model input shape

~/.conda/envs/pytorch/lib/python3.7/site-packages/onnxsim/onnx_simplifier.py in fixed_point(x, func_a, func_b)
    379     """
    380     x = func_a(x)
--> 381     x = func_b(x)
    382     while True:
    383         y = func_a(x)

~/.conda/envs/pytorch/lib/python3.7/site-packages/onnxsim/onnx_simplifier.py in constant_folding(model)
    472                                        input_shapes=updated_input_shapes,
    473                                        input_data=input_data,
--> 474                                        custom_lib=custom_lib)
    475         const_nodes = clean_constant_nodes(const_nodes, res)
    476         model = eliminate_const_nodes(model, const_nodes, res)

~/.conda/envs/pytorch/lib/python3.7/site-packages/onnxsim/onnx_simplifier.py in forward_for_node_outputs(model, nodes, input_shapes, input_data, custom_lib)
    227                   input_data=input_data,
    228                   input_shapes=input_shapes,
--> 229                   custom_lib=custom_lib)
    230     return res
    231 

~/.conda/envs/pytorch/lib/python3.7/site-packages/onnxsim/onnx_simplifier.py in forward(model, input_data, input_shapes, custom_lib)
    193     sess_options.log_severity_level = 3
    194     sess = rt.InferenceSession(model.SerializeToString(
--> 195     ), sess_options=sess_options, providers=['CPUExecutionProvider'])
    196 
    197     input_names = get_input_names(model)

~/.conda/envs/pytorch/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py in __init__(self, path_or_bytes, sess_options, providers, provider_options)
    278 
    279         try:
--> 280             self._create_inference_session(providers, provider_options)
    281         except RuntimeError:
    282             if self._enable_fallback:

~/.conda/envs/pytorch/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py in _create_inference_session(self, providers, provider_options)
    307             sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
    308         else:
--> 309             sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
    310 
    311         # initialize the C++ InferenceSession

Fail: [ONNXRuntimeError] : 1 : FAIL : Type Error: Type parameter (T) bound to different types (tensor(float) and tensor(float16) in node (Concat_929).

Expected behavior

Successful execution of the tutorial notebook when the model is converted to half precision.

Environment

[pip3] numpy==1.19.2
[pip3] pytorch-lightning==1.3.0rc1
[pip3] torch==1.7.1
[pip3] torchaudio==0.7.0a0+a853dff
[pip3] torchmetrics==0.3.2
[pip3] torchvision==0.8.2
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py37he8ac12f_0
[conda] mkl_fft 1.3.0 py37h54f3939_0
[conda] mkl_random 1.1.1 py37h0573a6f_0
[conda] numpy 1.19.2 py37h54aff64_0
[conda] numpy-base 1.19.2 py37hfa32c7d_0
[conda] pytorch 1.7.1 py3.7_cuda10.2.89_cudnn7.6.5_0 pytorch
[conda] pytorch-lightning 1.3.0rc1 pypi_0 pypi
[conda] torchaudio 0.7.2 py37 pytorch
[conda] torchmetrics 0.3.2 pypi_0 pypi
[conda] torchvision 0.8.2 py37_cu102 pytorch

Additional context

It looks like a PyTorch issue, but I'm not sure how we are using this interpolate function. Perhaps we can find a workaround?
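
One possible workaround, independent of yolort itself and only a sketch, is to export the model in fp32 and then convert the saved ONNX graph to fp16 afterwards, for example with the onnxconverter-common package (the file names here are hypothetical):

import onnx
from onnxconverter_common import float16

# Export the model in fp32 first, then convert the whole graph to fp16.
model_fp32 = onnx.load('yolov5s_fp32.onnx')
model_fp16 = float16.convert_float_to_float16(model_fp32)
onnx.save(model_fp16, 'yolov5s_fp16.onnx')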

zhiqwang commented 3 years ago

Hi @dkloving. The ONNX model exported by yolort doesn't support fp16 (half precision) at the moment. This would be a good enhancement; I will check how to implement this feature later, and contributions to this feature are welcome!

dkloving commented 3 years ago

Hi @dkloving. The ONNX model exported by yolort doesn't support fp16 (half precision) at the moment. This would be a good enhancement; I will check how to implement this feature later, and contributions to this feature are welcome!

Thanks, I am also trying to check on it. I'm not confident anymore about the information in my bug report. The issue may be caused elsewhere. I seem to have a lot to learn about how PyTorch exports to ONNX.

I have started empirically testing individual parts of the model that I can isolate. I can confidently say that YOLOTransform and the post-processing associated with it can be exported to ONNX models that 1) take fp16 inputs and produce fp16 outputs, and 2) produce the same results as their PyTorch analogues.
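
For reference, the kind of check described above can be sketched roughly as follows (a generic single-output harness, not yolort's test code; names, opset and tolerance are illustrative):

import numpy as np
import onnxruntime
import torch

def compare_pytorch_vs_onnx(module, dummy_input, onnx_path, atol=1e-2):
    # Run the PyTorch module (assumed here to return a single tensor).
    module = module.eval()
    with torch.no_grad():
        expected = module(dummy_input)

    # Export to ONNX and run the same input through ONNX Runtime.
    torch.onnx.export(module, (dummy_input,), onnx_path, opset_version=11,
                      input_names=['input'], output_names=['output'])
    sess = onnxruntime.InferenceSession(onnx_path, providers=['CPUExecutionProvider'])
    (actual,) = sess.run(None, {'input': dummy_input.cpu().numpy()})

    # Compare within a loose tolerance, since fp16 loses precision.
    np.testing.assert_allclose(expected.cpu().numpy(), actual, atol=atol)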

I'm currently stuck on testing yolort.models.yolo.YOLO (yolov5_darknet_pan_s_r31). Using pytorch == 1.7.1, when I try to export only a yolov5_darknet_pan_s_r31 (rather than yolov5s), I get an error that Hardswish cannot be exported to ONNX. I guess yolov5s somehow intelligently replaces Hardswish with the equivalent set of operations you discussed in the PyTorch issue, but I haven't figured out yet how to make this happen on just yolov5_darknet_pan_s_r31. Any pointers?

To export this piece of the model, I am doing:

import cv2
import torch

from yolort.models.yolo import yolov5_darknet_pan_s_r31
# read_image_to_tensor lives in yolort's utils (import path may differ across versions)
from yolort.utils.image_utils import read_image_to_tensor

device = torch.device('cuda')

model = yolov5_darknet_pan_s_r31(pretrained=False, progress=True, num_classes=2)
model = model.eval()
model = model.to(device)
model = model.half()

img_one = cv2.imread('test/bus.jpg')
img_one = read_image_to_tensor(img_one, is_half=True)
img_one = img_one.to(device)
images = torch.stack([img_one[:, :416, :320]])

from torchvision.ops._register_onnx_ops import _onnx_opset_version
export_onnx_name = 'yolomodel.onnx'
torch.onnx.export(
    model,
    (images,),
    export_onnx_name,
    do_constant_folding=True,
    opset_version=_onnx_opset_version,
    dynamic_axes={"images_tensors": [0, 1, 2], "outputs": [0, 1, 2]}, 
    input_names=["images_tensors"],
    output_names=["outputs"],
)

zhiqwang commented 3 years ago

@dkloving torch.onnx.export in PyTorch 1.7.1 doesn't support torch.nn.Hardswish, which is why we introduced the parameter export_friendly in https://github.com/zhiqwang/yolov5-rt-stack/blob/97c8ab79642208925e31f3e844570562bd743ef9/yolort/models/__init__.py#L27-L28. You should do something like the following:

# _export_module_friendly is the helper behind the export_friendly flag (see the file linked above)
from yolort.models import _export_module_friendly

model = yolov5_darknet_pan_s_r31(pretrained=False, progress=True, num_classes=2)
_export_module_friendly(model)
model = model.eval()

I guess yolov5s somehow intelligently replaces Hardswish with the equivalent set of operations you discussed in the PyTorch issue, but I haven't figured out yet how to make this happen on just yolov5_darknet_pan_s_r31

Yes, we use the above parameter export_friendly to replace torch.nn.Hardswish with the following function. https://github.com/zhiqwang/yolov5-rt-stack/blob/97c8ab79642208925e31f3e844570562bd743ef9/yolort/utils/activations.py#L19-L27
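
For readers following along, the replacement boils down to something like this simplified sketch (not yolort's exact code; the class and function names here are illustrative):

from torch import nn
import torch.nn.functional as F

class HardswishExportFriendly(nn.Module):
    # ONNX-exportable Hardswish: x * hardtanh(x + 3, 0, 6) / 6
    def forward(self, x):
        return x * F.hardtanh(x + 3.0, 0.0, 6.0) / 6.0

def replace_hardswish(module):
    # Recursively swap nn.Hardswish for the export-friendly version in place
    for name, child in module.named_children():
        if isinstance(child, nn.Hardswish):
            setattr(module, name, HardswishExportFriendly())
        else:
            replace_hardswish(child)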

Another option is to update PyTorch to 1.8.1, which natively supports exporting torch.nn.Hardswish to ONNX.

dkloving commented 3 years ago

Thanks, I was able to make progress simply by updating to PyTorch 1.8.1 as you suggested, and it was also helpful for me to look at the export-friendly code.

I was wrong in my initial bug report. The error occurs not at onnx_model = onnx.load(export_onnx_name) but instead at either:

model_simp, check = onnxsim.simplify(
    onnx_model,
    input_shapes={"images_tensors": [3, 640, 640]},
    dynamic_input_shape=True,
)

or

ort_session = onnxruntime.InferenceSession(export_onnx_name)

Also, when I am exporting just yolov5_darknet_pan_s_r31 I see the same problem (mixing float and float16) but at a different node:

Fail: [ONNXRuntimeError] : 1 : FAIL : Type Error: Type parameter (T) bound to different types (tensor(float) and tensor(float16) in node (Conv_52).

dkloving commented 3 years ago

Another update. I can confirm that the problem is in exporting the DarkNet backbone. The following code produces an ONNX model that does not behave correctly. It actually allows you to create an ONNX Runtime inference session and run inference, but its outputs are float32 when they should be float16 (the PyTorch model's outputs are correctly fp16). Including a second layer (or more) produces the error.

# DarkNet is yolort's backbone class (import path may differ across versions); device is defined as before
from yolort.models.darknet import DarkNet

model = DarkNet(depth_multiple=0.33, width_multiple=0.5, version='r3.1', num_classes=2)
model = model.eval()
model = model.to(device)
model = model.half()

# isolate a single layer of features, exported with the same torch.onnx.export call as above
model = model.features[0]

Inspecting the ONNX file with Netron shows that the single Conv block does indeed have float16 weights and biases, but for some reason its output is being cast to int64 before the Add, Clip, Div, and Mul ops are applied.

It looks like this is an issue with the ONNX converter itself. I am still investigating a fix or workaround.
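
To help narrow down where the mixed precision comes from, a rough inspection script along these lines can be used (plain onnx API, independent of yolort; the file name is just an example):

import onnx
from onnx import TensorProto

model = onnx.load('yolomodel.onnx')
model = onnx.shape_inference.infer_shapes(model)

# Collect the element type of every named value in the graph.
value_types = {}
for vi in list(model.graph.value_info) + list(model.graph.input) + list(model.graph.output):
    value_types[vi.name] = vi.type.tensor_type.elem_type
for init in model.graph.initializer:
    value_types[init.name] = init.data_type

# Report nodes whose inputs mix float32 and float16.
for node in model.graph.node:
    dtypes = {value_types.get(name) for name in node.input if name in value_types}
    if TensorProto.FLOAT in dtypes and TensorProto.FLOAT16 in dtypes:
        print(node.op_type, node.name, 'mixes float32 and float16 inputs')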

dkloving commented 3 years ago

One issue seems to be with the conversion of Hardswish to ONNX. By using ultralytics v4.0 we change to SiLU in the backbone, and so darknet_pan_backbone('darknet_s_r4_0', 0.33, 0.5, version='r4.0') appears to export to a valid ONNX model that can be used for inference. However, yolov5_darknet_pan_s_r40(pretrained=False, progress=True, num_classes=2) does not yet.

zhiqwang commented 3 years ago

Hi @dkloving

This is because PyTorch doesn't currently support exporting torch.nn.SiLU to ONNX. The following is an export-friendly substitute for torch.nn.SiLU: https://github.com/zhiqwang/yolov5-rt-stack/blob/218c428c7fc2310d6a4014e594c7b1b0a7171b33/yolort/utils/activations.py#L6-L16

darknet_pan_backbone appears to export to a valid ONNX model that can be used for inference. However, yolov5_darknet_pan_s_r40 does not yet.

Both darknet_pan_backbone and yolov5_darknet_pan_s_r40 seem to use torch.nn.SiLU, so the error here is a bit strange.

dkloving commented 3 years ago

Thanks @zhiqwang. It looks like PyTorch has added support for SiLU. I exported a DarkNet(..., version='r4.0') model to ONNX and checked with Netron. It shows Sigmoid and Mul nodes, just as I would expect from your version above.

dkloving commented 3 years ago

Export to ONNX fp16 is still not working. The exported version of torchvision.ops.batched_nms as of v0.9.1 requires fp32 inputs for boxes and scores. We could patch PostProcess to cast them to fp32 when sending them to batched_nms, but this would get in the way of users who simply want to use PyTorch fp16, not ONNX.

Tracking down the issue with torchvision is driving me bonkers. Somehow, copy-pasting the code from here, for example, gives me a working, exportable fp16 PostProcess model, but it's calling nms, which doesn't even seem to be imported or defined in my namespace, and I can't figure out how it's using that without throwing a NameError. When I call nms on my own I do get a NameError, as I would expect. This is really weird.

zhiqwang commented 3 years ago

Hi @dkloving ,

It seems that torchvision's NMS doesn't support FP16 mode; maybe we should work on the C++ source code to address this problem.

A more practical route is that we could separate PostProcess from yolov5_darknet_pan_s_r40; in other words, we can make PostProcess optional in yolov5_darknet_pan_s_r31 to avoid exporting the PostProcess module, and then implement an FP16 version of PostProcess using ONNX Runtime or something else.

Actually, PostProcess has caused problems for subsequent applications, such as the discussion in #99 .

Edited: [Maybe I'm wrong here, check the following comment.] ~I will work on this separation in the next two days,~ if you have any interest in this, you are also welcome to submit this feature.

zhiqwang commented 3 years ago

Hi @dkloving ,

Actually, PostProcess doesn't contain any weights or biases, as shown below.

https://github.com/zhiqwang/yolov5-rt-stack/blob/b0af4a1b17805543f415df705deb66f398b10170/yolort/models/box_head.py#L313-L334

And torchvision's nms won't work on ONNX; modifications to torchvision's nms will not take effect on the ONNX side. ORT implements and uses its own NonMaxSuppression kernels (CPU and CUDA) instead.

So we should do two things:

  1. Check whether ORT's NonMaxSuppression supports FP16 mode or not (a probe sketch is shown below).
  2. If the answer to the first question is yes, we should implement some mechanism (like the symbolic_multi_label_nms mentioned below) to export the "fake" FP16 batched_nms to ORT's FP16 NonMaxSuppression.

FYI, torchvision uses symbolic_multi_label_nms to export NMS to ONNX.
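
For the first question, a small standalone probe like the one below can check directly whether ORT's NonMaxSuppression accepts fp16 boxes and scores (built with the plain onnx helper API; the shapes and names are made up for the test):

import numpy as np
import onnx
from onnx import TensorProto, helper
import onnxruntime as ort

# Single NonMaxSuppression node fed with float16 boxes/scores.
boxes_info = helper.make_tensor_value_info('boxes', TensorProto.FLOAT16, [1, 4, 4])
scores_info = helper.make_tensor_value_info('scores', TensorProto.FLOAT16, [1, 1, 4])
selected_info = helper.make_tensor_value_info('selected', TensorProto.INT64, [None, 3])

max_out = helper.make_tensor('max_out', TensorProto.INT64, [1], [10])
iou_th = helper.make_tensor('iou_th', TensorProto.FLOAT, [1], [0.5])

node = helper.make_node('NonMaxSuppression',
                        inputs=['boxes', 'scores', 'max_out', 'iou_th'],
                        outputs=['selected'])
graph = helper.make_graph([node], 'nms_fp16_probe',
                          [boxes_info, scores_info], [selected_info],
                          initializer=[max_out, iou_th])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid('', 11)])

try:
    sess = ort.InferenceSession(model.SerializeToString(),
                                providers=['CPUExecutionProvider'])
    out = sess.run(None, {'boxes': np.random.rand(1, 4, 4).astype(np.float16),
                          'scores': np.random.rand(1, 1, 4).astype(np.float16)})
    print('fp16 NonMaxSuppression ran, selected indices:', out[0])
except Exception as err:  # most likely a type error if fp16 is unsupported
    print('fp16 NonMaxSuppression rejected:', err)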

dkloving commented 3 years ago

A temporary workaround for anyone who needs it is to force fp32 for post-processing by wrapping a yolort model like so:

import torch
from torch import nn

from yolort.models.box_head import PostProcess  # defined in yolort/models/box_head.py (linked above)


class YoloMain(nn.Module):
    def __init__(self, yolo_complete):
        super().__init__()
        self.model = yolo_complete
        # fp32 post-processing head (thresholds as chosen here)
        self.post_process = PostProcess(0.01, 0.8, 300)

    def forward(self, X):
        backbone_out = self.model.backbone(X)
        # Cast head and anchor outputs to fp32 before the NMS-based post-processing
        head_out = [t.to(torch.float32) for t in self.model.head(backbone_out)]
        anchors_out = [t.to(torch.float32) for t in self.model.anchor_generator(backbone_out)]
        detections = self.post_process(head_out, anchors_out)
        return detections
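
Hypothetical usage of the wrapper (assuming a model that exposes backbone, head and anchor_generator as in the snippet, plus the same device, images and export arguments as earlier in this thread):

yolo = yolov5_darknet_pan_s_r40(pretrained=False, progress=True, num_classes=2)
yolo = yolo.eval().to(device).half()

wrapped = YoloMain(yolo).eval()
torch.onnx.export(
    wrapped,
    (images,),
    'yolo_fp16_post_fp32.onnx',
    do_constant_folding=True,
    opset_version=_onnx_opset_version,
    input_names=["images_tensors"],
    output_names=["outputs"],
)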

MrRace commented 2 years ago

Hi @dkloving. The ONNX model exported by yolort doesn't support fp16 (half precision) at the moment. This would be a good enhancement; I will check how to implement this feature later, and contributions to this feature are welcome!

@zhiqwang Has this been improved now?