dkloving opened this issue 3 years ago
Hi @dkloving . The ONNX model exported by yolort doesn't currently support fp16 (half). This will be a good enhancement; I will check how to implement this feature later, and contributions to this feature are welcome!
Thanks, I am also trying to check on it. I'm not confident anymore about the information in my bug report. The issue may be caused elsewhere. I seem to have a lot to learn about how PyTorch exports to ONNX.
I have started empirically testing individual parts of the model that I can isolate. I can confidently say that YOLOTransform and the postprocessing associated with it can be exported to ONNX models that 1) take fp16 inputs and produce fp16 outputs, and 2) produce the same results as their PyTorch analogues.
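For context (this is not from the thread), a minimal sketch of the kind of per-module parity check described above could look like the following; the module, dummy input, file name, and tolerances are all placeholders, and the module is assumed to return a single tensor:

import numpy as np
import torch
import onnxruntime


def check_fp16_export(module: torch.nn.Module, dummy_input: torch.Tensor, path: str = "part.onnx"):
    # Put the isolated module and the input into half precision
    module = module.eval().half()
    dummy_input = dummy_input.half()

    with torch.no_grad():
        torch_out = module(dummy_input)

    # Export the same module and run it through ONNX Runtime
    torch.onnx.export(module, (dummy_input,), path, opset_version=11)
    session = onnxruntime.InferenceSession(path)
    (ort_out,) = session.run(None, {session.get_inputs()[0].name: dummy_input.cpu().numpy()})

    # Loose tolerances, since fp16 round-off differs between the two runtimes
    np.testing.assert_allclose(torch_out.cpu().numpy(), ort_out, rtol=1e-2, atol=1e-3)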
I'm currently stuck on testing yolort.models.yolo.YOLO (yolov5_darknet_pan_s_r31). Using pytorch == 1.7.1, when I try to export only a yolov5_darknet_pan_s_r31 (rather than yolov5s) I get the error that hardswish cannot be exported to ONNX. I guess yolov5s somehow intelligently replaces hardswish with the equivalent set of operations you discussed in the PyTorch issue, but I haven't figured out yet how to make this happen on just yolov5_darknet_pan_s_r31. Any pointers?
To export this piece of the model, I am doing:
import cv2
import torch

from yolort.models.yolo import yolov5_darknet_pan_s_r31
# read_image_to_tensor lives in yolort's image utilities in this version of the repo
from yolort.utils.image_utils import read_image_to_tensor

device = torch.device('cuda')

model = yolov5_darknet_pan_s_r31(pretrained=False, progress=True, num_classes=2)
model = model.eval()
model = model.to(device)
model = model.half()

img_one = cv2.imread('test/bus.jpg')
img_one = read_image_to_tensor(img_one, is_half=True)
img_one = img_one.to(device)
images = torch.stack([img_one[:, :416, :320]])

from torchvision.ops._register_onnx_ops import _onnx_opset_version

export_onnx_name = 'yolomodel.onnx'

torch.onnx.export(
    model,
    (images,),
    export_onnx_name,
    do_constant_folding=True,
    opset_version=_onnx_opset_version,
    dynamic_axes={"images_tensors": [0, 1, 2], "outputs": [0, 1, 2]},
    input_names=["images_tensors"],
    output_names=["outputs"],
)
@dkloving torch.onnx.export in PyTorch 1.7.1 doesn't support torch.Hardswish; that is why we introduced the parameter export_friendly in
https://github.com/zhiqwang/yolov5-rt-stack/blob/97c8ab79642208925e31f3e844570562bd743ef9/yolort/models/__init__.py#L27-L28
You should do something like the following:
model = yolov5_darknet_pan_s_r31(pretrained=False, progress=True, num_classes=2)
# _export_module_friendly is defined in yolort/models/__init__.py (linked above); it swaps
# export-unfriendly activations in place
_export_module_friendly(model)
model = model.eval()
I guess the yolov5s somehow intelligently replaces hardswish with the equivalent set of operations you discussed in the pytorch issue, but I haven't figured out yet how to make this happen on just yolov5_darknet_pan_s_r31

Yes, we use the above parameter export_friendly to replace torch.Hardswish with the following function.
https://github.com/zhiqwang/yolov5-rt-stack/blob/97c8ab79642208925e31f3e844570562bd743ef9/yolort/utils/activations.py#L19-L27
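For readers without the repo at hand, the linked replacement is essentially an export-friendly Hardswish module along these lines (a sketch; see the linked activations.py for the exact code):

import torch
from torch import nn
from torch.nn import functional as F


class Hardswish(nn.Module):
    # Export-friendly version of nn.Hardswish: decomposed into ops that ONNX opset 11 supports
    @staticmethod
    def forward(x):
        return x * F.hardtanh(x + 3, 0.0, 6.0) / 6.0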
Another option is to update PyTorch to 1.8.1, which natively supports exporting torch.Hardswish to ONNX.
Thanks, I was able to make progress simply by updating to PyTorch 1.8.1 as you suggested, but it was also helpful for me to look at the export-friendly code.
I was wrong in my initial bug report. The error occurs not on onnx_model = onnx.load(export_onnx_name) but instead at either:
model_simp, check = onnxsim.simplify(
    onnx_model,
    input_shapes={"images_tensors": [3, 640, 640]},
    dynamic_input_shape=True,
)
or
ort_session = onnxruntime.InferenceSession(export_onnx_name)
Also, when I am exporting just yolov5_darknet_pan_s_r31 I see the same problem (mixing float and float16) but at a different node:
Fail: [ONNXRuntimeError] : 1 : FAIL : Type Error: Type parameter (T) bound to different types (tensor(float) and tensor(float16) in node (Conv_52).
Another update. I can confirm that the problem is in exporting the Darknet backbone. The following code produces an ONNX model that does not behave correctly. It actually allows you to create an onnxruntime inference session and run inference, but its outputs are float32 when they should be float16 (the PyTorch model's outputs are correctly fp16). Including a second layer (or more) produces the error.
# DarkNet is yolort's backbone class (module path assumed to be yolort.models.darknet)
from yolort.models.darknet import DarkNet

model = DarkNet(depth_multiple=0.33, width_multiple=0.5, version='r3.1', num_classes=2)
model = model.eval()
model = model.to(device)
model = model.half()

# isolate single layer of features
model = model.features[0]
Inspecting the onnx file with Netron shows that the single Conv block does indeed have float16 weights and biases, but for some reason its output is being cast to int64 before add, clip, div, and mul are applied.
It looks like this is an issue with the onnx converter itself. I am still investigating a fix or workaround.
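A small inspection sketch (not from the thread) can confirm programmatically what Netron shows, by listing the exported graph's node types and any Cast targets; the file name is whatever was passed to torch.onnx.export above:

import onnx
from onnx import TensorProto

model = onnx.load("yolomodel.onnx")  # adjust to the exported file name
for node in model.graph.node:
    if node.op_type == "Cast":
        # the "to" attribute holds the target element type, e.g. 7 == INT64
        target = next(attr.i for attr in node.attribute if attr.name == "to")
        print(node.name, "Cast ->", TensorProto.DataType.Name(target))
    else:
        print(node.name, node.op_type)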
One issue seems to be with the conversion of Hardswish to ONNX. By using ultralytics v4.0 we change to SiLU in the backbone, and so darknet_pan_backbone('darknet_s_r4_0', 0.33, 0.5, version='r4.0') appears to export to a valid onnx model that can be used for inference. However, yolov5_darknet_pan_s_r40(pretrained=False, progress=True, num_classes=2) does not yet.
Hi @dkloving
This is because PyTorch doesn't currently support exporting torch.SiLU to ONNX. The following is an export-friendly substitute for torch.SiLU:
https://github.com/zhiqwang/yolov5-rt-stack/blob/218c428c7fc2310d6a4014e594c7b1b0a7171b33/yolort/utils/activations.py#L6-L16
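The linked substitute is essentially the following (a sketch; see the linked activations.py for the exact code):

import torch
from torch import nn


class SiLU(nn.Module):
    # Export-friendly version of nn.SiLU: x * sigmoid(x) uses only ops ONNX supports
    @staticmethod
    def forward(x):
        return x * torch.sigmoid(x)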
darknet_pan_backbone appears to export to a valid onnx model that can be used for inference. However, yolov5_darknet_pan_s_r40 does not yet.

Both darknet_pan_backbone and yolov5_darknet_pan_s_r40 seem to use torch.SiLU, so the error here is a bit strange.
Thanks @zhiqwang. It looks like PyTorch has added support for SiLU. I exported a DarkNet(..., version='r4.0') model to onnx and checked with Netron. It shows sigmoid and mul nodes just as I would expect from your version above.
Export to onnx fp16 is still not working. The exported version of torchvision.ops.batched_nms as of v0.9.1 requires fp32 inputs for boxes and scores. We could patch PostProcess to cast them to fp32 when sending to batched_nms, but this would get in the way of users who simply want to use PyTorch fp16, not ONNX.
Tracking down the issue with torchvision is driving me bonkers. Somehow copy-pasting the code from here, for example, gives me a working exportable fp16 PostProcess model, but it's calling nms, which doesn't even seem to be imported or defined in my namespace, and I can't figure out how it's even using that without throwing a NameError. When I call nms on my own I do get a NameError as I would expect. This is really weird.
Hi @dkloving ,
It seems that torchvision's NMS doesn't support FP16 mode; maybe we should work on the C++ source code to address this problem.
A more practical route is that we could also separate PostProcess from yolov5_darknet_pan_s_r40. In other words, we can make PostProcess Optional in yolov5_darknet_pan_s_r31 to avoid exporting the PostProcess module, and then implement an FP16 version of PostProcess using ONNX Runtime or something else.
Actually, PostProcess has already caused problems for subsequent applications, such as the discussion in #99 .
Edited: [Maybe I'm wrong here, check the following comment.] ~I will work on this separation in the next two days,~ if you have any interest in this, we also welcome you to submit this feature.
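For illustration only, a rough sketch of what such a separation could look like; the class name and constructor arguments are placeholders, not the repo's actual API:

import torch
from torch import nn


class YOLOWithOptionalPostProcess(nn.Module):
    def __init__(self, backbone, anchor_generator, head, post_process=None):
        super().__init__()
        self.backbone = backbone
        self.anchor_generator = anchor_generator
        self.head = head
        self.post_process = post_process

    def forward(self, samples):
        features = self.backbone(samples)
        head_outputs = self.head(features)
        anchors = self.anchor_generator(features)
        if self.post_process is None:
            # Export-friendly path: return raw predictions and anchors so that
            # NMS can run outside the graph (e.g. in ONNX Runtime, in fp16 or fp32)
            return head_outputs, anchors
        return self.post_process(head_outputs, anchors)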
Hi @dkloving ,
Actually PostProcess doesn't contain any weights or biases, as shown below.
And torchvision's nms won't work on ONNX; modifying torchvision's nms will not take effect on the ONNX side. ORT implements and uses NonMaxSuppression (CPU) and NonMaxSuppression (CUDA) instead.
So we should do two things:
1. Check whether ORT's NonMaxSuppression supports FP16 mode or not (a quick check is sketched after this comment).
2. Use a symbolic function (symbolic_multi_label_nms, as follows) to export the "fake" FP16 batched_nms to ORT's FP16 version of NonMaxSuppression.
FYI, torchvision is using the symbolic_multi_label_nms to export the NMS to ONNX.
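As a quick check for item 1 above (a sketch, not from the thread): build a one-node ONNX graph whose NonMaxSuppression inputs are declared as float16 and see whether ONNX Runtime accepts it. Per the ONNX spec, NonMaxSuppression is only defined for float32 boxes and scores, so this is expected to fail:

import numpy as np
import onnx
from onnx import TensorProto, helper
import onnxruntime

node = helper.make_node(
    "NonMaxSuppression",
    inputs=["boxes", "scores", "max_output_boxes_per_class", "iou_threshold"],
    outputs=["selected_indices"],
)
graph = helper.make_graph(
    [node],
    "nms_fp16_check",
    inputs=[
        helper.make_tensor_value_info("boxes", TensorProto.FLOAT16, [1, 4, 4]),
        helper.make_tensor_value_info("scores", TensorProto.FLOAT16, [1, 1, 4]),
        helper.make_tensor_value_info("max_output_boxes_per_class", TensorProto.INT64, [1]),
        helper.make_tensor_value_info("iou_threshold", TensorProto.FLOAT, [1]),
    ],
    outputs=[helper.make_tensor_value_info("selected_indices", TensorProto.INT64, [None, 3])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 11)])

try:
    sess = onnxruntime.InferenceSession(model.SerializeToString())
    sess.run(None, {
        "boxes": np.random.rand(1, 4, 4).astype(np.float16),
        "scores": np.random.rand(1, 1, 4).astype(np.float16),
        "max_output_boxes_per_class": np.array([10], dtype=np.int64),
        "iou_threshold": np.array([0.5], dtype=np.float32),
    })
    print("NonMaxSuppression accepted fp16 inputs")
except Exception as exc:
    print("NonMaxSuppression rejected fp16 inputs:", exc)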
A temporary workaround for anyone who needs it is to force fp32 for post-processing by wrapping a yolort model like so:
import torch
from torch import nn

# PostProcess import path assumed; in yolort it lives in the box_head module
from yolort.models.box_head import PostProcess


class YoloMain(nn.Module):
    def __init__(self, yolo_complete):
        super().__init__()
        self.model = yolo_complete
        self.post_process = PostProcess(0.01, 0.8, 300)

    def forward(self, X):
        backbone_out = self.model.backbone(X)
        # cast head and anchor outputs to fp32 so the exported batched_nms sees fp32 inputs
        head_out = [t.to(torch.float32) for t in self.model.head(backbone_out)]
        anchors_out = [t.to(torch.float32) for t in self.model.anchor_generator(backbone_out)]
        detections = self.post_process(head_out, anchors_out)
        return detections
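Hypothetical usage of the wrapper above (the constructor call, device, and whether the returned model exposes backbone/head/anchor_generator attributes directly are assumptions to adapt to your setup):

from yolort.models.yolo import yolov5_darknet_pan_s_r40

device = torch.device("cuda")
yolo = yolov5_darknet_pan_s_r40(pretrained=False, progress=True, num_classes=2)
yolo = yolo.eval().to(device).half()

wrapped = YoloMain(yolo).to(device).eval()
images = torch.rand(1, 3, 640, 640, device=device).half()
torch.onnx.export(wrapped, (images,), "yolo_fp16.onnx", opset_version=11)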
@zhiqwang Has this improved now?
🐛 Bug
When exporting a half precision (fp16) model to onnx it creates an invalid onnx file. This appears to be because of a node that remains in fp32 as a result of this line in torch.nn.functional.interpolate.
To Reproduce (REQUIRED)
Steps to reproduce the behavior:
1. After model = model.to(device), add the line model = model.half()
2. Run torch.onnx.export(...). The error will occur at onnx_model = onnx.load(export_onnx_name)
Relevant warnings on export appear to be:
Error on loading onnx model is:
Expected behavior
Successful execution of the tutorial notebook when the model is converted to half precision.
Environment
[pip3] numpy==1.19.2
[pip3] pytorch-lightning==1.3.0rc1
[pip3] torch==1.7.1
[pip3] torchaudio==0.7.0a0+a853dff
[pip3] torchmetrics==0.3.2
[pip3] torchvision==0.8.2
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py37he8ac12f_0
[conda] mkl_fft 1.3.0 py37h54f3939_0
[conda] mkl_random 1.1.1 py37h0573a6f_0
[conda] numpy 1.19.2 py37h54aff64_0
[conda] numpy-base 1.19.2 py37hfa32c7d_0
[conda] pytorch 1.7.1 py3.7_cuda10.2.89_cudnn7.6.5_0 pytorch
[conda] pytorch-lightning 1.3.0rc1 pypi_0 pypi
[conda] torchaudio 0.7.2 py37 pytorch
[conda] torchmetrics 0.3.2 pypi_0 pypi
[conda] torchvision 0.8.2 py37_cu102 pytorch
Additional context
It looks like a PyTorch issue, but I'm not sure how we are using this interpolate function. Perhaps we can find a workaround?
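Along the lines of the suspicion above, a minimal repro sketch (an assumption, not taken from the report) is to export a bare fp16 upsample on its own and inspect the node types and initializer dtypes for float32 leftovers; it assumes a CUDA device since fp16 interpolate may not run on CPU in this PyTorch version:

import torch
import onnx

device = torch.device("cuda")
upsample = torch.nn.Upsample(scale_factor=2.0, mode="nearest").eval().to(device)
x = torch.randn(1, 3, 32, 32, device=device).half()

torch.onnx.export(upsample, (x,), "upsample_fp16.onnx", opset_version=11)

graph = onnx.load("upsample_fp16.onnx").graph
print([node.op_type for node in graph.node])
print([onnx.TensorProto.DataType.Name(init.data_type) for init in graph.initializer])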