tinyvision / DAMO-YOLO

DAMO-YOLO: a fast and accurate object detection method with some new techs, including NAS backbones, efficient RepGFPN, ZeroHead, AlignedOTA, and distillation enhancement.
Apache License 2.0
3.79k stars, 476 forks

Difference in Output Between PyTorch and OpenVINO #149

Closed ofekp closed 3 months ago

ofekp commented 4 months ago

Question

Given a fine-tuned DAMO-YOLO model, I want to convert it to OpenVINO. I am using the provided converter script, but with OpenVINO's ovc rather than mo as suggested in the README (I believe mo is deprecated).

My problem is that the OpenVINO outputs differ. Given exactly the same input, the torch and onnx outputs are practically identical, but the output of the OpenVINO model is slightly different. Note that I compared all three models (torch, onnx, OpenVINO) with post-processing included, i.e. I did not add the --bench flag when I used the converter script. With a 416x416 input image and the nano-large model, the post-processed output has 3549 predictions.

To give more detail: if I sort the torch and OpenVINO outputs by class score, they become closer, although the ordering still does not match exactly. In fact, the order of the highest-scoring predictions is vastly different between the two outputs. Finally, for this specific image, the final result after NMS is similar (judging by a visualised image with bboxes). That said, this is only one image, and I expected the OpenVINO and torch outputs to be much closer.
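For anyone trying to reproduce this kind of comparison, here is a minimal sketch of how the gap could be quantified. It is not part of the DAMO-YOLO converter. The function names are hypothetical, the (num_predictions, num_classes) score layout is an assumption, and the strides (8, 16, 32) are also an assumption, although they do reproduce the 3549 predictions for a 416x416 input.

```python
import numpy as np


def expected_num_predictions(height, width, strides=(8, 16, 32)):
    """Count one prediction per feature-map cell across all strides.

    The default strides are an assumption, not taken from the DAMO-YOLO config.
    """
    return sum((height // s) * (width // s) for s in strides)


def compare_outputs(ref, other, top_k=100):
    """Summarise how far `other` (e.g. OpenVINO) drifts from `ref` (e.g. torch).

    Both inputs are assumed to be class-score arrays of shape
    (num_predictions, num_classes).
    """
    ref = np.asarray(ref, dtype=np.float64)
    other = np.asarray(other, dtype=np.float64)
    if ref.shape != other.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {other.shape}")

    abs_diff = np.abs(ref - other)

    # Rank predictions by their best class score, highest first.
    ref_rank = np.argsort(-ref.max(axis=1), kind="stable")[:top_k]
    other_rank = np.argsort(-other.max(axis=1), kind="stable")[:top_k]

    # Fraction of the top-k prediction indices shared by both rankings.
    shared = set(ref_rank.tolist()) & set(other_rank.tolist())

    return {
        "max_abs_diff": float(abs_diff.max()),
        "mean_abs_diff": float(abs_diff.mean()),
        "top_k_overlap": len(shared) / len(ref_rank),
    }
```

Comparing element-wise before any sorting keeps each prediction tied to its anchor position, so a large max_abs_diff together with a high top_k_overlap would suggest small numeric drift rather than a structural mismatch.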

Is this behavior expected? In your tests, have you experienced any accuracy degradation after converting the model to OpenVINO? If you believe the outputs should be closer, what do you think I am missing, please?

Thank you very much for any help.

ofekp commented 3 months ago

I figured out that the inputs were different because PIL and CV2 load images differently. More specifically, with PIL version 9.4.0, the JPEG decoder differs from the one used by CV2, leading to slightly different inputs. After fixing this issue, the outputs from the two models (PyTorch vs OpenVINO) are now much closer.
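In case it helps others hitting the same thing, here is a rough sketch of how the two decoded inputs could be checked against each other before blaming the converted model. The helper names are hypothetical, and it assumes both loaders are asked for a 3-channel 8-bit image. Note that cv2.imread returns BGR, so the channels are reordered before comparing.

```python
import numpy as np


def decode_with_both(path):
    """Decode one image with Pillow and OpenCV, returning two RGB uint8 arrays."""
    # Imported lazily so pixel_diff_report below works without either library.
    import cv2
    from PIL import Image

    pil_rgb = np.asarray(Image.open(path).convert("RGB"))
    cv_bgr = cv2.imread(path, cv2.IMREAD_COLOR)
    if cv_bgr is None:  # cv2.imread returns None instead of raising
        raise FileNotFoundError(path)
    cv_rgb = cv2.cvtColor(cv_bgr, cv2.COLOR_BGR2RGB)
    return pil_rgb, cv_rgb


def pixel_diff_report(a, b):
    """Report how much two same-shaped uint8 images differ."""
    # Widen to int16 so the subtraction cannot wrap around in uint8.
    a = np.asarray(a, dtype=np.int16)
    b = np.asarray(b, dtype=np.int16)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")

    diff = np.abs(a - b)
    return {
        "identical": bool((diff == 0).all()),
        "max_abs_diff": int(diff.max()),
        "fraction_changed": float((diff > 0).mean()),
    }
```

Something like pixel_diff_report(*decode_with_both("image.jpg")) (the file name is a placeholder) should then show whether the two pipelines feed the models the same pixels. If they do not, feeding one decoded array to both runtimes removes the loader as a variable.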