yasenh / libtorch-yolov5

A LibTorch inference implementation of YOLOv5
MIT License
372 stars 114 forks

Python and libtorch model prediction results are inconsistent #37

Closed blueskywwc closed 3 years ago

blueskywwc commented 3 years ago

Hello, I have updated to YOLOv5 version 4.0. I found that the prediction results of the Python model are a little different from the results predicted by the libtorch model, while with version 3.1 the prediction results are the same. What is the reason? Can you help me, thank you!

yasenh commented 3 years ago

@blueskywwc I didn't try the latest yolov5 version (4.0); did you export new torchscript models?

blueskywwc commented 3 years ago

Yes, I exported a new model. I used yolov5s.pt in both versions with the same training parameters, but the sizes of the trained models are inconsistent: 14.6M (v4.0) vs. 15.0M (v3.1). The model structure of the new version has changed, so the libtorch output does not match the v4.0 prediction results. I am not sure where to modify it; I would like to ask for your help, thank you!

xuebuaa commented 3 years ago

@blueskywwc Have you solved this problem? With the same confidence and IoU thresholds, the libtorch version predicts very different results. However, when I feed a 1x1x640x640 tensor initialized with 1.0 into both the Python model and the libtorch model, the output tensors are consistent. Has any post-processing procedure changed?
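(Editorial note: for anyone who wants to reproduce that kind of check, here is a minimal sketch of the LibTorch side. The file name yolov5s.torchscript.pt is an assumption, and a 1x3x640x640 input is used since YOLOv5 normally expects 3 channels; the Python side would feed the same constant tensor into the .pt model and compare the printed values.)

    #include <torch/script.h>
    #include <iostream>

    int main() {
        // Hypothetical file name; use your own exported TorchScript model.
        torch::jit::script::Module module = torch::jit::load("yolov5s.torchscript.pt");
        module.eval();

        // Constant input so Python and LibTorch see exactly the same data.
        torch::Tensor input = torch::ones({1, 3, 640, 640});
        torch::jit::IValue output = module.forward({input});

        // Depending on how the model was exported, the output is either a tensor
        // or a tuple whose first element is the prediction tensor.
        torch::Tensor pred = output.isTuple()
            ? output.toTuple()->elements()[0].toTensor()
            : output.toTensor();

        std::cout << pred.sizes() << std::endl;
        std::cout << pred.slice(1, 0, 3) << std::endl;  // print the first few rows
        return 0;
    }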

blueskywwc commented 3 years ago

@xuebuaa I have solved this problem. You need to rewrite LetterboxImage to be consistent with the Python version. This is the result of my rewrite:

std::vector<float> Detector::LetterboxImage(const cv::Mat& src, cv::Mat& dst, const cv::Size& input_size) {
    auto src_h = static_cast<float>(src.rows);
    auto src_w = static_cast<float>(src.cols);

    float in_h = input_size.height;
    float in_w = input_size.width;

    float scale = std::min(in_w / src_w, in_h / src_h);

    int mid_h = static_cast<int>(std::round(src_h * scale));
    int mid_w = static_cast<int>(std::round(src_w * scale));

    int dw = in_w - mid_w;
    int dh = in_h - mid_h;
    // Pad only up to the next multiple of 32, split evenly on both sides.
    float p_w = (dw % 32) / 2.0f;
    float p_h = (dh % 32) / 2.0f;

    cv::resize(src, dst, cv::Size(mid_w, mid_h));

    int top = static_cast<int>(std::round(p_h - 0.1));
    int bottom = static_cast<int>(std::round(p_h + 0.1));
    int left = static_cast<int>(std::round(p_w - 0.1));
    int right = static_cast<int>(std::round(p_w + 0.1));

    cv::copyMakeBorder(dst, dst, top, bottom, left, right, cv::BORDER_CONSTANT, cv::Scalar(114, 114, 114));

    std::vector<float> pad_info{static_cast<float>(left), static_cast<float>(top), scale};
    return pad_info;
}
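(Editorial note: the returned pad_info {left padding, top padding, scale} is what lets you map boxes from the letterboxed image back to the original image. Below is a minimal sketch of that inverse mapping; ScaleBoxBack is a hypothetical helper name, not part of the repo.)

    cv::Rect2f ScaleBoxBack(const cv::Rect2f& box, const std::vector<float>& pad_info) {
        // pad_info = {pad_left, pad_top, scale}, as returned by LetterboxImage above.
        float pad_left = pad_info[0];
        float pad_top  = pad_info[1];
        float scale    = pad_info[2];
        // Subtract the padding, then undo the resize.
        float x = (box.x - pad_left) / scale;
        float y = (box.y - pad_top) / scale;
        float w = box.width  / scale;
        float h = box.height / scale;
        return cv::Rect2f(x, y, w, h);
    }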

xuebuaa commented 3 years ago

@blueskywwc Thanks for replying. There really are some differences compared with the Python version's letterbox function, which is used for padding the image. I tried yours with an input size of 2592x1944, but the output size is 640x480, not the required 640x640. While debugging, the top and bottom parameters passed to cv::copyMakeBorder are in fact zero. And I don't really understand why letterbox uses mod 32 to calculate the padding height and width.

blueskywwc commented 3 years ago

@xuebuaa This article will answer your questions: https://blog.csdn.net/nan355655600/article/details/107852353
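(Editorial note, working through the numbers above under the usual YOLOv5 assumptions: the network stride is 32, so the padded side only has to reach the next multiple of 32, the "minimum rectangle", rather than the full 640. For the 2592x1944 example: scale = min(640/2592, 640/1944) ≈ 0.2469, so the resized image is 640x480; dh = 640 - 480 = 160 and 160 % 32 = 0, so top and bottom are 0 and the final letterboxed image stays 640x480. That matches what xuebuaa observed, and it is also what the Python letterbox() with auto=True produces.)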

xuebuaa commented 3 years ago

@blueskywwc OK, thanks a lot!

hojinchang commented 1 year ago

Hello!

Thank you yasenh for your implementation of yolov5! My response is a little late but I hope to help anyone facing this issue in the future.

I have also noticed some inconsistencies in the model predictions between your implementation and Ultralytics' Python version. In my case, some predictions made in the Python version were missed in this C++ implementation.

I believe the issue is in the LetterBox function.

The Python version selects between the INTER_AREA and INTER_LINEAR interpolation methods based on the image size ratio, whereas the C++ implementation only uses INTER_LINEAR (the cv::resize default). Changing this to match the Python code fixed my issue.
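(Editorial note: a minimal sketch of that change to the resize step inside LetterboxImage, assuming the same rule the Ultralytics data loader uses, area interpolation when shrinking and linear interpolation when enlarging; variable names follow the function above.)

    // Pick the interpolation the way the Python code does:
    // INTER_AREA when downscaling (scale < 1), INTER_LINEAR otherwise.
    int interp = (scale < 1.0f) ? cv::INTER_AREA : cv::INTER_LINEAR;
    cv::resize(src, dst, cv::Size(mid_w, mid_h), 0, 0, interp);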