ultralytics / yolov5

Bounding Box Regression Equation #4373

Closed developer0hye closed 3 years ago

developer0hye commented 3 years ago

❔Question

Hi, @glenn-jocher !

I have one question.

(image: bounding box regression equation)

As you know, the bounding box regression equation of YOLOv5 differs from that of YOLOv3 and YOLOv4, but it is the same as Scaled-YOLOv4's. Why did you replace the equation for predicting the width and height of the bounding box?

It seems that the equation used by YOLOv5 and Scaled-YOLOv4 is more numerically stable than the old one, especially when training the model with mixed precision.
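For reference, the two parameterisations being compared are commonly written as below. This is a transcription of the usual published forms, not a copy of the image in the question; here σ is the logistic sigmoid, (c_x, c_y) is the grid-cell offset, (p_w, p_h) is the anchor size, and (t_x, t_y, t_w, t_h) are the raw network outputs.

```latex
% YOLOv3 / YOLOv4: sigmoid for the centre, exp for the size
b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad
b_w = p_w \, e^{t_w}, \qquad b_h = p_h \, e^{t_h}

% YOLOv5 / Scaled-YOLOv4: sigmoid for every term
b_x = 2\sigma(t_x) - 0.5 + c_x, \qquad b_y = 2\sigma(t_y) - 0.5 + c_y, \qquad
b_w = p_w \, \bigl(2\sigma(t_w)\bigr)^2, \qquad b_h = p_h \, \bigl(2\sigma(t_h)\bigr)^2
```

Since σ takes values in (0, 1), the second form bounds the centre offset to (-0.5, 1.5) and the size to (0, 4) times the anchor, whereas the exp term in the first form has no upper bound.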

mxy5201314 commented 3 years ago

It yields more positive samples, accelerates convergence, and is more stable.

developer0hye commented 3 years ago

@mxy5201314 Thanks for your reply! How do we get more positive samples by changing the equation?

mxy5201314 commented 3 years ago

In YOLOv5, bx ranges from (cx - 0.5) to (cx + 1.5), and likewise for by; bw ranges from 0 to (4 * pw), and likewise for bh. Targets that fall within these ranges can be positive samples. You can compare this with other versions.
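The ranges quoted above can be checked numerically. The following is a minimal sketch that assumes the commonly cited YOLOv5 form, bx = 2·sigmoid(tx) - 0.5 + cx and bw = pw·(2·sigmoid(tw))²; the helper names and the values of `cx` and `pw` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(t: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def decode_x(tx: float, cx: float) -> float:
    """Box centre in grid units (assumed YOLOv5-style form)."""
    return 2.0 * sigmoid(tx) - 0.5 + cx


def decode_w(tw: float, pw: float) -> float:
    """Box width from anchor width pw (assumed YOLOv5-style form)."""
    return pw * (2.0 * sigmoid(tw)) ** 2


cx, pw = 3.0, 10.0                          # hypothetical cell index and anchor width
ts = [t / 10.0 for t in range(-200, 201)]   # raw outputs swept from -20 to 20
xs = [decode_x(t, cx) for t in ts]
ws = [decode_w(t, pw) for t in ts]

print("bx range:", min(xs), max(xs))        # stays within (cx - 0.5, cx + 1.5)
print("bw range:", min(ws), max(ws))        # stays within (0, 4 * pw)
```

At a raw output of zero this form returns the cell centre (cx + 0.5) and exactly the anchor width, so an untrained head starts from the anchor itself.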

developer0hye commented 3 years ago

@mxy5201314 Okay, I understand your great explanation now!

Changing the equation for localization (bx, by) lets us assign more positive samples during training.
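One way to see where the extra positives come from: if a cell's decoded centre can lie up to half a cell outside that cell, then the nearest neighbouring cells can also be made responsible for a target. The sketch below is a simplified illustration of that idea and not the repository's target-building code; `candidate_cells` is a hypothetical helper, and coordinates are in grid units.

```python
import math
from typing import List, Tuple


def candidate_cells(gx: float, gy: float, grid_w: int, grid_h: int) -> List[Tuple[int, int]]:
    """Cells whose centre range (c - 0.5, c + 1.5) can reach a target at (gx, gy).

    The owning cell always qualifies. The nearest horizontal and vertical
    neighbours qualify as well, provided they exist inside the grid.
    """
    i, j = int(math.floor(gx)), int(math.floor(gy))
    fx, fy = gx - i, gy - j              # fractional position inside the owning cell
    cells = [(i, j)]

    # Horizontal neighbour: left if the target is in the left half, else right.
    if fx < 0.5 and i > 0:
        cells.append((i - 1, j))
    elif fx > 0.5 and i < grid_w - 1:
        cells.append((i + 1, j))

    # Vertical neighbour: up if the target is in the upper half, else down.
    if fy < 0.5 and j > 0:
        cells.append((i, j - 1))
    elif fy > 0.5 and j < grid_h - 1:
        cells.append((i, j + 1))

    return cells


print(candidate_cells(3.2, 5.7, 10, 10))   # owning cell plus two neighbours
print(candidate_cells(0.2, 0.2, 10, 10))   # corner target: owning cell only
```

With an offset limited to (0, 1), as in the older sigmoid-only form, only the owning cell could predict the target, so each target would yield one positive per matched anchor instead of up to three.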

But I don't understand why changing the equation for bw and bh yields more positive samples.

mxy5201314 commented 3 years ago

@developer0hye In fact, dropping exp prevents gradient explosion, and changing bw accelerates convergence. With exp, if tw is too large, the predicted width may exceed the aspect ratio threshold. If tw is too small (less than 0), its gradient is also very small (because of exp), which is not conducive to training. I think this change is mainly an optimization chosen by the author.
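The point about large tw can be made concrete by comparing the width scale factor and its derivative under both forms. This is a small sketch assuming the two forms exp(tw) and (2·sigmoid(tw))²; the function names are hypothetical, and the derivative of the second form is 8·s²·(1 - s) with s = sigmoid(tw).

```python
import math


def sigmoid(t: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def scale_exp(t: float) -> float:
    """Width scale factor in the exp form."""
    return math.exp(t)


def grad_exp(t: float) -> float:
    """d/dt exp(t): grows without bound as t grows."""
    return math.exp(t)


def scale_sig2(t: float) -> float:
    """Width scale factor in the squared-sigmoid form, bounded by 4."""
    return (2.0 * sigmoid(t)) ** 2


def grad_sig2(t: float) -> float:
    """d/dt (2*sigmoid(t))**2 = 8 * s**2 * (1 - s), bounded by 32/27."""
    s = sigmoid(t)
    return 8.0 * s * s * (1.0 - s)


for t in (0.0, 2.0, 4.0, 8.0):
    print(f"t={t:4.1f}  exp: scale={scale_exp(t):9.2f} grad={grad_exp(t):9.2f}  "
          f"sig2: scale={scale_sig2(t):5.3f} grad={grad_sig2(t):5.3f}")
```

Both forms agree at t = 0 (scale 1, gradient 1), but the exp form's scale and gradient keep growing with t, while the squared-sigmoid form never predicts more than 4 times the anchor and its gradient never exceeds 32/27.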

developer0hye commented 3 years ago

@mxy5201314 Thanks! I agree with you. The exp operation can cause gradient explosion!

glenn-jocher commented 3 years ago

@developer0hye @mxy5201314 this is because YOLOv4-scaled is based upon 99% of the YOLOv5 codebase (including my box regression equation that you ask about above). The authors completely fail to mention this as they want to present it as their own work, and they are using YOLOv5 augmentations, loss function, regression equation, autoanchor, etc. with zero citation or credit to myself or Ultralytics.

So the box regression equation above I created myself for YOLOv5 in May 2020. It features increased stability during early training due to its bounded limits, in addition to leveraging a single Sigmoid output on all neurons rather than exp() on some and sigmoid() on others, creating a simpler architecture for the YOLOv5 Detect() layer. And then later on YOLOv4-scaled came around and copied my work, stripping all comments, READMEs, etc. that would mention YOLOv5, which is why my box regression equation is in YOLOv4-scaled now as well.
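The "single Sigmoid output on all neurons" idea can be sketched as follows: one activation is applied to the whole output vector, and the box, objectness, and class terms are then read from slices of it. This is an illustrative sketch in plain Python, not the actual Detect() code; `decode_prediction`, its argument layout `[tx, ty, tw, th, obj, cls...]`, and the example values are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(t: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def decode_prediction(
    raw: Sequence[float],
    cx: int, cy: int,        # grid-cell indices
    pw: float, ph: float,    # anchor size in pixels
    stride: float,           # pixels per grid cell
) -> Tuple[Tuple[float, float, float, float], float, List[float]]:
    """Decode one anchor's raw outputs [tx, ty, tw, th, obj, cls...]."""
    y = [sigmoid(v) for v in raw]             # the same activation on every output
    bx = (2.0 * y[0] - 0.5 + cx) * stride     # centre x in pixels
    by = (2.0 * y[1] - 0.5 + cy) * stride     # centre y in pixels
    bw = pw * (2.0 * y[2]) ** 2               # width in pixels
    bh = ph * (2.0 * y[3]) ** 2               # height in pixels
    return (bx, by, bw, bh), y[4], y[5:]


box, obj, cls = decode_prediction([0.0] * 7, cx=3, cy=5, pw=10.0, ph=20.0, stride=8.0)
print(box, obj, cls)
```

Under the older form, the width and height slices would need exp() while the remaining outputs use sigmoid(), so the head would apply two different activations to different channels.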

mxy5201314 commented 3 years ago

@glenn-jocher I trust you and your yolov5 is great.

developer0hye commented 3 years ago

@mxy5201314 I think so... @glenn-jocher is great.

glenn-jocher commented 3 years ago

@developer0hye the compound scaling in YOLOv5 is based very closely on the compound scaling concepts (image size, width, depth) first introduced by EfficientDet https://arxiv.org/abs/1911.09070, though I disconnected the image-size from the other two.

The authors say their scaling concepts were inspired by their earlier work [39] Mingxing Tan and Quoc V. Le. Efficientnet: Rethinking model scaling for convolutional neural networks. ICML, 2019
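As a rough illustration of width and depth scaling with image size left out of it, the sketch below applies two multipliers to a layer's repeat count and channel count. The function names and the multiplier values in the example are assumptions made for illustration, not a statement of the repository's exact code or configuration.

```python
import math
from typing import Tuple


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(
    repeats: int,
    channels: int,
    depth_multiple: float,
    width_multiple: float,
) -> Tuple[int, int]:
    """Scale a layer's repeat count (depth) and output channels (width)."""
    # Depth: only layers that repeat more than once are scaled, never below 1.
    n = max(round(repeats * depth_multiple), 1) if repeats > 1 else repeats
    # Width: scale the channels and keep them a multiple of 8.
    c = make_divisible(channels * width_multiple, 8)
    return n, c


# Hypothetical "small" and "large" variants of the same base layer.
print(scale_layer(9, 512, depth_multiple=0.33, width_multiple=0.50))
print(scale_layer(9, 512, depth_multiple=1.00, width_multiple=1.00))
```

Image size does not appear in `scale_layer` at all, which reflects the comment above that it is disconnected from the other two scaling factors.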

developer0hye commented 3 years ago

@glenn-jocher Thanks for your great explanation!

WongKinYiu commented 3 years ago

> The authors completely fail to mention this as they want to present it as their own work, and they are using YOLOv5 augmentations, loss function, regression equation, autoanchor, etc. with zero citation or credit to myself or Ultralytics.

Hello,

We are very grateful for your wonderful work, and we have mentioned it on arXiv, in the paper, and on GitHub. We do not use autoanchor in Scaled-YOLOv4, and the loss function is CIoU, which was proposed by Zheng et al. Because all the credit for these parts is yours (mosaic augmentation, hyper-parameter evolution, bounding box regression equation), we have never claimed them as our contributions.

(Mosaic augmentation, hyper-parameter evolution, bounding box regression equation.) [image]

(About 50% are from darknet, 30% are from yolov3, and 5~10% are from here.) [image]

(GitHub, corresponding to the acknowledgements in the paper.) [image]

(GitHub, shown in the first line of the README.) [image]

By the way, I do not know why you say

> ... , and then later on YOLOv4-scaled came around and copied my work, ...

while you actually know that we developed yolov4-csp and trained the models with your yolov3 code before this repository was publicly released (May 27, 2020).

Scaled-YOLOv4 was originally developed on darknet and your yolov3 codebase. Because this codebase is really wonderful and easy for everyone to use, we later implemented Scaled-YOLOv4 based on this repository. We will keep mentioning your great effort and giving you credit whenever we use these functions. In fact, I always put the reference link in my reports, because I think much of the discussion comes not only from us but also from many GitHub users.

Zzh-tju commented 3 years ago

It is very common for people to propose several tricks A, B, C based on a model D, and then rename D to a new model E which claims to be SOTA. I also use mmdetection, in which more than 99% of the source code was not written by me. But that doesn't stop me from renaming the model to a new SOTA one. Correct and necessary citations and acknowledgements are enough to express my respect for these researchers.

jimmyflycv commented 2 years ago

> @developer0hye @mxy5201314 this is because YOLOv4-scaled is based upon 99% of the YOLOv5 codebase (including my box regression equation that you ask about above). The authors completely fail to mention this as they want to present it as their own work, and they are using YOLOv5 augmentations, loss function, regression equation, autoanchor, etc. with zero citation or credit to myself or Ultralytics.
>
> So the box regression equation above I created myself for YOLOv5 in May 2020. It features increased stability during early training due to its bounded limits, in addition to leveraging a single Sigmoid output on all neurons rather than exp() on some and sigmoid() on others, creating a simpler architecture for the YOLOv5 Detect() layer. And then later on YOLOv4-scaled came around and copied my work, stripping all comments, READMEs, etc. that would mention YOLOv5, which is why my box regression equation is in YOLOv4-scaled now as well.

Totally agree.