hhk7734 / tensorflow-yolov4

YOLOv4 Implemented in Tensorflow 2.
MIT License

Question about 'txty' and 'bxby' #50

Closed jingting9 closed 3 years ago

jingting9 commented 3 years ago

Hi, I have two questions about "txty_s = (txty_s - self.a_half[0]) * self.scales[0] + self.a_half[0]" in head.py

  1. self.a_half is filled with 0.5. Why is 0.5 subtracted from txty_s and then added back?
  2. What does self.scales mean? I could not find any clue about it. Does bxby represent a value relative to the image size?

hhk7734 commented 3 years ago

self.scales is xyscale.

https://github.com/hhk7734/tensorflow-yolov4/blob/18426616718b66f22e6871437a426fcc632eb632/py_src/yolov4/common/base_class.py#L60-L63

https://github.com/hhk7734/tensorflow-yolov4/blob/18426616718b66f22e6871437a426fcc632eb632/py_src/yolov4/common/base_class.py#L143-L158

The description below may be wrong because it is what I guessed while creating this package.

If input_size is 416 * 416, because strides are 8, 16, and 32, the sizes of the predicted grids are 52 * 52, 26 * 26 and 13 * 13.
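As a quick check of the grid sizes above, here is a minimal sketch. The variable names are only illustrative and are not taken from the package.

```python
# Grid size per output scale = input size divided by the stride.
input_size = 416
strides = (8, 16, 32)

grid_sizes = [input_size // stride for stride in strides]
print(grid_sizes)  # [52, 26, 13]
```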

There are 3 anchors in each grid, in order [[12, 16], [19, 36], [40, 28]], [[36, 75], [76, 55], [72, 146]], [[142, 110], [192, 243], [459, 401]].

https://github.com/hhk7734/tensorflow-yolov4/blob/18426616718b66f22e6871437a426fcc632eb632/py_src/yolov4/common/base_class.py#L43-L53

Smaller anchors are meant for smaller objects, so small objects are detected in the 52 * 52 grid.

Suppose xyscale is not used and the center of a small object is at (240.4, 240.4) in a 416 * 416 image. With a stride of 8, 240.4 / 8 = 30.05, so when the object is detected the grid cell is (30, 30), txty_s is (0.05, 0.05), and bxby_s is ((30 + 0.05) / 52, (30 + 0.05) / 52).
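The arithmetic for the case without xyscale can be sketched as follows. `encode_center` is a hypothetical helper written for this illustration, not a function from the package.

```python
import math


def encode_center(center_px, stride):
    """Split a pixel coordinate into (cell index, offset inside the cell)."""
    grid_pos = center_px / stride   # position measured in grid cells
    cell = math.floor(grid_pos)     # cell that contains the center
    return cell, grid_pos - cell    # offset lies in [0, 1)


stride, grid_size = 8, 52
cell, offset = encode_center(240.4, stride)

# bxby is normalized by the grid size, so it is relative to the image size.
bx = (cell + offset) / grid_size
print(cell, round(offset, 4), round(bx, 4))
```

The same computation applies to the y coordinate, since the example center has equal x and y.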

But with xyscale, the same object can be detected from 4 cells: (29, 29) + (0.9583, 0.9583), (29, 30) + (0.9583, 0.125), (30, 29) + (0.125, 0.9583), (30, 30) + (0.125, 0.125)

29 + (0.9583 - 0.5) * 1.2 + 0.5 == 30 + (0.125 - 0.5) * 1.2 + 0.5 == 30.05
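The equality above can be checked numerically. This is a small sketch that assumes xyscale is 1.2 for the 52 * 52 grid, as in the equation; `decode_xy` and `required_txty` are hypothetical helper names that only mirror the formula from head.py.

```python
def decode_xy(cell, txty, xyscale):
    """Grid-unit coordinate: cell + (txty - 0.5) * xyscale + 0.5."""
    return cell + (txty - 0.5) * xyscale + 0.5


def required_txty(target, cell, xyscale):
    """Inverse of decode_xy: the txty a cell must predict to hit target."""
    return (target - cell - 0.5) / xyscale + 0.5


xyscale = 1.2  # assumed value for the 52 * 52 grid
target = 30.05

from_cell_29 = decode_xy(29, 0.9583, xyscale)
from_cell_30 = decode_xy(30, 0.125, xyscale)
print(round(from_cell_29, 2), round(from_cell_30, 2))

# Where 0.9583 and 0.125 come from:
print(round(required_txty(target, 29, xyscale), 4))
print(round(required_txty(target, 30, xyscale), 4))
```

Both cells decode to roughly the same grid coordinate, which is why the a_half term is subtracted before scaling and added back afterwards: the scaling is applied around the cell center instead of around the cell's top-left corner.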

Because the model's predictions are probabilistic, an object near a cell edge can be found from either side of the edge. I think using xyscale improves the inference a little for objects near cell edges and corners.
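One way to see this is to look at the range of offsets a single cell can express. The sketch below assumes txty_s lies between 0 and 1 (for example, a sigmoid output); `offset_range` is a hypothetical helper for this illustration.

```python
def offset_range(xyscale):
    """Range of (txty - 0.5) * xyscale + 0.5 when txty runs from 0 to 1."""
    lowest = (0.0 - 0.5) * xyscale + 0.5
    highest = (1.0 - 0.5) * xyscale + 0.5
    return lowest, highest


for xyscale in (1.0, 1.2):
    lowest, highest = offset_range(xyscale)
    print(xyscale, round(lowest, 2), round(highest, 2))
```

With a scale of 1.0 the offset stays inside the cell, while with 1.2 it reaches about 0.1 of a cell past each border, so a neighboring cell can also describe a center that sits close to the shared edge.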

jingting9 commented 3 years ago

Thank you for your reply! I sort of understand what you mean, but I still don't know where xyscale comes from. Did you set it yourself, or is it computed by some method?

hhk7734 commented 3 years ago

In darknet: https://github.com/AlexeyAB/darknet/blob/95339f2df57624ae1b27a560ff643e49eff238eb/cfg/yolov4.cfg#L973

jingting9 commented 3 years ago

https://github.com/AlexeyAB/darknet/blob/95339f2df57624ae1b27a560ff643e49eff238eb/cfg/yolov4.cfg#L973

Thank you for your patience :)