Possible mistake when calculating translation matrix

i-a-sivkov commented 2 years ago

In the function random_affine from datasets.py (https://github.com/ultralytics/xview-yolov3/blob/master/utils/datasets.py), if I'm not mistaken, there is an inaccuracy in the calculation of the translation matrix:

    T[0, 2] = (random.random() * 2 - 1) * translate[0] * img.shape[0] + border  # x translation (pixels)
    T[1, 2] = (random.random() * 2 - 1) * translate[1] * img.shape[1] + border  # y translation (pixels)

We pass random_affine an image array obtained from cv2.imread, so img.shape[0] is the image height and img.shape[1] is the width. We should therefore multiply the translation coefficient by img.shape[1] in the first line and by img.shape[0] in the second.
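For illustration, the swap described above could look roughly like the sketch below. The helper name `translation_matrix` and its defaults are hypothetical and not part of the repository; only the two `T[...]` assignments mirror the quoted lines, with the shape indices exchanged.

```python
import random

import numpy as np


def translation_matrix(img_shape, translate=(0.1, 0.1), border=0):
    """Hypothetical helper: build a 3x3 translation matrix for an (h, w, c) image."""
    height, width = img_shape[:2]  # cv2.imread returns arrays shaped (height, width, channels)
    T = np.eye(3)
    # x translation scales with the image width, i.e. img.shape[1]
    T[0, 2] = (random.random() * 2 - 1) * translate[0] * width + border
    # y translation scales with the image height, i.e. img.shape[0]
    T[1, 2] = (random.random() * 2 - 1) * translate[1] * height + border
    return T


# Example with a non-square 720x1280 image and no border
T = translation_matrix((720, 1280, 3), translate=(0.1, 0.1))
print(T)
```

For a square image the two versions give the same range of shifts, which may be why the swap is easy to overlook; the difference only shows up when height and width differ.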

Or did I miss something?

github-actions[bot] commented 2 years ago

Hello @i-a-sivkov, thank you for your interest in our work! Ultralytics has publicly released YOLOv5 at https://github.com/ultralytics/yolov5, featuring faster, lighter and more accurate object detection. YOLOv5 is recommended for all new projects.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

glenn-jocher commented 2 years ago

@i-a-sivkov hello, thanks for raising this issue regarding random_affine! Yes, I see your point. In the following code, im.shape is shape(h,w,c):

import cv2

im = cv2.imread('data/images/zidane.jpg')
print(im.shape)  # (720, 1280, 3)


random_affine() has now been replaced by random_perspective() in YOLOv5, where the above error has been fixed: https://github.com/ultralytics/yolov5/blob/5d4258fac5e6ceaa9c897f841cb737c56717a996/utils/augmentations.py#L125-L211

    height = im.shape[0] + border[0] * 2  # shape(h,w,c)
    width = im.shape[1] + border[1] * 2

    # Translation
    T = np.eye(3)
    T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width  # x translation (pixels)
    T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height  # y translation (pixels)
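As a rough check of the snippet above, the sketch below combines that translation with a centering matrix and maps the image center through the result. The centering matrix `C` and the helper name `center_and_translate` are assumptions made for this illustration: the quoted lines show only the translation part, and the sketch presumes the image center is moved to the origin before translating, which is what would make a factor around 0.5 sensible.

```python
import random

import numpy as np


def center_and_translate(im_shape, translate=0.1, border=(0, 0)):
    """Hypothetical helper: centering followed by the translation quoted above."""
    height = im_shape[0] + border[0] * 2  # shape(h,w,c)
    width = im_shape[1] + border[1] * 2

    # Assumed centering step: move the image center to the origin
    C = np.eye(3)
    C[0, 2] = -im_shape[1] / 2  # x shift uses the width
    C[1, 2] = -im_shape[0] / 2  # y shift uses the height

    # Translation as in the quoted lines: x uses width, y uses height
    T = np.eye(3)
    T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width
    T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height

    return T @ C, width, height


M, width, height = center_and_translate((720, 1280, 3), translate=0.1)
center = np.array([1280 / 2, 720 / 2, 1.0])  # image center in homogeneous coordinates
x, y, _ = M @ center
print(x / width, y / height)  # each ratio lies between 0.4 and 0.6
```

Under these assumptions, the center of the source image lands within `translate` of the center of the output canvas along each axis, measured as a fraction of the width for x and of the height for y.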

We have a TODO to update YOLOv3 with the latest YOLOv5 v6.0 release updates, which will fix this error, so no action is required at the moment.

Thank you for spotting this though, and please let us know if you find any other issues!

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ultralytics / xview-yolov3

Possible mistake when calculating translation matrix #30