ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Incorrect image shape order in utils/datasets.letterbox #266

Closed wanghaoyang0106 closed 4 years ago

wanghaoyang0106 commented 4 years ago

🐛 Bug

In lines 680-683 of utils/datasets.py, in the function letterbox, new_shape is treated as if it were ordered [width, height].

    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = new_shape
        ratio = new_shape[0] / shape[1], new_shape[1] / shape[0]  # width, height ratios

However, elsewhere in this function it is treated as ordered [height, width], e.g.:

r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding

scaleFill is rarely set and images are usually square, so this seldom matters in practice, but it is a potential risk.
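The mismatch can be illustrated with a minimal sketch. The two helper functions below are hypothetical and are not part of the repository; both assume that shape and new_shape are (height, width) tuples, as in the rest of letterbox. The first reproduces the expression quoted above, the second indexes new_shape consistently.

```python
def scalefill_ratio_reported(shape, new_shape):
    # Expression quoted in the report: indexes new_shape as (width, height)
    return new_shape[0] / shape[1], new_shape[1] / shape[0]


def scalefill_ratio_consistent(shape, new_shape):
    # Hypothetical alternative: indexes new_shape as (height, width)
    return new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios


shape = (480, 640)      # original image: height, width
new_shape = (320, 640)  # rectangular target: height, width

reported = scalefill_ratio_reported(shape, new_shape)
consistent = scalefill_ratio_consistent(shape, new_shape)
print("reported:  ", reported)
print("consistent:", consistent)

# With a square target the two expressions agree, which is why the issue is easy to miss
square = (416, 416)
print(scalefill_ratio_reported(shape, square) == scalefill_ratio_consistent(shape, square))
```

For the rectangular target the two functions return different width and height ratios, while for the square target they coincide.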

github-actions[bot] commented 4 years ago

Hello @wanghaoyang0106, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.

For more information please visit https://www.ultralytics.com.

glenn-jocher commented 4 years ago

@wanghaoyang0106 oh thanks for discovering this! Can you submit a PR with your proposed fix please? Thank you!

wanghaoyang0106 commented 4 years ago

Hello! I opened a PR for bug #266. I also noticed that the gain calculation in utils/utils.scale_coords is quite confusing.

gain = max(img1_shape) / max(img0_shape)  # gain  = old / new

It can be incorrect when the model input img1 is rectangular but the original image img0 is square, or when img1 is wider than it is tall but img0 is taller than it is wide. The gain and pad here should be exactly identical to r and (dw, dh) in utils/datasets.letterbox, since scale_coords is just the reverse operation. So I modified this part as well.
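A small sketch of the two cases described here. The function names are hypothetical, and shapes are assumed to be (height, width) tuples. gain_max reproduces the quoted expression, while gain_letterbox uses the same form as r in letterbox, where the limiting side decides the scale.

```python
def gain_max(img1_shape, img0_shape):
    # Expression quoted above: ratio of the longest sides
    return max(img1_shape) / max(img0_shape)


def gain_letterbox(img1_shape, img0_shape):
    # Same form as r in letterbox: the smaller of the height and width ratios
    return min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])


# Case 1: rectangular model input, square original image
case1 = ((320, 640), (640, 640))
# Case 2: model input wider than tall, original image taller than wide
case2 = ((320, 640), (1280, 720))

for img1_shape, img0_shape in (case1, case2):
    print(img1_shape, img0_shape,
          "max-based:", gain_max(img1_shape, img0_shape),
          "letterbox-based:", gain_letterbox(img1_shape, img0_shape))
```

In both cases the max-based gain is twice the letterbox-based one, so boxes mapped back to the original image would be scaled by the wrong factor.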

However, utils/datasets.letterbox contains several rounding operations, while the reverse operation in utils/utils.scale_coords uses exact floating-point values, which may introduce a slight error. If possible, I would like to use ratio_pad in utils/utils.scale_coords to pass in the ratio and pad directly from utils/datasets.letterbox.
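The size of this rounding error can be estimated with a simplified sketch. Both functions are hypothetical stand-ins rather than the repository code, and the exact rounding steps (rounding the resized size to whole pixels, then rounding the half-padding) are assumptions modelled on the description above. Shapes are (height, width).

```python
def forward_params(shape, new_shape):
    # Simplified letterbox-style forward pass with integer rounding (assumed)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))  # width, height
    dw = (new_shape[1] - new_unpad[0]) / 2  # padding per side, width
    dh = (new_shape[0] - new_unpad[1]) / 2  # padding per side, height
    left, top = int(round(dw - 0.1)), int(round(dh - 0.1))
    return r, new_unpad, (left, top)


def reverse_params(shape, new_shape):
    # Reverse pass that recomputes gain and pad in exact floating point
    gain = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    pad = ((new_shape[1] - shape[1] * gain) / 2,
           (new_shape[0] - shape[0] * gain) / 2)
    return gain, pad


shape, new_shape = (333, 500), (416, 416)
r, new_unpad, (left, top) = forward_params(shape, new_shape)
gain, pad = reverse_params(shape, new_shape)

print("resized (w, h):", new_unpad, "applied pad (left, top):", (left, top))
print("recomputed pad:", pad)
print("vertical pad mismatch in pixels:", abs(pad[1] - top))
```

In this example the mismatch is below one pixel. Passing the forward ratio and pad to the reverse step, as proposed with ratio_pad, would avoid recomputing them.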

In my own application I use some images with a very large width/height ratio, which is why I am focusing on this width/height handling. Currently train.py only resizes and pads the raw image to a square. This is acceptable but not preferable. If possible, I would like a feature that allows rectangular images for training.

glenn-jocher commented 4 years ago

@wanghaoyang0106 I was just reviewing this issue and read your message again. Rectangular training is designed into the system; your training command is shown below. In this case we do not use the mosaic dataloader, and we only train with 1 image at a time.

python train.py --rect

Also, rectangular inference is the default setting in test.py and detect.py, and also in our iOS app iDetection. For obvious reasons it's very advantageous at inference time.

wanghaoyang0106 commented 4 years ago

Thank you! Got it. I will read the source code for the usage.

glenn-jocher commented 4 years ago

@supergbl --rect (and all other train.py arguments) are compatible with any batch size.

glenn-jocher commented 4 years ago

Removing TODO as issue appears resolved.