ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Non-letterbox resizing degrades performance #209

Closed pfeatherstone closed 4 years ago

pfeatherstone commented 4 years ago

I've realised, while playing with these models, that performance degrades when using regular (stretch) resizing instead of letterbox resizing. I would suggest not training with letterbox resizing, and instead adding augmentation during training whereby images are stretched, rotated, cropped, etc.

github-actions[bot] commented 4 years ago

Hello @pfeatherstone, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

For more information please visit https://www.ultralytics.com.

glenn-jocher commented 4 years ago

@pfeatherstone ah, interesting observation. Can you upload a few examples (e.g. train_batch0.jpg) of the augmentation and resizing techniques you found to work best?

We are careful to maintain aspect ratio when resizing, and to use proper resizing algorithms to avoid aliasing, etc.
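For reference, letterbox resizing scales the image by a single ratio so that it fits inside the square network input, then pads the shorter side. The sketch below is a minimal NumPy-only illustration, not YOLOv5's own letterbox() function; the helper names are hypothetical. It uses nearest-neighbour sampling only to stay self-contained, so it does not address the aliasing point above (a proper area or linear resize would). The pad value 114 is the gray YOLOv5 uses.

```python
import numpy as np


def nearest_resize(img: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Nearest-neighbour resize by index sampling (no OpenCV dependency)."""
    h, w = img.shape[:2]
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[rows[:, None], cols[None, :]]


def letterbox(img: np.ndarray, size: int = 640, pad_value: int = 114):
    """Scale by one ratio so the image fits in size x size, then pad both sides evenly.

    Returns the padded image, the ratio r and the (left, top) offset of the content.
    """
    h, w = img.shape[:2]
    r = min(size / h, size / w)                  # same factor for x and y keeps aspect ratio
    new_h, new_w = round(h * r), round(w * r)
    resized = nearest_resize(img, new_h, new_w)
    dh, dw = size - new_h, size - new_w          # total padding per axis
    top, left = dh // 2, dw // 2                 # split it between opposite sides
    out = np.full((size, size) + img.shape[2:], pad_value, dtype=img.dtype)
    out[top:top + new_h, left:left + new_w] = resized
    return out, r, (left, top)
```

For a 1280x720 frame and size=640 this gives r = 0.5, a 640x360 content region, and 140 rows of padding above and below it.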

We were unfortunately not able to gain mAP using rotation and shearing operations in our COCO experiments, but did find translation and scaling to help.

pfeatherstone commented 4 years ago

Here is yolov5s inference using letterbox resizing and input dimension 640x640 (the default)

[Image: dog_detections_letterbox]

Here is the yolov5s inference using normal resizing and input dimension 640x640

[Image: dog_detections]

pfeatherstone commented 4 years ago

So you can see that without letterbox resizing, accuracy goes down and the boxes are not as tight.
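One way to see why the two modes behave differently: both can be inverted exactly, but stretching applies a different scale factor to each axis. For a 1280x720 frame fed at 640x640, letterboxing scales both axes by 0.5, while stretching uses 0.5 horizontally and about 0.89 vertically, so a 200x100 px box (aspect 2.0) reaches the network as roughly 100x89 px (aspect about 1.1). A model trained on aspect-preserving inputs rarely sees that kind of distortion, which plausibly contributes to the looser boxes. The helpers below are hypothetical sketches for mapping xyxy detections back to original pixels in each mode.

```python
import numpy as np


def unletterbox_boxes(boxes, r, pad):
    """Map xyxy boxes from a letterboxed input back to original pixels.

    Letterboxing used one ratio r for both axes plus a (left, top) pad offset.
    """
    left, top = pad
    out = np.asarray(boxes, dtype=float).copy()
    out[:, [0, 2]] = (out[:, [0, 2]] - left) / r
    out[:, [1, 3]] = (out[:, [1, 3]] - top) / r
    return out


def unstretch_boxes(boxes, orig_hw, size):
    """Map xyxy boxes from a stretched size x size input back to original pixels.

    Stretching used separate factors per axis, so x and y are rescaled independently.
    """
    h, w = orig_hw
    sx, sy = size / w, size / h
    out = np.asarray(boxes, dtype=float).copy()
    out[:, [0, 2]] /= sx
    out[:, [1, 3]] /= sy
    return out
```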

pfeatherstone commented 4 years ago

Is there any reason why you might want to preserve aspect ratio? I haven't trained YOLOv5 models yet, but when I used your yolov3-spp model I added a large set of Albumentations transformations (15+ possible transforms) to make it as robust as possible, which worked pretty well.

pfeatherstone commented 4 years ago

Here is the Albumentations composition I used:

import albumentations as albu

# Targets albumentations 1.x. The original list used albu.IAAEmboss / albu.IAASharpen, which relied
# on imgaug and have since been removed; albu.Emboss / albu.Sharpen are used in their place.
transform = albu.Compose([
    # OneOf applies at most one child per image, chosen with probability proportional to each p
    albu.OneOf([albu.RandomBrightnessContrast(brightness_limit=0.3, contrast_limit=0.3, p=0.7),
                albu.RandomGamma(gamma_limit=(50, 150), p=0.7),
                albu.RGBShift(p=0.7),
                albu.HueSaturationValue(hue_shift_limit=10, sat_shift_limit=15, val_shift_limit=10, p=0.7),
                albu.CLAHE(p=0.7),
                albu.ImageCompression(quality_lower=30),
                albu.GaussNoise(p=0.7),
                albu.GaussianBlur(p=0.7),
                albu.MedianBlur(p=0.7),
                albu.ChannelShuffle(p=0.7),
                albu.CoarseDropout(p=0.7),
                albu.Equalize(p=0.7),
                albu.FancyPCA(p=0.7),
                albu.Emboss(p=0.7),
                albu.Sharpen(p=0.7),
                albu.ISONoise(p=0.7),
                albu.Posterize(p=0.7),
                albu.InvertImg(),
                albu.MotionBlur(always_apply=True),
                albu.RandomRain(),
                albu.RandomShadow(),
                albu.RandomSnow(),
                albu.Solarize()]),
    # geometric transforms; boxes are updated through bbox_params below
    albu.OneOf([albu.VerticalFlip(p=0.2),
                albu.HorizontalFlip(),
                albu.Transpose(p=0.2),
                albu.ShiftScaleRotate()])],
    bbox_params=albu.BboxParams(format='coco', label_fields=['category_id']))

I thought albu.ShiftScaleRotate scaled each spatial dimension independently, but I don't think it actually does, so there wasn't enough diversity in aspect ratios. Oh well.
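ShiftScaleRotate draws a single scale factor and applies it to both axes, so it never changes an object's aspect ratio. Below is a minimal NumPy sketch of the per-axis stretch augmentation being described. random_stretch and max_ratio are hypothetical names, not part of YOLOv5 or Albumentations, and nearest-neighbour sampling stands in for a proper resize.

```python
import numpy as np


def random_stretch(img, boxes, max_ratio=1.3, rng=None):
    """Hypothetical per-axis stretch augmentation (not a YOLOv5 or Albumentations API).

    img: H x W (x C) array; boxes: N x 4 xyxy pixel boxes; max_ratio >= 1 bounds
    the aspect-ratio change. Nearest-neighbour sampling keeps the sketch NumPy-only.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    # log-uniform aspect change a in [1/max_ratio, max_ratio], split so that sx * sy = 1
    a = float(np.exp(rng.uniform(-np.log(max_ratio), np.log(max_ratio))))
    sx, sy = a ** 0.5, a ** -0.5
    new_w, new_h = max(1, round(w * sx)), max(1, round(h * sy))
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    out = img[rows[:, None], cols[None, :]]
    new_boxes = np.asarray(boxes, dtype=float).copy()
    new_boxes[:, [0, 2]] *= new_w / w  # use the realised factors after rounding
    new_boxes[:, [1, 3]] *= new_h / h
    return out, new_boxes
```

Keeping sx * sy = 1 changes object shape while roughly preserving area, which separates the aspect-ratio effect from ordinary uniform scale augmentation.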

glenn-jocher commented 4 years ago

@pfeatherstone that's definitely a substantial amount of augmentation. You should be careful, though: some methods like CLAHE (contrast limited adaptive histogram equalization) are used to enhance image contrast, and since this will not be applied during testing, introducing it during training may harm your test results.

If you do find a combination of augmentation parameters that outperform the default on COCO training please let us know though, this would be very useful to update our defaults with.

pfeatherstone commented 4 years ago

The augmentation was helpful on my custom dataset where there wasn’t a huge amount of diversity. On COCO, I doubt that much augmentation is necessary. However, the degraded performance of yolov5 when not using letterbox resizing suggests that some scaling augmentation on top of mosaic would be beneficial.

pfeatherstone commented 4 years ago

Unfortunately I don't have a lot of time to train COCO models at the moment, but in case you are looking for ideas, scaling augmentation might be a good one.

github-actions[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

pfeatherstone commented 4 years ago

@glenn-jocher I just noticed that this was closed. Don't you think the accuracy drop with non-letterbox resizing should be looked into? This isn't a problem with the original yolov3 and yolov3-spp models. Stretching could be a form of augmentation to avoid overfitting to specific aspect ratios...

glenn-jocher commented 4 years ago

@pfeatherstone stretching produced worse results in our experiments, which is why we do not use it.

hiyyg commented 3 years ago

Hi @glenn-jocher, I wonder why you use letterbox padding instead of just padding the bottom right?

glenn-jocher commented 3 years ago

@hiyyg symmetric padding allows for reduced edge effects vs unilateral padding.
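A small sketch of what the two padding schemes do to the margins around a 640x360 letterboxed image inside a 640x640 canvas. padding_margins is a hypothetical helper that only computes the geometry.

```python
def padding_margins(new_w, new_h, size=640, symmetric=True):
    """Return (left, top, right, bottom) padding around resized content in a size x size canvas."""
    dw, dh = size - new_w, size - new_h
    if symmetric:
        left, top = dw // 2, dh // 2   # letterbox: content centred, padding split across opposite sides
    else:
        left, top = 0, 0               # unilateral: content anchored top-left, all padding bottom/right
    return left, top, dw - left, dh - top
```

With symmetric padding the content gets (0, 140, 0, 140), so it sits the same distance from the top and bottom canvas borders; with bottom-right padding it gets (0, 0, 0, 280) and lies flush against the top border with all the padding below it.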

glenn-jocher commented 3 years ago

@pfeatherstone see PR #3882 for a proposed automatic Albumentations integration.

glenn-jocher commented 3 years ago

@hiyyg @pfeatherstone good news 😃! Your original issue may now be fixed ✅ in PR #3882. This PR implements a YOLOv5 🚀 + Albumentations integration. The integration will automatically apply Albumentations transforms during YOLOv5 training if albumentations>=1.0.0 is installed in your environment.

Get Started

To use Albumentations, simply pip install -U albumentations and then update the augmentation pipeline as you see fit in the Albumentations class in yolov5/utils/augmentations.py. Note that these Albumentations operations run in addition to the YOLOv5 hyperparameter augmentations, i.e. those defined in hyp.scratch.yaml.

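import logging
import random

import numpy as np

from utils.general import check_version, colorstr  # YOLOv5 helper functions


class Albumentations:
    # YOLOv5 Albumentations class (optional, used if package is installed)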
    def __init__(self):
        self.transform = None
        try:
            import albumentations as A
            check_version(A.__version__, '1.0.0')  # version requirement

            self.transform = A.Compose([
                A.Blur(p=0.1),
                A.MedianBlur(p=0.1),
                A.ToGray(p=0.01)],
                bbox_params=A.BboxParams(format='yolo', label_fields=['class_labels']))

            logging.info(colorstr('albumentations: ') + ', '.join(f'{x}' for x in self.transform.transforms))
        except ImportError:  # package not installed, skip
            pass
        except Exception as e:
            logging.info(colorstr('albumentations: ') + f'{e}')

    def __call__(self, im, labels, p=1.0):
        if self.transform and random.random() < p:
            new = self.transform(image=im, bboxes=labels[:, 1:], class_labels=labels[:, 0])  # transformed
            im, labels = new['image'], np.array([[c, *b] for c, b in zip(new['class_labels'], new['bboxes'])])
        return im, labels
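One detail worth noting in __call__: if the transform returns no boxes (an image with no labels, or a spatial transform that drops every box), the list comprehension produces np.array([]) with shape (0,) rather than (0, 5), and code that then indexes labels[:, 1:] raises an IndexError. A hypothetical defensive helper is sketched below; merge_labels is not part of YOLOv5.

```python
import numpy as np


def merge_labels(class_labels, bboxes):
    """Rebuild an (n, 5) [class, x, y, w, h] array from Albumentations outputs.

    Hypothetical helper: reshape(-1, 5) keeps the array 2-D even when no boxes
    survive, where np.array([]) alone would have shape (0,).
    """
    rows = [[c, *b] for c, b in zip(class_labels, bboxes)]
    return np.array(rows, dtype=np.float32).reshape(-1, 5)
```

Inside __call__, the recombination step would then read labels = merge_labels(new['class_labels'], new['bboxes']).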

Example Result

Example train_batch0.jpg on the COCO128 dataset with Blur, MedianBlur and ToGray. See the YOLOv5 Notebooks (Google Colab and Kaggle) to reproduce.

[Image: train_batch0]

Update

To receive this YOLOv5 update:

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!