ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Add CutMix Augmentation #448

Closed TaoXieSZ closed 4 years ago

TaoXieSZ commented 4 years ago

🚀 Feature

Add CutMix when using mosaic. Paper: https://openaccess.thecvf.com/content_ICCV_2019/html/Yun_CutMix_Regularization_Strategy_to_Train_Strong_Classifiers_With_Localizable_Features_ICCV_2019_paper.html

My implementation

https://github.com/ChristopherSTAN/yolov5/blob/2c210827a9a98633bc1e821156bc67c00cbc38d1/utils/datasets.py#L449-L469

Samples

(sample images)

Future

I will try to add AugMix: https://arxiv.org/abs/1912.02781

glenn-jocher commented 4 years ago

@ChristopherSTAN wow nice! I have one suggestion though. It looks like you are keeping any box with > 0 pixel size after the operation. This seems simple, but in reality what you do here will greatly affect your results. If you end up keeping many boxes with extremely narrow widths or heights, or high aspect ratios, the training may be adversely affected.

random_affine() already deals with this problem in the best way I was able to find. For example, if an operation crops 90% of a person, the remaining edge area may contain little to no actual pixels of the person. So I tested a few thresholds for rejecting or accepting reduced bounding boxes, and the result is inside random_affine(). It may be worthwhile to put this in its own function so it can be used in other augmentation operations like your cutmix above.

The main criterion here is that boxes are removed if they lose > 80% of their original area. Augmented boxes also have width and height constraints of > 2 pixels and an aspect ratio constraint of 20 > ar > 1/20. https://github.com/ultralytics/yolov5/blob/b6fe2e45956c2e09e1ba91afa4d0fde4859deaff/utils/datasets.py#L765-L778

TaoXieSZ commented 4 years ago

@glenn-jocher Thanks for your suggestion! I learned a lot again!

TaoXieSZ commented 4 years ago

@glenn-jocher Applying your suggestion: https://github.com/ChristopherSTAN/yolov5/blob/85f7663c47f18377e0ff61f4ce86cae4f37746b7/utils/datasets.py#L449-L475

glenn-jocher commented 4 years ago

@ChristopherSTAN looks good, but best would be to drop this into its own function, and then call the function from random_affine() and CutMix(). This would return valid indices, the way the torchvision nms function does.

def box_candidates(box0, box1, wh_thr=2, ar_thr=20, area_thr=0.2):  # box0 before augment, box1 after
    ...
    return i  # candidate indices

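A minimal NumPy sketch of such a function, using the thresholds from the signature above (the linked repo commits have the authoritative version; this one returns a boolean mask usable as an index rather than integer indices):

```python
import numpy as np

def box_candidates(box0, box1, wh_thr=2, ar_thr=20, area_thr=0.2):
    # box0: boxes before augmentation, box1: after; both 4xn xyxy arrays.
    # Keep boxes that stay > wh_thr px wide/tall, retain > area_thr of
    # their original area, and have aspect ratio under ar_thr.
    w0, h0 = box0[2] - box0[0], box0[3] - box0[1]
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    ar = np.maximum(w1 / (h1 + 1e-16), h1 / (w1 + 1e-16))  # aspect ratio
    return (w1 > wh_thr) & (h1 > wh_thr) & \
           (w1 * h1 / (w0 * h0 + 1e-16) > area_thr) & (ar < ar_thr)
```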
yangfei963158659 commented 4 years ago

How much does accuracy improve after using CutMix?

glenn-jocher commented 4 years ago

@ChristopherSTAN I've created the function I mentioned before in https://github.com/ultralytics/yolov5/pull/494. The new function can be used to reject poor candidate boxes now in any augmentation function like cutmix etc in addition to being used in random_affine():

https://github.com/ultralytics/yolov5/blob/4b5f4806bcd513b18171034c06364432ef2c19c2/utils/datasets.py#L776-L790

TaoXieSZ commented 4 years ago

How much does accuracy improve after using CutMix?

I think it depends on which datasets you use. In the Kaggle wheat competition, it did help me improve mAP by nearly 0.03.

glenn-jocher commented 4 years ago

@ChristopherSTAN I've updated TTA to increase flexibility in https://github.com/ultralytics/yolov5/pull/506. You should be able to simply add extra augmentations by appending extra ops to the scale and flip lists. For wheat I might recommend something like this to bump mAP up slightly compared to the default TTA, but you can add any number of operations here. They run one at a time, so you don't have to worry about CUDA OOM.

            s = [1, 0.9, 0.8, 0.7, 0.6, 0.5]  # scales
            f = [None, 2, 3, None, 2, 3]  # flips (2-ud, 3-lr)

https://github.com/ultralytics/yolov5/blob/1d17b9af0f68ee97f9edc5f10fea51e9af9ef14e/models/yolo.py#L82-L98
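For illustration, the de-augmentation step that undoes one (scale, flip) pair might look like this sketch (hypothetical helper name and argument layout, paraphrasing the logic in the linked forward(); img_size is assumed to be (height, width)):

```python
import torch

def deaugment(pred, scale, flip, img_size):
    # undo one TTA op on xywh predictions: rescale boxes back to the
    # original resolution, then mirror centers for ud (2) / lr (3) flips
    p = pred.clone()
    p[..., :4] /= scale                     # xywh back to original scale
    if flip == 2:                           # image was flipped up-down
        p[..., 1] = img_size[0] - p[..., 1]
    elif flip == 3:                         # image was flipped left-right
        p[..., 0] = img_size[1] - p[..., 0]
    return p
```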

TaoXieSZ commented 4 years ago

@glenn-jocher Great job! I used to apply your TTA with the old version (see image), and I also tried flipping up-down.

It seems that you only offer flip TTA. Besides, I wonder if there are other TTA methods you think would be good for X-ray detection (see image).

I think we can have rotation or HSV transformation.

glenn-jocher commented 4 years ago

@ChristopherSTAN yes, it's possible you might be able to add affine augmentations, but these may also clip boxes, the same as if you use a scale > 1.0 in the TTA code, so it's complicated.

TaoXieSZ commented 4 years ago

@glenn-jocher I will try to add rotation first, because the direction of objects doesn't matter.

Then I will try to add some other fake "objects" into the images, like hammers.

glenn-jocher commented 4 years ago

@ChristopherSTAN you have to keep in mind that the TTA first applies augmentation, runs inference, then applies the reverse augmentation operations on the outputs, so any geometric transforms need to be invertible. It's possible but you will need to develop the ops as affine or perspective transforms, and then invert the matrices for the reverse operations. It will be very complicated as you can see in random_affine().

Colorspace augmentation is much easier, no reverse operations required. When combining TTA results, it is preferable to merge outputs rather than appending outputs, but this is more complicated.

Model ensembling merges the outputs of different models together, as this produced better results than appending. This merge code is in the Ensemble() module: https://github.com/ultralytics/yolov5/blob/1d17b9af0f68ee97f9edc5f10fea51e9af9ef14e/models/experimental.py#L113-L126
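A stripped-down sketch in the spirit of the linked Ensemble() module (simplified and renamed; the real module also supports other merge strategies and inference flags):

```python
import torch
import torch.nn as nn

class MeanEnsemble(nn.ModuleList):
    # merge several models' raw outputs by averaging them,
    # rather than appending, before NMS
    def forward(self, x):
        y = [m(x) for m in self]      # run every model on the same input
        return torch.stack(y).mean(0)  # element-wise mean over models
```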

TaoXieSZ commented 4 years ago

@glenn-jocher Yes, I get your point. I have a template like this:

class TTAHorizontalFlip(BaseWheatTTA):
    """ author: @shonenkov """

    def augment(self, image):
        return image.flip(1)

    def batch_augment(self, images):
        return images.flip(2)

    def deaugment_boxes(self, boxes):
        boxes[:, [1,3]] = self.image_size - boxes[:, [3,1]]
        return boxes
glenn-jocher commented 4 years ago

@ChristopherSTAN ah yes, exactly, that's a good idea. The more general case would apply an affine matrix to the image, then the inverse to the boxes, though again you run into the clipping issue when you start doing this, so you may need to combine affine ops (rotation, shear) with a reduction in scale. If you have a square image and rotate it 45 deg, the bounding box of the rotated contents will be sqrt(2) times wider than the original, so you would want to reduce your image contents' scale by 1/sqrt(2) to completely eliminate edge clipping (the pixel size would stay the same; the affine scale op would be 1/sqrt(2)).
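The sqrt(2) figure can be verified with a few standalone lines (a sanity check, not repo code):

```python
import math

# rotate a unit square by 45 degrees and measure the width of its
# axis-aligned bounding box
theta = math.radians(45)
corners = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
rot = [(x * math.cos(theta) - y * math.sin(theta),
        x * math.sin(theta) + y * math.cos(theta)) for x, y in corners]
width_after = max(p[0] for p in rot) - min(p[0] for p in rot)
# width_after equals sqrt(2), so pre-scaling contents by 1/sqrt(2)
# keeps the rotated square inside the original frame
```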

TaoXieSZ commented 4 years ago

To make sure I have it right: I want to apply rot90 in TTA:

torch.rot90(x, 1, (2, 3)),

The forward_once() output is $(x_{center}, y_{center}, w, h)$, and I think the de-augmentation should be:

$$x = y, \qquad y = \text{width} - x$$

Then I implemented it as follows (an older version, since the experiment is unfinished):

y[3][..., [2, 3]] = y[3][..., [3, 2]]  # exchange w, h
y[3][..., 0], y[3][..., 1] = y[3][..., 1], img_size[1] - y[3][..., 3]  # de-rotate

In total: (see image)
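One way to sanity-check this de-rotation is a small standalone sketch (hypothetical helper name, working in continuous coordinates on xywh outputs; exact pixel conventions may differ by one):

```python
import torch

def deaugment_rot90(pred, img_w):
    # invert torch.rot90(x, 1, (2, 3)) on (xc, yc, w, h) predictions:
    # the rotation maps (x, y) -> (y, W - x), so the inverse is
    # x = W - y_rot, y = x_rot, with w and h swapped back
    out = pred.clone()
    xc, yc, w, h = pred[..., 0], pred[..., 1], pred[..., 2], pred[..., 3]
    out[..., 0] = img_w - yc  # x_orig = width - y_rot
    out[..., 1] = xc          # y_orig = x_rot
    out[..., 2] = h           # swap w, h back
    out[..., 3] = w
    return out
```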

glenn-jocher commented 4 years ago

@ChristopherSTAN 90 deg rotations would be good for wheat. I don't know exactly the label op you'd want there. Typically this is handled by a 2D direction cosine matrix as perhaps M = [cos(theta), sin(theta), -sin(theta), -cos(theta)] with your new coordinates being xnew = M @ x.T or perhaps xnew = x @ M.T assuming x is nx2 and M is 2x2.

But you really need to verify your ops by visualizing the image with the boxes overlaid in matplotlib, i.e. plot both before and after the de-augmentation and make sure your points correspond.

This is why I left the cv2 image plotting function commented out; matplotlib makes it easier to overlay boxes on your images.
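As an illustration, a throwaway matplotlib helper for this kind of before/after check might look like this (hypothetical, not repo code):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted checks
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

def show_boxes(img, boxes_before, boxes_after):
    # overlay xyxy boxes before (red) and after (green) de-augmentation
    fig, ax = plt.subplots()
    ax.imshow(img)
    for boxes, color in ((boxes_before, "r"), (boxes_after, "g")):
        for x1, y1, x2, y2 in boxes:
            ax.add_patch(Rectangle((x1, y1), x2 - x1, y2 - y1,
                                   fill=False, edgecolor=color))
    return fig
```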

TaoXieSZ commented 4 years ago

@glenn-jocher Dear Glenn, I want to ask you about the multi-scale training trick, although I have used it many times. To fully understand it, I want to plug it into EfficientDet. I noticed this in train.py: https://github.com/ultralytics/yolov5/blob/7f8471eaebe4b192c5e6ab4e5c821d91e43cb4fe/train.py#L279-L285

Is this everything we need to apply multi-scale training? Thanks!

glenn-jocher commented 4 years ago

@ChristopherSTAN sure, as long as your labels are normalized.
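For reference, the linked train.py logic amounts to roughly this sketch (paraphrased, not copied verbatim; because the labels are normalized, only the images need rescaling):

```python
import math
import random
import torch
import torch.nn.functional as F

def multi_scale(imgs, imgsz=640, gs=32):
    # pick a random size in [0.5, 1.5] * imgsz rounded to the grid
    # stride gs, then rescale the whole batch to it
    sz = random.randrange(int(imgsz * 0.5), int(imgsz * 1.5) + gs) // gs * gs
    sf = sz / max(imgs.shape[2:])  # scale factor
    if sf != 1:
        # new shape, stretched to gs-multiples
        ns = [math.ceil(d * sf / gs) * gs for d in imgs.shape[2:]]
        imgs = F.interpolate(imgs, size=ns, mode="bilinear",
                             align_corners=False)
    return imgs
```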

clw5180 commented 4 years ago

How much does accuracy improve after using CutMix?

I think it depends on which datasets you use. In the Kaggle wheat competition, it did help me improve mAP by nearly 0.03.

Do you mean 0.70 to 0.73, or 70 to 70.03? And how much improvement did you get when using mixup on the wheat dataset?

TaoXieSZ commented 4 years ago

@clw5180 similar to 0.70 ➡️ 0.703

glenn-jocher commented 4 years ago

@ChristopherSTAN @clw5180 I tried a cutmix-like experiment on COCO: copying the smallest 25% of objects in each mosaic and randomly placing them around the mosaic a second time (no transparency, just replacing the local pixels with the new ones), and observed worse results on COCO. I only used small objects to avoid having to check whether the existing labels were obscured.
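The experiment described could be sketched roughly like this (a hypothetical reconstruction, not the actual experiment code; the `paste_small_objects` name and the `[cls, x1, y1, x2, y2]` label layout are assumptions):

```python
import numpy as np

def paste_small_objects(img, labels, frac=0.25):
    # copy the smallest `frac` of boxes to a random new location,
    # replacing the local pixels, and append the duplicated labels
    labels = labels.copy()
    h, w = img.shape[:2]
    areas = (labels[:, 3] - labels[:, 1]) * (labels[:, 4] - labels[:, 2])
    n = max(1, int(len(labels) * frac))
    new = []
    for i in np.argsort(areas)[:n]:       # smallest boxes first
        c, x1, y1, x2, y2 = labels[i]
        bw, bh = int(x2 - x1), int(y2 - y1)
        if bw < 1 or bh < 1 or bw >= w or bh >= h:
            continue
        nx, ny = np.random.randint(0, w - bw), np.random.randint(0, h - bh)
        img[ny:ny + bh, nx:nx + bw] = img[int(y1):int(y1) + bh,
                                          int(x1):int(x1) + bw]
        new.append([c, nx, ny, nx + bw, ny + bh])
    if new:
        labels = np.concatenate([labels, np.array(new, dtype=labels.dtype)], 0)
    return img, labels
```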

TaoXieSZ commented 4 years ago

There's no point; YOLOv5 can't be used in the wheat competition.

(Sent by email, replying to: "Bro, want to add each other on WeChat to chat? On this project my single model gets close to 0.77 without pseudo-labels, but adding pseudo-labels gives almost no improvement...")

ZeKunZhang1998 commented 4 years ago

@ChristopherSTAN I've updated TTA to increase flexibility in #506. You should be able to simply add extra augmentations by appending extra ops to the scale and flip lists. For wheat I might recommend something like this to bump mAP up slightly compared to the default TTA, but you can add any number of operations here. They run one at a time, so you don't have to worry about CUDA OOM.

            s = [1, 0.9, 0.8, 0.7, 0.6, 0.5]  # scales
            f = [None, 2, 3, None, 2, 3]  # flips (2-ud, 3-lr)

https://github.com/ultralytics/yolov5/blob/1d17b9af0f68ee97f9edc5f10fea51e9af9ef14e/models/yolo.py#L82-L98

Hi, why did you choose 0.6 and 0.5? Why are they better than 1.0?

sky-fly97 commented 4 years ago


Hello, it seems that your CutMix directly selects a random area and then performs a weighted mixup instead of replacing pixels.

zhang295498 commented 4 years ago


Hello, I'm also using YOLOv5. Could we connect and discuss?

github-actions[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

nanhui69 commented 3 years ago

@glenn-jocher I saw you added a mixup strategy instead of CutMix in datasets.py. I would like to know: do you take mixup into account in the loss? If so, where can I find it?

glenn-jocher commented 3 years ago

@nanhui69 mixup is added here after mosaic is loaded: https://github.com/ultralytics/yolov5/blob/68e6ab668b30a6014215b94e399151f8c76e471a/utils/datasets.py#L497-L507

The labels are updated, but the loss function is not modified.
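The linked mixup step amounts to roughly the following sketch (paraphrased; the repo blends the two mosaics with a Beta(8, 8) ratio and simply concatenates the label arrays, which is why no loss change is needed):

```python
import numpy as np

def mixup(img, labels, img2, labels2):
    # blend two images with a Beta(8, 8) mixing ratio and
    # concatenate their labels; the loss function is left unchanged
    r = np.random.beta(8.0, 8.0)  # ratio concentrated around 0.5
    img = (img.astype(np.float32) * r +
           img2.astype(np.float32) * (1 - r)).astype(np.uint8)
    labels = np.concatenate((labels, labels2), 0)
    return img, labels
```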

nanhui69 commented 3 years ago

@nanhui69 mixup is added here after mosaic is loaded: https://github.com/ultralytics/yolov5/blob/68e6ab668b30a6014215b94e399151f8c76e471a/utils/datasets.py#L497-L507

The labels are updated, but the loss function is not modified.

OK, got it.

nanhui69 commented 3 years ago

@glenn-jocher My training results, shown in the image below, are from a recent run of the v5 repo. Could you give some advice on achieving better performance on my custom data? (results plot)

glenn-jocher commented 3 years ago

@nanhui69 suggestions are always the same, and I've repeated them many times here:

awsaf49 commented 2 years ago

@glenn-jocher I couldn't find any CutMix in the repo; was it replaced with Mosaic?

glenn-jocher commented 2 years ago

@awsaf49 yes, that's true, it's not available per se, but we do have mixup, cutout, and copy-paste (if you have segmentation labels), and of course mosaic, so I think between those we've likely covered most of what CutMix does.