ultralytics / ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

How to do custom data augmentation e.g. gray picture? #4792

Closed tms2003 closed 10 months ago

tms2003 commented 1 year ago

Search before asking

Question

How do I do custom data augmentation? For example, I want to apply grayscale augmentation to a certain proportion of the data. How should I do it? I checked the documentation (https://docs.ultralytics.com/usage/cfg/#augmentation), which has HSV augmentation but no grayscale processing. How can I achieve a configuration similar to YOLOv5 that overrides (or adds) augmentation parameters such as blur and gray?

I use the YOLO CLI for training.

Additional

No response

glenn-jocher commented 1 year ago

@tms2003 hello,

Thank you for your question about custom data augmentation in YOLOv8. Currently, built-in grayscale augmentation is not directly supported. However, Ultralytics has designed YOLOv8 to be highly flexible and modular, so you can implement custom data augmentations quite easily.

You can implement grayscale augmentation in the datasets.py file. There, you will find the load_mosaic() and load_image() functions, among others. These functions handle image loading and augmentation. You can add a grayscale enhancement augmentation to these functions.

Additionally, you may want to modify the hsv_adjust() function to accommodate the grayscale augmentation. This function currently supports HSV augmentation, but you could add a similar mechanism for grayscale.

Remember that providing a grayscale image to the YOLOv8 network, which expects three color channels, might affect performance. Hence, after the grayscale transformation, you might want to stack the image data three times along the color channel axis to conform to this expectation.
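For illustration, here is a minimal standalone sketch of that grayscale-and-stack step (the function name and image path are just placeholders, not part of the Ultralytics code):

import cv2
import numpy as np

def to_gray_3ch(img, p=0.5):
    """Randomly convert a BGR image to grayscale and replicate it back to 3 channels."""
    if np.random.rand() < p:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # HxW, single channel
        img = np.stack([gray, gray, gray], axis=-1)   # back to HxWx3
    return img

# example usage on a placeholder image path
img = cv2.imread('example.jpg')
img = to_gray_3ch(img, p=1.0)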

Finally, after implementing these changes, you might want to test your updated augmentation process on a few images to ensure it works correctly. You can do this by using a simple test script to load a batch of images using your dataset, and then visualizing the images (and possibly their labels) to confirm successful grayscale augmentation.

I hope this answer helps you! Please don't hesitate to reach out if you have further questions.

Best wishes, Glenn

frabob2017 commented 1 year ago

I also have grayscale images with 3 channels. I first collapse them into one channel using torch.mean, then I stack 6 consecutive slices along the channel dimension. Training runs without errors, but I have not checked the detection efficacy yet.
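Roughly, the idea in PyTorch looks like this (a sketch with dummy tensors; the shapes and slice count are just for illustration):

import torch

# 6 consecutive grayscale-as-RGB slices, each (3, H, W); dummy data for illustration
slices = [torch.rand(3, 480, 640) for _ in range(6)]

# collapse the 3 identical channels of each slice into 1 with the mean -> (1, H, W)
gray_slices = [s.mean(dim=0, keepdim=True) for s in slices]

# stack the 6 single-channel slices along the channel dimension -> (6, H, W)
stacked = torch.cat(gray_slices, dim=0)
print(stacked.shape)  # torch.Size([6, 480, 640])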

glenn-jocher commented 1 year ago

@frabob2017 hello,

Your approach to handling grayscale images in a YOLOv8 architecture seems perfectly viable - averaging the pixel values across channels with torch.mean to create a single-channel grayscale image, then stacking multiple such images along the channel axis. This way, the model still receives a 3D tensor input as it expects.

This strategy preserves the spatial information necessary for detection tasks while using grayscale images instead of RGB. However, it is crucial to note that since YOLOv8 was originally trained on three-channel color images, the model might perform differently on grayscale images.

The best way to ascertain the efficacy of your approach would be to assess the detection capability of the model empirically. You can validate your model on a held-out portion of your data and benchmark its performance against established metrics like mAP, Precision, or Recall.
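For example, something like this with the Python API would report those metrics (the weights and dataset paths below are placeholders):

from ultralytics import YOLO

model = YOLO('runs/detect/train/weights/best.pt')  # placeholder path to your trained weights
metrics = model.val(data='your_dataset.yaml')      # evaluates on the dataset's val split
print(metrics.box.map)    # mAP50-95
print(metrics.box.map50)  # mAP50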

I'm glad to hear your training run is executing successfully without any errors. Good luck with your experiments, and please let the community know about your findings. We're always eager to learn about new approaches and their outcomes.

Best, Glenn

frabob2017 commented 1 year ago

Thank you so much, Jocher. By reading your YOLOv8 code, I learned more about the YOLO algorithm. I am labeling more images and hope to see whether this revision of the code works better.

glenn-jocher commented 1 year ago

@frabob2017 hello,

I appreciate your kind words and am pleased to hear you have learned more about the YOLO algorithm from the YOLOv8 coding implementation. It's always encouraging to know that the community is benefiting from this effort. All credit goes to the broader YOLO community and the Ultralytics team.

Your approach to investing effort in labelling more images is very sound. Having a well-labelled and diversified dataset is a critical component in achieving the best detection performance with YOLOv8.

If you encounter any issues or have additional questions while you are working on your project, please feel free to bring it up in the repository. We're eager to support you and the broader YOLOv8 user community with any concerns you may have.

Looking forward to hearing about your progress and success with YOLOv8!

Best, Glenn

tms2003 commented 1 year ago

@glenn-jocher

I noticed the following code.

def build_transforms(self, hyp=None):
    """Builds and appends transforms to the list."""
    if self.augment:
        hyp.mosaic = hyp.mosaic if self.augment and not self.rect else 0.0
        hyp.mixup = hyp.mixup if self.augment and not self.rect else 0.0
        transforms = v8_transforms(self, self.imgsz, hyp)

def v8_transforms(dataset, imgsz, hyp, stretch=False):
    """Convert images to a size suitable for YOLOv8 training."""
    pre_transform = Compose([
        Mosaic(dataset, imgsz=imgsz, p=hyp.mosaic),
        CopyPaste(p=hyp.copy_paste),
        RandomPerspective(
            degrees=hyp.degrees,
            translate=hyp.translate,
            scale=hyp.scale,
            shear=hyp.shear,
            perspective=hyp.perspective,
            pre_transform=None if stretch else LetterBox(new_shape=(imgsz, imgsz)),
        )])
    flip_idx = dataset.data.get('flip_idx', [])  # for keypoints augmentation
    if dataset.use_keypoints:
        kpt_shape = dataset.data.get('kpt_shape', None)
        if len(flip_idx) == 0 and hyp.fliplr > 0.0:
            hyp.fliplr = 0.0
            LOGGER.warning("WARNING ⚠️ No 'flip_idx' array defined in data.yaml, setting augmentation 'fliplr=0.0'")
        elif flip_idx and (len(flip_idx) != kpt_shape[0]):
            raise ValueError(f'data.yaml flip_idx={flip_idx} length must be equal to kpt_shape[0]={kpt_shape[0]}')

    return Compose([
        pre_transform,
        MixUp(dataset, pre_transform=pre_transform, p=hyp.mixup),
        Albumentations(p=1.0),
        RandomHSV(hgain=hyp.hsv_h, sgain=hyp.hsv_s, vgain=hyp.hsv_v),
        RandomFlip(direction='vertical', p=hyp.flipud),
        RandomFlip(direction='horizontal', p=hyp.fliplr, flip_idx=flip_idx)])  # transforms

and....
class Albumentations:
    """Albumentations transformations. Optional, uninstall package to disable.
    Applies Blur, Median Blur, convert to grayscale, Contrast Limited Adaptive Histogram Equalization,
    random change of brightness and contrast, RandomGamma and lowering of image quality by compression."""

    def __init__(self, p=1.0):
        """Initialize the transform object for YOLO bbox formatted params."""
        self.p = p
        self.transform = None
        prefix = colorstr('albumentations: ')
        try:
            import albumentations as A

            check_version(A.__version__, '1.0.3', hard=True)  # version requirement

            T = [
                A.Blur(p=0.01),
                A.MedianBlur(p=0.01),
                A.ToGray(p=0.01),
                A.CLAHE(p=0.01),
                A.RandomBrightnessContrast(p=0.0),
                A.RandomGamma(p=0.0),
                A.ImageCompression(quality_lower=75, p=0.0)]  # transforms
            self.transform = A.Compose(T, bbox_params=A.BboxParams(format='yolo', label_fields=['class_labels']))

Therefore, as long as you train with augment: True (and have the albumentations package installed), you automatically get grayscale augmentation via ToGray, along with many other enhancements.
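So one simple way to get a larger proportion of grayscale augmentation seems to be raising the ToGray probability in that T list (a local edit to the source shown above, not an official config option; 0.2 is just an example value):

import albumentations as A

T = [
    A.Blur(p=0.01),
    A.MedianBlur(p=0.01),
    A.ToGray(p=0.2),  # raised from 0.01 to convert roughly 20% of images to grayscale
    A.CLAHE(p=0.01),
    A.RandomBrightnessContrast(p=0.0),
    A.RandomGamma(p=0.0),
    A.ImageCompression(quality_lower=75, p=0.0)]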

glenn-jocher commented 1 year ago

@tms2003 indeed, you are correct in your observation that the YOLOv8 code, when using augment: True, applies multiple image enhancements, such as Blur, Median Blur, ToGray, CLAHE, RandomBrightnessContrast, RandomGamma, and ImageCompression, thanks to the Albumentations library.

However, it's important to note that although the ToGray transform is part of the Albumentations pipeline, this does not convert all of your training images to grayscale. The p=0.01 argument means that, on average, only about 1% of images in your dataset will be converted to grayscale during augmentation.

The grayscale augmentation I mentioned earlier in the thread is a proposed custom implementation that will convert all images to grayscale before feeding them into the model. Modifying specific functions in the datasets.py file such as load_mosaic() and load_image() will enable this.

I hope this provides some clarity. Let me know if you have further questions.

Best, Glenn

github-actions[bot] commented 11 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

BillJohns commented 8 months ago

Concerning this statement: "datasets.py file. There, you will find the load_mosaic() and load_image()"
Looking in the library, I only found a "dataset.py" (no 's' at the end). In this file there is neither a load_mosaic nor a load_image function. Am I looking in the wrong place?

glenn-jocher commented 8 months ago

@BillJohns apologies for the confusion. You are correct; the file is named dataset.py without an 's' at the end. The functions for loading and augmenting images may have different names or may be structured differently than described. Please refer to the dataset.py file and look for the relevant image loading and augmentation methods to implement custom augmentations. If you have any further questions, feel free to ask.
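For what it's worth, a minimal sketch of what such a custom augmentation could look like, written in the same style as the transforms in the v8_transforms snippet quoted earlier (this assumes each transform receives a labels dict whose 'img' entry is an HxWx3 numpy array at that point in the pipeline; the class name and probability are illustrative only, not part of the library):

import cv2
import numpy as np

class RandomGray:
    """Convert labels['img'] to grayscale (replicated to 3 channels) with probability p."""

    def __init__(self, p=0.2):
        self.p = p

    def __call__(self, labels):
        if np.random.rand() < self.p:
            gray = cv2.cvtColor(labels['img'], cv2.COLOR_BGR2GRAY)
            labels['img'] = np.stack([gray] * 3, axis=-1)
        return labels

# An instance of RandomGray could then be added to the Compose list built in
# v8_transforms (e.g. right after RandomHSV) to grayscale roughly 20% of training images.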