Hello @rukshankr, thank you for your interest in YOLOv5! Please visit our Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a Bug Report, please provide a minimum reproducible example to help us debug it.
If this is a custom training Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
YOLOv5 requires Python>=3.7.0 with all requirements.txt dependencies installed, including PyTorch>=1.7. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8!
Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.
Check out our YOLOv8 Docs for details and get started with:
pip install ultralytics
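Once installed, a minimal Python sketch for running a segmentation model looks like this (the model name and image path below are placeholders):

from ultralytics import YOLO

# Load a pretrained YOLOv8 segmentation model and run inference on one image.
model = YOLO("yolov8n-seg.pt")
results = model("path/to/image.jpg")  # placeholder image path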
@rukshankr, if you want to get segmentation masks that are as close to the original shape as possible, annotating occluded food images to their original shape is a valid approach. However, if the items are overlapping too much, you may want to consider separating these items into their own bounding boxes for better accuracy.
You may also want to try experimenting with different augmentation strategies during training, such as mixup or mosaic augmentation, to improve the model's ability to detect objects in overlapping situations. Additionally, it may be helpful to try training on a smaller subset of the dataset first to find the best model architecture and hyperparameters before training on the entire dataset.
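As a rough sketch of how you might experiment with those augmentation probabilities (this assumes the standard YOLOv5 hyperparameter YAML layout, e.g. data/hyps/hyp.scratch-low.yaml with mosaic and mixup keys; the output filename is just an example to pass to training via --hyp):

import yaml

# Load the stock hyperparameters, adjust the augmentation probabilities, and
# save a copy to pass to train.py with --hyp.
with open("data/hyps/hyp.scratch-low.yaml") as f:
    hyp = yaml.safe_load(f)

hyp["mosaic"] = 1.0  # probability of mosaic augmentation
hyp["mixup"] = 0.2   # probability of mixup augmentation

with open("data/hyps/hyp.custom.yaml", "w") as f:
    yaml.safe_dump(hyp, f)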
@glenn-jocher thank you so much for the advice. I will try the methods that you have mentioned.
You're welcome, @rukshankr! Please let us know if you have any further questions or run into any issues during your training process. We wish you good luck with your project!
@glenn-jocher I'm doing something very similar to this and running into problems. What we're doing is equivalent to labeling donuts like rukshankr's example, but adding an additional object which is the hole of the donut. The detected donut masks exclude the donut hole even though they are annotated to include everything. Similar issue with overlap, the masks produced generally do not overlap with other detected masks of different classes even though the training data has them overlapping.
It seems that some part of YOLOv5 mask detection prevents masks from overlapping. I suspect this occurs both in the training data preparation (because the raw detections also give me donut objects with holes, so the model must have been trained on donuts with holes) and additionally in post-processing of detections, perhaps due to NMS.
This is the same issue as #10433 about whether yolov5 can include the same pixel as part of more than one object when that's how it's annotated in the training data. I don't believe there is such a restriction in the network itself; it can certainly detect masks that overlap with other objects.
@vtyw hello! Thank you for bringing up your concerns about YOLOv5. You are correct that YOLOv5 is fundamentally capable of detecting masks that overlap with other objects. However, it is possible that the post-processing steps in the detection pipeline, such as NMS, may be causing the issue you are seeing.
You mentioned that objects with holes are being detected but the holes are being excluded from the mask. This could be because the network is able to detect the hole region, but post-processing such as NMS removes some of the masks that overlap with another object. One possible solution you could try is to adjust the NMS IoU threshold in the detection pipeline, as sketched below, to allow for more overlap between detections.
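For example, with the PyTorch Hub interface (a minimal sketch using the detection model; the image path is a placeholder, and segment/predict.py exposes the equivalent --conf-thres and --iou-thres arguments):

import torch

# Load YOLOv5 via torch.hub and adjust the NMS thresholds before inference.
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
model.conf = 0.25  # confidence threshold
model.iou = 0.60   # NMS IoU threshold; a higher value lets more overlapping detections survive
results = model("path/to/image.jpg")  # placeholder image path
results.print()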
As you mentioned, there is an existing Issue #10433 in the YOLOv5 repo regarding this topic so I would suggest following any further discussion or developments in that thread as well.
Please let us know if you have any further questions or if there's anything more we can do to assist you!
@glenn-jocher NMS operates per class, so it cannot be the cause of this issue.
I did some more exploring and here's what I found:
- The default settings of train.py cause the dataloader to combine all object masks into a single mask, meaning each pixel can only be assigned to a single object instance. Training thus prevents the network from learning overlapping parts of objects, and some object classes will take precedence over others. To train with overlapping objects, the --no-overlap flag is required. The code comment on this implies that training is slower with this option but achieves higher mAP. (A rough sketch of this mask-merging behaviour follows after this list.)
- train.py generally reports mask mAP incorrectly. This is caused by the --mask-ratio option defaulting to 4 and being inadvertently forwarded to the validation that occurs during training. The --no-overlap option also causes further incorrect mask mAP calculation for the same reason.
- Setting --mask-ratio to 1 (which disables mask downsampling) can actually decrease final mask metrics by a noticeable amount (e.g., more than 5%). Note that mask downsampling serves to reduce memory usage and offers an appreciable speedup in training time.
- One should use --no-overlap for a dataset with overlapping objects, but note that the mAP reported during training might be inflated by a few percent; you must run val.py separately to get the true performance. Theoretically, the wrong validation metric being used during training will result in a suboptimal set of weights being saved as "best", but it might still be in the ballpark and not matter a lot.
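To illustrate the first point, here's a rough sketch of the idea (not the actual YOLOv5 dataloader code) showing why merging all instance masks into one index image loses the overlapping pixels:

import numpy as np

# Two toy instance masks that overlap each other.
h, w = 8, 8
inst1 = np.zeros((h, w), dtype=np.uint8)
inst1[1:6, 1:6] = 1  # object 1
inst2 = np.zeros((h, w), dtype=np.uint8)
inst2[3:8, 3:8] = 1  # object 2, overlapping object 1

# Merge both masks into a single index image, as the default (overlap) mode does
# conceptually: each pixel can carry at most one instance id.
merged = np.zeros((h, w), dtype=np.uint8)
for idx, m in enumerate([inst1, inst2], start=1):
    merged[m > 0] = idx  # later instances overwrite earlier ones in the overlap

# The overlapping pixels now belong only to instance 2, so the network is never
# shown them as part of instance 1; --no-overlap keeps one binary mask per instance instead.
print(np.unique(merged))  # [0 1 2]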
Based on your findings, it's clear that there are a few things to be aware of when training and evaluating a YOLOv5 model for datasets with overlapping object instances. Specifically, it's necessary to use the --no-overlap
flag when training to allow the network to learn overlapping parts of objects. However, doing so may cause the mask metrics reported during training to be inflated by a few percent, which can be corrected by running val.py
separately to obtain the true performance. It's also interesting to note that setting --mask-ratio
to 1 can actually decrease the final mask metrics in certain situations.
We appreciate you sharing your insights with the community and we hope that your findings will help others to achieve better results when using YOLOv5. If you have any further questions or if there's anything else we can do to assist you, please don't hesitate to let us know!
Using the script general_json2yolo.py, you can convert an RLE mask with holes to the YOLO segmentation format.
The RLE mask is converted to a parent polygon and a child polygon using cv2.findContours().
The parent polygon points are sorted in clockwise order, and the child polygon points are sorted in counterclockwise order.
The nearest point in the parent polygon and the nearest point in the child polygon are found, and those two points are connected with two narrow lines, so that the polygon with a hole can be saved in the YOLO segmentation format.
import cv2
import numpy as np
from pycocotools import mask

def is_clockwise(contour):
    # Shoelace-style signed area: a negative sum means the points run clockwise
    # in image coordinates.
    value = 0
    num = len(contour)
    for i, point in enumerate(contour):
        p1 = contour[i]
        if i < num - 1:
            p2 = contour[i + 1]
        else:
            p2 = contour[0]
        value += (p2[0][0] - p1[0][0]) * (p2[0][1] + p1[0][1])
    return value < 0

def get_merge_point_idx(contour1, contour2):
    # Find the pair of points (one from each contour) with the smallest squared
    # distance; the two polygons will be joined at these points.
    idx1 = 0
    idx2 = 0
    distance_min = -1
    for i, p1 in enumerate(contour1):
        for j, p2 in enumerate(contour2):
            distance = pow(p2[0][0] - p1[0][0], 2) + pow(p2[0][1] - p1[0][1], 2)
            if distance_min < 0:
                distance_min = distance
                idx1 = i
                idx2 = j
            elif distance < distance_min:
                distance_min = distance
                idx1 = i
                idx2 = j
    return idx1, idx2

def merge_contours(contour1, contour2, idx1, idx2):
    # Splice the child (hole) contour into the parent contour at the nearest
    # points, walking out and back so the hole is connected by two narrow lines.
    contour = []
    for i in list(range(0, idx1 + 1)):
        contour.append(contour1[i])
    for i in list(range(idx2, len(contour2))):
        contour.append(contour2[i])
    for i in list(range(0, idx2 + 1)):
        contour.append(contour2[i])
    for i in list(range(idx1, len(contour1))):
        contour.append(contour1[i])
    contour = np.array(contour)
    return contour

def merge_with_parent(contour_parent, contour):
    # Ensure the parent runs clockwise and the child counterclockwise, then merge.
    if not is_clockwise(contour_parent):
        contour_parent = contour_parent[::-1]
    if is_clockwise(contour):
        contour = contour[::-1]
    idx1, idx2 = get_merge_point_idx(contour_parent, contour)
    return merge_contours(contour_parent, contour, idx1, idx2)

def mask2polygon(image):
    # RETR_CCOMP returns a two-level hierarchy: outer contours and their holes.
    contours, hierarchies = cv2.findContours(image, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_TC89_KCOS)
    contours_approx = []
    polygons = []
    for contour in contours:
        epsilon = 0.001 * cv2.arcLength(contour, True)
        contour_approx = cv2.approxPolyDP(contour, epsilon, True)
        contours_approx.append(contour_approx)
    # Collect the outer (parent) contours.
    contours_parent = []
    for i, contour in enumerate(contours_approx):
        parent_idx = hierarchies[0][i][3]
        if parent_idx < 0 and len(contour) >= 3:
            contours_parent.append(contour)
        else:
            contours_parent.append([])
    # Merge each hole (child) contour into its parent contour.
    for i, contour in enumerate(contours_approx):
        parent_idx = hierarchies[0][i][3]
        if parent_idx >= 0 and len(contour) >= 3:
            contour_parent = contours_parent[parent_idx]
            if len(contour_parent) == 0:
                continue
            contours_parent[parent_idx] = merge_with_parent(contour_parent, contour)
    contours_parent_tmp = []
    for contour in contours_parent:
        if len(contour) == 0:
            continue
        contours_parent_tmp.append(contour)
    # Flatten each merged contour to a [x1, y1, x2, y2, ...] list.
    polygons = []
    for contour in contours_parent_tmp:
        polygon = contour.flatten().tolist()
        polygons.append(polygon)
    return polygons

def rle2polygon(segmentation):
    # Uncompressed RLE (counts given as a list) is first converted to compressed
    # RLE, then decoded to a binary mask and converted to polygons with holes merged in.
    if isinstance(segmentation["counts"], list):
        segmentation = mask.frPyObjects(segmentation, *segmentation["size"])
    m = mask.decode(segmentation)
    m[m > 0] = 255
    polygons = mask2polygon(m)
    return polygons
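For reference, a minimal usage sketch (assuming the functions and imports above; the toy mask values are purely illustrative):

import numpy as np

# Build a small binary mask containing a square object with a square hole.
img = np.zeros((20, 20), dtype=np.uint8)
img[4:16, 4:16] = 255  # object
img[8:12, 8:12] = 0    # hole inside the object

# Each returned polygon is a flat [x1, y1, x2, y2, ...] list in pixel coordinates;
# divide by the image width/height to get normalized YOLO segmentation labels.
polygons = mask2polygon(img)
print(polygons)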
The RLE mask.
The converted YOLO segmentation format.
To run the script, put the COCO JSON file coco_train.json into datasets/coco/annotations.
Run the script: python general_json2yolo.py
The converted YOLO txt files are saved in new_dir/labels/coco_train.
Edit use_segments and use_keypoints in the script.
if __name__ == '__main__':
source = 'COCO'
if source == 'COCO':
convert_coco_json('../datasets/coco/annotations', # directory with *.json
use_segments=True,
use_keypoints=False,
cls91to80=False)
To convert the COCO bbox format to YOLO bbox format.
use_segments=False,
use_keypoints=False,
To convert the COCO segmentation format to YOLO segmentation format.
use_segments=True,
use_keypoints=False,
To convert the COCO keypoints format to YOLO keypoints format.
use_segments=False,
use_keypoints=True,
This script originates from the Ultralytics JSON2YOLO repository. We hope this script helps with your project.
@ryouchinsa thank you for sharing the detailed script and instructions for converting RLE masks to YOLO segmentation format. The method you've outlined for converting the masks using general_json2yolo.py seems comprehensive and should be helpful for those working with the YOLO segmentation format in their projects.
It's great to see the community innovating to solve specific use cases, and being able to convert data into the required format for YOLOv5 training is valuable going forward.
It's worth noting that while the provided script originates from the Ultralytics JSON2YOLO repository, the specific handling of RLE masks with holes described here appears to be a custom addition.
Your contribution to the community is appreciated, and we hope others find the script useful for their projects. If you have any further insights or updates, please feel free to share with the community.
Thank you for your contribution!
Search before asking
Question
I am training YOLOv5 on a food-item dataset, but most food items largely overlap each other in the images.
Therefore I annotated the occluded food items to their original (full) shape, like this:
Is this approach correct? I want to get segmentation masks that are as close to the original shape as possible. What else can I do to improve the accuracy of the predictions?
Additional
No response