Ultralytics YOLO11 πŸš€
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Segmentation: bounding box larger than corresponding mask #6016

Closed atmilatos closed 10 months ago

atmilatos commented 1 year ago

Search before asking

YOLOv8 Component

Predict

Bug

Hello,

I am using YOLOv8 for segmentation. I have found that in some cases the resulting bounding box is larger than the corresponding mask. The bounding box is correct, while the mask is missing some pixels. I have validated this by inspecting the image saved after prediction, as well as the crops written by predict. The mask polygons (`Masks.xy`) also correspond to the (incorrect) masks.

I have written all three in per-mask images, and I have attached one of them.

[attached image: mask_11]

Is there something I am missing regarding the relationship between the mask and the bounding box?

Thank you in advance.
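To make the mismatch concrete, here is a NumPy-only sketch (toy values, not from my model) that computes a mask's tight extent and compares it with a predicted box that is noticeably larger:

```python
import numpy as np

def mask_extent(mask: np.ndarray):
    """Return the tight (x1, y1, x2, y2) bounding box of a binary mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1

# Toy example: a 10x10 mask whose pixels cover only part of the predicted box.
mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:5, 3:6] = 1                 # mask occupies rows 2-4, cols 3-5
pred_box = (3, 2, 9, 9)            # predicted box is noticeably larger

x1, y1, x2, y2 = mask_extent(mask)
print((x1, y1, x2, y2))            # (3, 2, 6, 5)
mask_area = (x2 - x1) * (y2 - y1)
box_area = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
print(mask_area < box_area)        # True -> the box extends beyond the mask
```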

Environment

Ultralytics YOLOv8.0.157 Python-3.11.6 torch-2.1.0.dev20230722+cu121 CUDA:0 (NVIDIA GeForce RTX 4090, 24564MiB)

Minimal Reproducible Example

Here is a code sample that produces the error.

```python
import os

import cv2
import numpy as np

from ultralytics import YOLO


def infer(model_path, source_img_path, inf_base_path, conf_th, img_sz):
    model = YOLO(model_path)
    cls_names = model.names

    results = model.predict(source_img_path, project='test', name='MASK',
                            save=True, save_crop=True, imgsz=img_sz,
                            conf=conf_th, boxes=True, show_labels=True,
                            line_width=3, retina_masks=True)

    os.makedirs(inf_base_path, exist_ok=True)

    for result in results:
        img = cv2.imread(result.path)
        h, w, c = img.shape

        if result.masks is None:
            print("\tNOTHING DETECTED, SKIPPING IMAGE")
            continue

        boxes = result.cpu().boxes
        masks = result.cpu().masks

        for obj_i, mask in enumerate(masks):
            # model.names is keyed by int, so cast the class tensor first
            cls = cls_names[int(boxes[obj_i].cls.item())]
            conf = boxes[obj_i].conf[0]

            print(f'\tMASK no {obj_i}')
            print(f'\t\tCLASS: {cls}, CONF: {conf}')

            mxy = mask.xy                   # polygon(s) in image coordinates
            md = mask.data.numpy()          # binary mask, shape (1, H, W)
            md = np.squeeze(md, axis=0)
            # md = scale_image(md, (h, w))  # alternatives tried, no effect
            # md = cv2.resize(md, (w, h))
            md_normalized = np.uint8(md * 255)

            # draw the mask polygon ...
            pts = np.array(mxy, np.int32).reshape((-1, 1, 2))
            cv2.polylines(md_normalized, [pts], True, (255, 255, 255))

            # ... and the (larger) predicted box on the same image
            bb = boxes[obj_i].xyxy.numpy()[0]
            cv2.rectangle(md_normalized, (int(bb[0]), int(bb[1])),
                          (int(bb[2]), int(bb[3])), (255, 255, 255), 3)

            cv2.imwrite(os.path.join(inf_base_path, f'mask_{obj_i}.JPG'),
                        md_normalized)


if __name__ == '__main__':
    infer('./runs/segment/ALL_CLOSE_UP/weights/best.pt', './tmp/test',
          './tmp/res', 0.25, 1024)
```

Additional

No response

Are you willing to submit a PR?

github-actions[bot] commented 1 year ago

πŸ‘‹ Hello @atmilatos, thank you for your interest in YOLOv8 πŸš€! We recommend a visit to the YOLOv8 Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a πŸ› Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

```bash
pip install ultralytics
```

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled).

Status

Ultralytics CI

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

atmilatos commented 1 year ago

[attached screenshot, 2023-10-30] Here is another, clearer example. The bottom insulator has the correct bounding box, and YOLO has cropped the correct part of the image, but its mask is "masked" by another object, so it is not returned.

glenn-jocher commented 1 year ago

@atmilatos box and mask predictions are completely separate, they may or may not align. The box primarily serves to crop mask predictions.

In the case of overlapping objects prediction results in general may perform more poorly.

If you are interested in improved masks at the expense of some speed you can use retina_masks=True
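To illustrate the cropping relationship described above, here is a simplified NumPy sketch of how a detection box crops a mask prediction (the library performs an equivalent step on batched tensors; this toy version is illustrative only):

```python
import numpy as np

def crop_mask_to_box(mask: np.ndarray, box) -> np.ndarray:
    """Zero out mask pixels outside an (x1, y1, x2, y2) box, mirroring how
    segmentation heads crop raw mask predictions with the detected box."""
    x1, y1, x2, y2 = map(int, box)
    cropped = np.zeros_like(mask)
    cropped[y1:y2, x1:x2] = mask[y1:y2, x1:x2]
    return cropped

mask = np.ones((8, 8), dtype=np.uint8)   # pretend the raw mask covers everything
box = (2, 2, 6, 6)                        # detection box
cropped = crop_mask_to_box(mask, box)
print(int(cropped.sum()))                 # 16 -> only the 4x4 box region survives
```

Anything the raw mask predicts outside the box is discarded, which is why the box can only ever be as large as or larger than the final mask it produces.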

atmilatos commented 1 year ago

Thank you for the reply (and sorry for the post label error).

I already use retina masks.

It is strange that the image produced by predict has all the mask pixels but the actual returned mask object does not.

Do you think that if I passed each bounding box overlaid at its correct coordinates over a blank image again into predict that I could get the "actual" masks?

glenn-jocher commented 1 year ago

@atmilatos generally, box predictions and mask predictions are dissociated processes in YOLOv8, which means they might not always align. The bounding box primarily functions to crop the mask predictions. However, when you have overlapping objects, predictions might perform more poorly.

You mentioned that predict produces an image with all the mask pixels even though the returned mask object doesn't. This discrepancy is because the visualization process uses the thresholded mask on the crop defined by the bounding box. The crop part is supposed to have all the mask pixels.

About predicting on each bounding box overlaid onto a blank image: this could potentially work in theory, especially if the objects are easily separable and within the boundaries of the image. But it's not guaranteed to improve the results, because of the nature of how YOLOv8 performs detection, which is highly dependent on the context of the surrounding image area.
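The blank-image approach can be sketched with NumPy alone (the helper name is hypothetical; feeding the canvas back through predict() is the untested part and is left as a comment):

```python
import numpy as np

def paste_crop_on_blank(img: np.ndarray, box) -> np.ndarray:
    """Copy the box region of img onto an otherwise blank canvas of the same
    size, keeping the crop at its original coordinates."""
    x1, y1, x2, y2 = map(int, box)
    canvas = np.zeros_like(img)
    canvas[y1:y2, x1:x2] = img[y1:y2, x1:x2]
    return canvas

img = np.full((8, 8, 3), 127, dtype=np.uint8)
canvas = paste_crop_on_blank(img, (1, 1, 4, 4))
print(int(canvas.sum()))  # 3 * 3 pixels * 3 channels * 127 = 3429
# canvas could then be passed back through model.predict(...) so the mask is
# predicted without interference from neighbouring objects (untested idea).
```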

These are inherent limitations in YOLOv8 and similar architectures, and while we continue to improve these aspects in ongoing research, some of these issues might still be present. We appreciate your understanding and patience.

atmilatos commented 1 year ago

@glenn-jocher thank you for the response.

I have been exploring the possibility to run predict and get the masks, then isolate each of them (via blurring/covering the rest of the masks), then re-running the prediction, and finally combining the masks for each object. So far the results are promising but I have to validate more thoroughly.
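A minimal sketch of the isolation step (names and the flat fill value are illustrative; the re-predict and mask-combination stages are omitted):

```python
import numpy as np

def isolate_object(img: np.ndarray, masks, keep: int,
                   fill: int = 127) -> np.ndarray:
    """Cover every detected mask except masks[keep] with a flat fill colour,
    so a second predict() pass sees the kept object without its neighbours."""
    out = img.copy()
    for i, m in enumerate(masks):
        if i != keep:
            out[m.astype(bool)] = fill
    return out

img = np.zeros((6, 6, 3), dtype=np.uint8)
m0 = np.zeros((6, 6), dtype=np.uint8); m0[0:2, 0:2] = 1   # object to keep
m1 = np.zeros((6, 6), dtype=np.uint8); m1[4:6, 4:6] = 1   # object to cover
iso = isolate_object(img, [m0, m1], keep=0)
print(int(iso[5, 5, 0]))  # 127 -> the other object's region was covered
print(int(iso[0, 0, 0]))  # 0   -> the kept object's region is untouched
```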

glenn-jocher commented 1 year ago

@atmilatos, it's great to hear you're exploring innovative workarounds! The iterative approach you've described sounds cleverβ€”it essentially involves masking out detected objects and re-running detection for possibly occluded items. It aligns with some strategies used in instance segmentation tasks to handle occlusions.

There are, however, a few considerations for this method. If your results are promising, it might indeed be a viable strategy for your use case. Keep in mind that edge cases and variations in object appearance under different conditions could impact the effectiveness of this method. Continuous validation over diverse datasets will be key.

We are always delighted to see community members push the limits of what's possible and find new ways to optimize their workflows. Your feedback and experience could be very valuable for others facing similar challenges. Keep up the excellent work!

github-actions[bot] commented 11 months ago

πŸ‘‹ Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO πŸš€ and Vision AI ⭐

WLi0777 commented 10 months ago

> @atmilatos box and mask predictions are completely separate, they may or may not align. The box primarily serves to crop mask predictions.
>
> In the case of overlapping objects prediction results in general may perform more poorly.
>
> If you are interested in improved masks at the expense of some speed you can use retina_masks=True

Hi @glenn-jocher, if I don't want the box to crop mask predictions, what should I do?

glenn-jocher commented 10 months ago

Hi @WLi0777, if you prefer not to have the bounding box crop the mask predictions, you might consider adjusting the confidence and IoU thresholds to fine-tune the predictions. However, the current architecture of YOLOv8 inherently uses bounding boxes to crop masks. For now, there isn't a built-in option to disable this behavior. Your approach of re-running predictions on isolated regions is a creative solution, and we encourage you to continue exploring such methods.