ultralytics / ultralytics


ultralytics masks to yolo seg convertor not working #15098

Open PythoneerSamurai opened 1 month ago

PythoneerSamurai commented 1 month ago

Search before asking

Ultralytics YOLO Component

No response

Bug

I have only one binary mask from the DeepGlobe road extraction dataset. It has only one class, "road". However, when I use the Ultralytics converter to convert from masks to YOLO-seg format, it skips the mask, saying that the class is unknown for pixel value 255, even though I specified the class number. Following is the image of my mask: 803431_mask

I used a script from GitHub to convert the mask to YOLO format and it worked; following is the output: 803431_mask.txt

Kindly fix this issue. I also tried to convert the masks of the entire DeepGlobe road extraction dataset (on Kaggle) to YOLO format with the converter, but it kept skipping files.

Environment

Ultralytics YOLOv8.2.75 πŸš€ Python-3.10.13 torch-2.1.2+cpu CPU (Intel Xeon 2.20GHz) Setup complete βœ… (4 CPUs, 31.4 GB RAM, 5771.7/8062.4 GB disk)

Minimal Reproducible Example

None

Additional

No response

Are you willing to submit a PR?

github-actions[bot] commented 1 month ago

πŸ‘‹ Hello @PythoneerSamurai, thank you for your interest in Ultralytics YOLOv8 πŸš€! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a πŸ› Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

Ultralytics CI

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

PythoneerSamurai commented 1 month ago

@PythoneerSamurai thank you for reporting this issue. It seems like the converter is not recognizing the class for the pixel value 255. Could you please ensure that you are using the latest version of the Ultralytics package? Sometimes, updates include bug fixes that might resolve your issue.

If the problem persists, it would be helpful to know the exact command you used for the conversion. This will allow us to better understand the context and provide more accurate assistance. Additionally, you might want to check if the class mapping is correctly specified in your configuration.

We appreciate your patience and understanding. If you have any further details or updates, please share them here.

Greetings. Thank you for your reply. Following is the code I used to convert the masks to YOLO segmentation format:

import ultralytics

masks_dir = "/home/haroon/Desktop/mask_dir"
output_dir = "/home/haroon/Desktop/mask_dir"
classes = 1
ultralytics.data.converter.convert_segment_masks_to_yolo_seg(masks_dir, output_dir, classes)

I think I might be making a mistake in specifying the class names; kindly correct me. Moreover, I am using the latest version of ultralytics. Also, what do you mean by correct class mapping specification in the configuration?

PythoneerSamurai commented 1 month ago

Okay, so I found the issue. The convert_segment_masks_to_yolo_seg function, as defined in your ultralytics converter.py file, hard-codes the number of classes. The classes are currently assigned in the function as follows:

pixel_to_class_mapping = {i + 1: i for i in range(80)}

This is wrong, because the range has been hard-coded for the COCO dataset. My dataset has only one class, "road", with a pixel value of 255. This line of code doesn't even consider the pixel value 255, and if we adjust the range it will just create 255 classes, which is not what we want. Moreover, the classes parameter isn't even used in the function's definition.

What I assume would work correctly is that, rather than setting the pixel-value-to-class mapping at the beginning of the function, you should build it after getting the unique pixel values from the mask image, as the function already does in the following line:

unique_values = np.unique(mask)

Because the user provides the number of classes, we can then build the class mapping with:

pixel_to_class_mapping = {pixel_value: class_id for pixel_value, class_id in zip(unique_values.tolist(), range(1, classes))}

Kindly check if this fixes the implementation.
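As a minimal sketch of the problem (the mapping line is quoted from converter.py; the lookup for pixel value 255 is illustrative):

```python
# Hard-coded COCO-style mapping from converter.py: pixel values 1..80 -> class ids 0..79
pixel_to_class_mapping = {i + 1: i for i in range(80)}

# A binary road mask uses pixel value 255 for its single "road" class,
# so the lookup fails and the converter reports an unknown class and skips the file.
print(pixel_to_class_mapping.get(255, -1))  # -> -1
```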

PythoneerSamurai commented 1 month ago


So I added the following code snippet to the function's definition, and it seems that we don't need the classes variable at all:

for index, value in enumerate(unique_values):
    if value == 0:
        continue
    else:
        pixel_to_class_mapping[value] = index

PythoneerSamurai commented 1 month ago


At the end of the function we can just print the total number of classes found, so that the user can set up their config.yaml file correctly:

print(f"No. of classes: {len(pixel_to_class_mapping.keys())}")

PythoneerSamurai commented 1 month ago

Following is the final updated function with some additional fixes:

import os

import cv2
import numpy as np

from ultralytics.utils import LOGGER


def convert_segment_masks_to_yolo_seg(masks_dir, output_dir):
    """
    Converts a dataset of segmentation mask images to the YOLO segmentation format.

    This function takes the directory containing the binary format mask images and converts them into YOLO
    segmentation format. The converted labels are saved in the specified output directory.

    Args:
        masks_dir (str): The path to the directory where all mask images (png, jpg) are stored.
        output_dir (str): The path to the directory where the converted YOLO segmentation labels will be stored.

    Example:
        ```python
        from ultralytics.data.converter import convert_segment_masks_to_yolo_seg

        convert_segment_masks_to_yolo_seg("path/to/masks_directory", "path/to/output/directory")
        ```

    Notes:
        The expected directory structure for the masks is:

            - masks
                β”œβ”€ mask_image_01.png or mask_image_01.jpg
                β”œβ”€ mask_image_02.png or mask_image_02.jpg
                β”œβ”€ mask_image_03.png or mask_image_03.jpg
                └─ mask_image_04.png or mask_image_04.jpg

        After execution, the labels will be organized in the following structure:

            - output_dir
                β”œβ”€ mask_yolo_01.txt
                β”œβ”€ mask_yolo_02.txt
                β”œβ”€ mask_yolo_03.txt
                └─ mask_yolo_04.txt
    """
    pixel_to_class_mapping = {}  # Maps each foreground pixel value to an auto-assigned class id
    class_list = []  # Class ids assigned so far, in the order their pixel values were first seen
    for mask_filename in os.listdir(masks_dir):
        if mask_filename.endswith(".png"):
            mask_path = os.path.join(masks_dir, mask_filename)
            mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)  # Read the mask image in grayscale
            img_height, img_width = mask.shape  # Get image dimensions
            LOGGER.info(f"Processing {mask_path} imgsz = {img_height} x {img_width}")

            unique_values = np.unique(mask)  # Get unique pixel values representing different classes
            yolo_format_data = []

            # Assign a new class id to any pixel value not seen before (0 is treated as background)
            for value in unique_values:
                if value == 0:
                    continue
                if value not in pixel_to_class_mapping:
                    pixel_to_class_mapping[value] = class_list[-1] + 1 if class_list else 0
                    class_list.append(pixel_to_class_mapping[value])

            for value in unique_values:
                if value == 0:
                    continue  # Skip background
                class_index = pixel_to_class_mapping.get(value, -1)
                if class_index == -1:
                    LOGGER.warning(f"Unknown class for pixel value {value} in file {mask_filename}, skipping.")
                    continue

                # Create a binary mask for the current class and find contours
                contours, _ = cv2.findContours(
                    (mask == value).astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
                )  # Find contours

                for contour in contours:
                    if len(contour) >= 3:  # YOLO requires at least 3 points for a valid segmentation
                        contour = contour.squeeze()  # Remove single-dimensional entries
                        yolo_format = [class_index]
                        for point in contour:
                            # Normalize the coordinates
                            yolo_format.append(round(point[0] / img_width, 6))  # Rounding to 6 decimal places
                            yolo_format.append(round(point[1] / img_height, 6))
                        yolo_format_data.append(yolo_format)

            # Save Ultralytics YOLO format data to file
            output_path = os.path.join(output_dir, os.path.splitext(mask_filename)[0] + ".txt")
            with open(output_path, "w") as file:
                for item in yolo_format_data:
                    line = " ".join(map(str, item))
                    file.write(line + "\n")
            LOGGER.info(f"Processed and stored at {output_path} imgsz = {img_height} x {img_width}")

    # Summarize the discovered mapping so the user can set up their dataset config accordingly
    # (class_list holds the auto-assigned class ids)
    print(f"No. of classes: {len(pixel_to_class_mapping.keys())}")
    print(f"Pixel Values: {list(pixel_to_class_mapping.keys())}")
    print(f"Class Names: {class_list}")
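A hypothetical usage sketch for the single-class road masks discussed in this thread (the directory paths are the ones shared earlier; the commented output is roughly what the final print statements would report for a 0/255 binary mask):

```python
# Hypothetical run on DeepGlobe-style masks (single "road" class, pixel value 255)
convert_segment_masks_to_yolo_seg("/home/haroon/Desktop/mask_dir", "/home/haroon/Desktop/mask_dir")
# Expected summary (approximately): one class found, pixel value 255 mapped to class id 0
```
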
RizwanMunawar commented 1 month ago

@PythoneerSamurai Thank you for sharing the updated code. You are correct that pixel_to_class_mapping = {i + 1: i for i in range(80)} is specifically designed for the COCO dataset. I will update this as soon as possible.

However, the changes you proposed approach the data a bit differently. For example, if I have 80 classes and a mask file containing only a single class with class_id 10, your implementation would interpret it as class_id 0, because it is the only class present in that mask. So this workflow may not work for other use cases.
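A small sketch of this point (the pixel value 11 / class_id 10 pairing is a hypothetical COCO-style example, not taken from the thread):

```python
import numpy as np

# Unique pixel values of a hypothetical COCO-style mask containing only class_id 10
unique_values = np.unique(np.array([0, 11, 11, 0], dtype=np.uint8))  # -> [0, 11]

# Original mapping: pixel value i+1 -> class id i, so pixel 11 keeps class_id 10
original_mapping = {i + 1: i for i in range(80)}

# Auto-detected mapping (as proposed above): first foreground value seen -> class id 0
auto_mapping = {int(v): i for i, v in enumerate(v for v in unique_values if v != 0)}

print(original_mapping[11])  # 10 -> consistent with the dataset's class ids
print(auto_mapping[11])      # 0  -> depends on which classes happen to appear in each file
```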

The original code is functioning correctly; however, the classes=80 value needs to be replaced with the user-provided number of classes. In your case, it seems that your dataset contains 255 classes, and the image you provided corresponds to class 255, which represents the 'road'. The original code will work with your image. Below is the modified code that will work for your specific use case.

import ultralytics 
masks_dir = "/home/haroon/Desktop/mask_dir" 
output_dir = "/home/haroon/Desktop/mask_dir" 
classes = 255 
ultralytics.data.converter.convert_segment_masks_to_yolo_seg(masks_dir, output_dir, classes)

I've attached the segmentation file (Ultralytics YOLO format) corresponding to the mask you shared above.

Ultralytics YOLO Format.txt

We appreciate your valuable input. Thank you, Ultralytics Team!

PythoneerSamurai commented 1 month ago

Thank you for your reply! Yes, you are right that my code doesn't take into account the class-id that the user wants to assign to a specific class; it automatically assigns class-ids based on the pixel intensities it sees first. Thank you for correcting my mistake. As far as my dataset is concerned, what I meant was that it has only one class (class-id = 0), which is represented in the masks by the pixel intensity 255. Thank you for your support; this issue is resolved. All that remains is updating converter.py in a new release of ultralytics.

PythoneerSamurai commented 1 month ago

@RizwanMunawar I think there will be an issue in your code when I pass it a mask that has only one class, represented by pixel value 255, because you are mapping the pixel values using range in the dictionary. If I pass classes = 1, your dictionary will map pixel value 1 to class index 0, which is wrong. Kindly recheck your code with the same mask I shared, but pass classes = 1 to the function.

PythoneerSamurai commented 1 month ago


What I mean is that if I only have one class in my entire dataset, represented by pixel value 255, the dictionary will not map 255 to class-id 0; rather, it will map 1 to class-id 0. As a result, the same issue will arise as mentioned in my first post. That's what my code handles: it automatically detects all classes and then reports the class-ids and pixel values to the user, so they can understand which pixel value represents which class.
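A quick sketch of this concern (using the range-based mapping discussed above with classes = 1):

```python
# Range-based mapping built from the user-provided number of classes
classes = 1
pixel_to_class_mapping = {i + 1: i for i in range(classes)}  # -> {1: 0}

# A single-class road mask stores its foreground as pixel value 255,
# so the lookup still fails and the file is skipped, as in the original report.
print(pixel_to_class_mapping.get(255, -1))  # -> -1
```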