voxel51 / fiftyone

Refine high-quality datasets and visual AI models
https://fiftyone.ai
Apache License 2.0

[FR] Is there a way (or can we have one) to skip labels marked as difficult in VOC format datasets while exporting them? #2334

Closed: Mahmood-Hussain closed this issue 1 year ago

Mahmood-Hussain commented 1 year ago

Proposal Summary

I am trying to export a dataset from PASCAL VOC format to COCODetection or YOLOv5Detection. In the XML files of the VOC labels, some annotations are marked as difficult. Is there a way to skip those annotations (bounding boxes) while exporting to other formats?

Motivation

What areas of FiftyOne does this feature affect?

Details

Here is an example of one of the XML files:

<annotation verified="no">
    <folder>haze_detection</folder>
    <filename>AM_Bing_217.png</filename>
    <size>
        <width>990</width>
        <height>576</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>person</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <inferred>0</inferred>
        <bndbox>
            <xmin>445</xmin>
            <ymin>188</ymin>
            <xmax>515</xmax>
            <ymax>358</ymax>
        </bndbox>
    </object>
    <object>
        <name>person</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <inferred>0</inferred>
        <bndbox>
            <xmin>494</xmin>
            <ymin>192</ymin>
            <xmax>563</xmax>
            <ymax>364</ymax>
        </bndbox>
    </object>
</annotation>

Willingness to contribute

The FiftyOne Community welcomes contributions! Would you or another member of your organization be willing to contribute an implementation of this feature?

brimoor commented 1 year ago

Hi @Mahmood-Hussain 👋

This kind of filtering is easy to achieve with dataset views once you've loaded the data.

For example, to export only the labels for which inferred == 0 in COCO format:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/voc/data",
    dataset_type=fo.types.VOCDetectionDataset,
    label_field="ground_truth",
)

# See what the dataset contains
print(dataset)
print(dataset.count_values("ground_truth.detections.label"))
print(dataset.count_values("ground_truth.detections.inferred"))

view = dataset.filter_labels("ground_truth", F("inferred") == 0)

# See what the view contains
print(view)
print(view.count_values("ground_truth.detections.label"))
print(view.count_values("ground_truth.detections.inferred"))

view.export(
    export_dir="/path/for/coco",
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth",
)
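
As a complement to the view-based approach above, here is a minimal sketch of a different technique: stripping the unwanted objects from the raw VOC XML before import, using only the Python standard library. It is not part of FiftyOne. The helper name `drop_difficult_objects` is hypothetical, and the sketch assumes `<difficult>` holds `0` or `1` as in the example file from the question.

```python
import xml.etree.ElementTree as ET


def drop_difficult_objects(xml_text: str) -> str:
    """Return VOC annotation XML with every <object> whose <difficult> is 1 removed.

    Hypothetical helper for illustration; assumes <difficult> is "0" or "1".
    """
    root = ET.fromstring(xml_text)
    # findall() returns a list, so removing children while looping is safe
    for obj in root.findall("object"):
        # A missing <difficult> tag is treated as "not difficult"
        if obj.findtext("difficult", default="0").strip() == "1":
            root.remove(obj)
    return ET.tostring(root, encoding="unicode")


# Made-up two-object annotation, trimmed to the fields the sketch needs
sample = """<annotation>
    <object><name>person</name><difficult>0</difficult></object>
    <object><name>car</name><difficult>1</difficult></object>
</annotation>"""

cleaned = drop_difficult_objects(sample)
kept_names = [o.findtext("name") for o in ET.fromstring(cleaned).findall("object")]
print(kept_names)
```

If the VOC importer exposes the `difficult` flag as a detection attribute (an assumption; `dataset.count_values("ground_truth.detections.difficult")` should show whether it does), the `filter_labels()` call above could likely target `F("difficult")` instead of `F("inferred")`, which avoids touching the XML files at all.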