voxel51 / fiftyone

The open-source tool for building high-quality datasets and computer vision models
https://fiftyone.ai
Apache License 2.0

[BUG] New to this program. How does "classes" argument work on COCO dataset? #4570

Open DWSuryo opened 1 month ago

DWSuryo commented 1 month ago

Describe the problem

Hello. I'm new to FiftyOne and still exploring it. I want to use the COCO 2017 dataset, restricted to images that contain at least one "person" object, and then export the result as a person-only dataset in COCO format. I expected the classes argument to do this filtering, but running the code below produced a dataset with 0 samples. For now I'm testing on the validation split before exporting the person-only COCO dataset.

Code to reproduce issue


import fiftyone as fo
import fiftyone.zoo as foz

# Load the COCO-2017 validation split with only the "person" class
dataset = foz.load_zoo_dataset(
    "coco-2017",
    dataset_dir='./dataset',
    split="validation",
    label_types=["detections"],
    classes=["person"],
)
# Launch the FiftyOne app to visualize the dataset
session = fo.launch_app(dataset)

Output

Downloading split 'validation' to './dataset/validation' if necessary
INFO:fiftyone.zoo.datasets:Downloading split 'validation' to './dataset/validation' if necessary
Downloading annotations to './dataset/tmp-download/annotations_trainval2017.zip'
INFO:fiftyone.utils.coco:Downloading annotations to './dataset/tmp-download/annotations_trainval2017.zip'
 100% |██████|    1.9Gb/1.9Gb [2.5s elapsed, 0s remaining, 761.6Mb/s]      
INFO:eta.core.utils: 100% |██████|    1.9Gb/1.9Gb [2.5s elapsed, 0s remaining, 761.6Mb/s]      
Extracting annotations to 'dataset/raw/instances_val2017.json'
INFO:fiftyone.utils.coco:Extracting annotations to 'dataset/raw/instances_val2017.json'
Writing annotations for 0 downloaded samples to './dataset/validation/labels.json'
INFO:fiftyone.utils.coco:Writing annotations for 0 downloaded samples to './dataset/validation/labels.json'
Dataset info written to './dataset/info.json'
INFO:fiftyone.zoo.datasets:Dataset info written to './dataset/info.json'
Loading existing dataset 'coco-2017-validation'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use
INFO:fiftyone.zoo.datasets:Loading existing dataset 'coco-2017-validation'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use

Check dataset details:

print(dataset)

Output:

Name:        coco-2017-validation
Media type:  None
Num samples: 0
Persistent:  False
Tags:        []
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)

This uses only the COCO validation split. Since the result is 0 samples, I'm wondering how the filtering works. Does passing "person" select only samples whose sole class is "person", or am I missing something? I looked again at the COCO integration documentation and noticed the "only_matching" argument. However, setting it to either True or False still gives 0 samples. What should I do?
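For reference, the COCO integration documentation describes `classes` as selecting samples that contain at least one object of a listed class, and `only_matching` as controlling whether the other labels on those samples are kept. The toy sketch below mimics that documented behavior on plain Python data. The `select_samples` helper and the sample dictionaries are hypothetical illustrations, not FiftyOne's implementation.

```python
def select_samples(samples, classes, only_matching=False):
    """Toy model of the documented `classes` / `only_matching` semantics.

    Hypothetical helper for illustration only; not part of FiftyOne.
    """
    wanted = set(classes)
    selected = []
    for sample in samples:
        labels = sample["labels"]
        # Skip samples that contain none of the requested classes
        if not wanted.intersection(labels):
            continue
        # Optionally drop the labels that are not in the requested classes
        if only_matching:
            labels = [label for label in labels if label in wanted]
        selected.append({"filepath": sample["filepath"], "labels": list(labels)})
    return selected


# Made-up samples standing in for COCO images and their object labels
samples = [
    {"filepath": "a.jpg", "labels": ["person", "dog"]},
    {"filepath": "b.jpg", "labels": ["cat"]},
    {"filepath": "c.jpg", "labels": ["person"]},
]

keep_all = select_samples(samples, ["person"])
only_person = select_samples(samples, ["person"], only_matching=True)
```

Under this reading, both values of `only_matching` keep every sample that contains a person, and differ only in which labels those samples carry. A 0-sample result would therefore not be expected from the filter semantics alone.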

System information

Willingness to contribute

The FiftyOne Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the FiftyOne codebase?

DWSuryo commented 1 month ago

Update: the question of how the class filtering works still stands. The workaround I'm currently using is:

import fiftyone as fo
import fiftyone.zoo as foz

# Load the COCO-2017 train split
train_dataset = foz.load_zoo_dataset("coco-2017", split="train")
# Filter the train dataset to include only samples with the "person" class
train_view = train_dataset.filter_labels("ground_truth", fo.ViewField("label") == "person")

# Load the COCO-2017 validation split
val_dataset = foz.load_zoo_dataset("coco-2017", split="validation")
# Filter the validation dataset to include only samples with the "person" class
val_view = val_dataset.filter_labels("ground_truth", fo.ViewField("label") == "person")

# Launch the FiftyOne app to visualize the filtered datasets
session = fo.launch_app(train_view)

# Define the export directories
train_export_dir = "/path/to/export/train"
val_export_dir = "/path/to/export/validation"

# Export the filtered train dataset in COCO format
train_view.export(
    export_dir=train_export_dir,
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth"
)

# Export the filtered validation dataset in COCO format
val_view.export(
    export_dir=val_export_dir,
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth"
)

However, this approach downloads the entire coco-2017 train and validation splits and only then filters on the "person" label in the "ground_truth" field. Is there a more efficient way to download only the images that contain at least one "person" object?
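To make the goal concrete: the set of images worth downloading can in principle be worked out from the annotations file alone, before any image is fetched. The sketch below assumes the standard COCO detection layout (top-level "images", "annotations", and "categories" keys) and runs on a tiny made-up annotation dictionary. The `image_ids_with_class` helper is hypothetical and is not a FiftyOne function.

```python
import json


def image_ids_with_class(coco, class_name):
    """Return sorted IDs of images with at least one annotation of `class_name`.

    Hypothetical helper; `coco` is a dict in the standard COCO detection layout.
    """
    # Map the class name to its category ID(s)
    category_ids = {c["id"] for c in coco["categories"] if c["name"] == class_name}
    # Collect every image referenced by an annotation of that category
    return sorted(
        {a["image_id"] for a in coco["annotations"] if a["category_id"] in category_ids}
    )


# Tiny made-up annotations standing in for a real file such as instances_val2017.json
coco = json.loads("""
{
  "images": [{"id": 1}, {"id": 2}, {"id": 3}],
  "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1},
    {"id": 11, "image_id": 1, "category_id": 18},
    {"id": 12, "image_id": 2, "category_id": 18},
    {"id": 13, "image_id": 3, "category_id": 1}
  ]
}
""")

person_image_ids = image_ids_with_class(coco, "person")
```

With a real annotations file, `coco` would come from `json.load` on that file instead of the inline string. The COCO integration documentation also lists an `image_ids` argument for loading specific images; whether passing these IDs there fits this workflow is an assumption to verify against the docs.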