voxel51 / fiftyone

Refine high-quality datasets and visual AI models
https://fiftyone.ai
Apache License 2.0

[BUG] No image samples are detected with Dataset loading and conversion #1214

Closed: neel04 closed this issue 3 years ago

neel04 commented 3 years ago

Code to reproduce issue

I am trying to convert my dataset from the FiftyOneImageDetectionDataset format to COCO format; however, FiftyOne does not detect any image samples when loading it.

This is the JSON that I created for the FiftyOneImageDetectionDataset type (rename the extension to .json): result.txt

You can reproduce this with the file structure below and any images of your choice in the data/ subdirectory, modifying the JSON accordingly. I have provided the full file so that any mistake on my part can be caught early.

FiftyOneData/
    data/
        ID_00G8K1V3.jpg
        ....
    result.json

The issue is that, despite the seemingly correct JSON and data, FiftyOne doesn't detect any samples at all.

Other info / logs

Output:

 100% |█████████████████████| 0/0 [803.1us elapsed, ? remaining, ? samples/s] 
Name:        my-dataset
Media type:  None
Num samples: 0
Persistent:  False
Tags:        []
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
[]

As you can see, Num samples is 0.

EDIT: The code I used to load the dataset is taken straight from the docs, but here it is for debugging's sake:

import fiftyone as fo
name = "my-dataset"
dataset_dir = "./FiftyOneDataset"

# Create the dataset
dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.FiftyOneImageDetectionDataset,
    name=name,
)

# View summary info about the dataset
print(dataset)

# Print the first few samples in the dataset
print(dataset.head())

brimoor commented 3 years ago

Hi @neel04 :wave:

When you use this syntax:

dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.FiftyOneImageDetectionDataset,
)

The dataset needs to be organized exactly as documented here. In particular, this means that the JSON file needs to be named labels.json, not result.json.

Alternatively, you can provide your custom name like so:

dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.FiftyOneImageDetectionDataset,
    labels_path="result.json",
)

or like this:

dataset = fo.Dataset.from_dir(
    data_path="/path/to/images",
    labels_path="/path/to/results.json",
    dataset_type=fo.types.FiftyOneImageDetectionDataset,
)

neel04 commented 3 years ago

@brimoor Really appreciate the prompt reply, I feel like such an idiot :sweat_smile: Thanks a lot for pointing out my mistake! :hugs: :+1:

I wonder whether a feature request could be opened to address this fairly simple issue: since I believe most datasets have a single JSON file, the loader could automatically search for it.

A lazier approach might simply be to raise an error saying that labels.json was not found :stuck_out_tongue:
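Both ideas above can be sketched in plain Python, outside of FiftyOne. This is only an illustration: `find_labels_file` is a hypothetical helper, not part of the FiftyOne API, and it assumes the labels file sits directly inside the dataset directory.

```python
from pathlib import Path


def find_labels_file(dataset_dir, expected_name="labels.json"):
    """Return the labels JSON inside dataset_dir (hypothetical helper).

    Prefers expected_name, falls back to the only *.json file present,
    and raises an informative error when neither applies.
    """
    root = Path(dataset_dir)
    expected = root / expected_name
    if expected.is_file():
        return expected

    # Fall back to a single JSON file with any other name
    candidates = sorted(root.glob("*.json"))
    if len(candidates) == 1:
        return candidates[0]

    found = ", ".join(p.name for p in candidates) or "none"
    raise FileNotFoundError(
        f"{expected_name} not found in {root} (JSON files found: {found}); "
        "rename the file or pass labels_path explicitly"
    )
```

The name of the returned file could then be passed as `labels_path` to `fo.Dataset.from_dir(...)`, as in the examples earlier in this thread.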

Again, many thanks and have a wonderful day! :+1:

brimoor commented 3 years ago

Yeah good point, we should definitely at least raise an informative error explaining that the JSON file wasn't located.

In the meantime, glad your issue is solved 👍

neel04 commented 3 years ago

@brimoor Just one more thing: if each of our image samples has multiple annotations, how are we supposed to build the labels.json in that case? Currently, I am repeating the same UUID key for an image with multiple bboxes, each time with different coordinate values.

I am not sure if this is the right approach, hence I wanted to confirm with you :hugs:

brimoor commented 3 years ago

Are you asking how to handle a sample that has more than one object detection in it?

Here's a snippet that generates a valid dataset in FiftyOneImageDetectionDataset format:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")
dataset.limit(1).export(
    export_dir="/tmp/test",
    dataset_type=fo.types.FiftyOneImageDetectionDataset,
    label_field="ground_truth",
)

The contents of /tmp/test/labels.json are:

{
    "classes": null,
    "labels": {
        "000880": [
            {
                "label": "bird",
                "bounding_box": [
                    0.21084375,
                    0.0034375,
                    0.46190625,
                    0.9442083333333334
                ],
                "attributes": {
                    "area": 73790.37944999996,
                    "iscrowd": 0.0
                }
            },
            {
                "label": "bird",
                "bounding_box": [
                    0.74946875,
                    0.489375,
                    0.2164375,
                    0.23183333333333334
                ],
                "attributes": {
                    "area": 3935.7593000000006,
                    "iscrowd": 0.0
                }
            },
            {
                "label": "bird",
                "bounding_box": [
                    0.044234375,
                    0.5282083333333333,
                    0.151390625,
                    0.14145833333333335
                ],
                "attributes": {
                    "area": 4827.32605,
                    "iscrowd": 0.0
                }
            }
        ]
    }
}
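
The file above maps each image UUID to a single list of detections, so a flat list of per-box annotations can be grouped by UUID before writing it out. Repeating the same key is unlikely to work, since Python's `json` module keeps only the last value for a duplicated key. A minimal sketch, where `build_labels` and the example rows are hypothetical and not part of FiftyOne:

```python
import json
from collections import defaultdict


def build_labels(annotations):
    """Group flat (uuid, label, bounding_box) rows into the labels.json layout."""
    labels = defaultdict(list)
    for uuid, label, bounding_box in annotations:
        # Every box for the same image is appended to that image's list
        labels[uuid].append({"label": label, "bounding_box": list(bounding_box)})
    return {"classes": None, "labels": dict(labels)}


# Hypothetical rows: two boxes on one image, one box on another
rows = [
    ("ID_00G8K1V3", "bird", [0.1, 0.2, 0.3, 0.4]),
    ("ID_00G8K1V3", "bird", [0.5, 0.5, 0.2, 0.2]),
    ("ID_EXAMPLE2", "cat", [0.0, 0.0, 1.0, 1.0]),
]
labels_json = json.dumps(build_labels(rows), indent=4)
print(labels_json)
```

The optional `attributes` entries shown in the exported file are omitted here for brevity.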

brimoor commented 3 years ago

Note that you don't have to use FiftyOneImageDetectionDataset format to load data into FiftyOne. You can just write a Python loop if you prefer: https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/index.html#custom-formats

neel04 commented 3 years ago

The custom Python implementation looks much faster and cleaner to me - thanks for pointing that out!

However, even if I use a custom loop, how can we add multiple annotations for the same UUID?

sample["ground_truth"] = fo.Detections(detections=detections)

Can detections be a nested list? Checking the docs for fo.Detections doesn't hint at the possibility of multiple annotations or a nested list...

brimoor commented 3 years ago

I'm not sure I understand your question. You can do this, for example:


import fiftyone as fo

sample = fo.Sample(
    filepath="image.png",
    ground_truth=fo.Detections(
        detections=[
            fo.Detection(
                label="cat",
                bounding_box=[0, 0, 1, 1],
                age=3,
                breed="tabby",
            ),
            fo.Detection(
                label="dog",
                bounding_box=[0, 0, 1, 1],
                age=5,
                breed="terrier",
            ),
        ]
    )
)
print(sample)
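
As the example above shows, `detections` is a flat list with one `fo.Detection` per box rather than a nested list. If the source annotations are in pixels, they would first need converting to the relative coordinates seen in the exported labels.json, which to my understanding are `[top-left-x, top-left-y, width, height]` in `[0, 1]`. A rough sketch, where `to_relative_box` and `build_sample` are hypothetical helpers and the image size is assumed to be known:

```python
def to_relative_box(x, y, w, h, img_w, img_h):
    """Convert a pixel box (top-left x, y, width, height) to relative coordinates."""
    return [x / img_w, y / img_h, w / img_w, h / img_h]


def build_sample(filepath, boxes, img_w, img_h):
    """Build one sample holding every box of an image (hypothetical helper).

    boxes is a list of (label, (x, y, w, h)) tuples in pixels.
    """
    import fiftyone as fo  # imported lazily; only needed for this step

    detections = [
        fo.Detection(
            label=label,
            bounding_box=to_relative_box(*box, img_w, img_h),
        )
        for label, box in boxes
    ]
    return fo.Sample(
        filepath=filepath,
        ground_truth=fo.Detections(detections=detections),
    )
```

Each sample returned by `build_sample` could then be added to a dataset inside the custom loop.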

neel04 commented 3 years ago

that does seem to work - thanks again for the example!