tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

[data request] OpenImages v7 #906

Open rodrigob opened 5 years ago

rodrigob commented 5 years ago

Name of dataset: OpenImages v7
URL of dataset: https://g.co/dataset/open-images
License of dataset: the annotations are licensed by Google Inc. under the CC BY 4.0 license. The images are listed as having a CC BY 2.0 license.

Short description of dataset and use case(s): bigger than ImageNet, with 61M image-level labels, 16M bounding boxes, 3M visual relationships, 2.7M instance segmentation masks, 600k localized narratives (synchronized audio and text captions, with mouse traces), and 66M point labels.

Folks who would also like to see this dataset in tensorflow/datasets, please thumbs-up so the developers can know which requests to prioritize.

And if you'd like to contribute the dataset (thank you!), see our guide to adding a dataset.

pierrot0 commented 5 years ago

aman2930 is looking into this.

aman2930 commented 5 years ago

Could you please assign it to me?

rodrigob commented 5 years ago

Any update on this?

rodrigob commented 4 years ago

For info, we are now at open_images_v6 (same image labels, boxes, masks, and images as v5, but with new types of annotations added and a larger number of relation annotations).

Conchylicultor commented 4 years ago

Nice, we would love to have this!

For info, we (TFDS team) ensure the core API support and help with issues, but we let the community (both internal and external) implement the datasets they want (we have 130+ dataset requests).

Don't hesitate to help us with this. Or if anyone else is interested in working on this, don't hesitate to send a PR. Starting from open_images_v4, it should be relatively straightforward to add an OpenImagesV6: https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/object_detection/open_images.py

We're here to help if anyone encounters issues with this.

rodrigob commented 4 years ago

> relatively straightforward

Not so much, since new data types and data conventions are needed (instance segmentation, localized captions, audio).

@jponttuset FYI.

Eshan-Agarwal commented 4 years ago

@Conchylicultor I want to work on it. Should we keep both v4 and v6?

rodrigob commented 4 years ago

Note also that there was a potential bug in the v4 TFDS import (in the quantization of the image-level machine scores), so v5/v6 should be implemented with care (and we should probably consider removing the quantization). Please add me to the reviewer pool.
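To illustrate why quantizing scores deserves care, here is a minimal sketch. It is not the TFDS code, and the thread does not say what the v4 bug actually was. The function names, the floor-based scheme, and the scale of 10 buckets are all assumptions made for illustration. It only shows the general effect: distinct scores collapse into one bucket, so the original value cannot be recovered.

```python
import math


def quantize(score: float, levels: int = 10) -> int:
    """Map a score in [0, 1] to an integer bucket in [0, levels].

    Hypothetical scheme for illustration only, not the TFDS implementation.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    # Flooring discards everything below one bucket width (1 / levels).
    return min(levels, math.floor(score * levels))


def dequantize(bucket: int, levels: int = 10) -> float:
    """Map a bucket index back to the lower edge of its score interval."""
    return bucket / levels


# Two clearly different machine scores end up in the same bucket.
for score in (0.31, 0.39):
    bucket = quantize(score)
    print(score, "->", bucket, "->", dequantize(bucket))
```

Storing the raw float score instead avoids this loss entirely, at the cost of a few more bytes per label.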

Conchylicultor commented 4 years ago

@Eshan-Agarwal, yes we should keep both v4 and v6. However I feel this one may be a little too ambitious for you, especially if you don't have enough compute power.

Eshan-Agarwal commented 4 years ago

Yes, the open_images dataset is huge, but I will try.

rodrigob commented 3 years ago

For info, I am currently working on this issue.

BlackHC commented 2 years ago

Any updates on this? 🤗 This would be super useful to have

rodrigob commented 2 years ago

> Any updates on this? 🤗 This would be super useful to have

For context, a not-yet-released implementation exists. It was used to generate the new Open Images visualizers. I will be spending the next couple of weeks cleaning up the code and pushing the public release.

joaoguilhermeS commented 1 year ago

Any updates on this? I guess it would make the work a lot easier for a beginner.

whoschek commented 1 year ago

Would invite so much more use and experimentation!