kakaobrain / coyo-dataset

COYO-700M: Large-scale Image-Text Pair Dataset
https://kakaobrain.com/contents?contentId=7eca73e3-3089-43cb-b701-332e8a1743fd

where can I find COYO-Labels-300M? #8

Open Soonhwan-Kwon opened 2 years ago

Soonhwan-Kwon commented 2 years ago

First of all, thank you for the great dataset.

We also provide COYO-Labels-300M by adding machine-generated vision labels to a subset of COYO-700M for comparison with the JFT-300M.
We first removed the duplicated images by image_phash.
Then, we labeled 300M unique images into 21,841 classes by [EfficientNetV2-XL](https://arxiv.org/abs/2104.00298) trained with [ImageNet-21K](https://www.image-net.org/) dataset.

As described in the README excerpt above, COYO-Labels-300M exists, but I can't find how to get it, or any additional metadata needed to build it. Will it be released in the future? Thank you in advance.
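
For reference, the deduplication step in the README excerpt can be sketched roughly as follows. This is only a minimal illustration, not the authors' pipeline: it assumes duplicates are rows whose `image_phash` values match exactly (the README does not say whether near-matches were also merged), and the sample rows and the `dedupe_by_phash` helper are made up for the example.

```python
def dedupe_by_phash(rows, key="image_phash"):
    """Keep the first row seen for each perceptual-hash value.

    Assumption: two images count as duplicates only when their hash
    strings are identical; near-duplicate matching is not attempted.
    """
    seen = set()
    unique = []
    for row in rows:
        phash = row[key]
        if phash in seen:
            continue  # an earlier row already carries this hash
        seen.add(phash)
        unique.append(row)
    return unique


# Hypothetical metadata rows; only the `image_phash` column name comes from the README.
rows = [
    {"id": 1, "image_phash": "a1b2c3d4e5f60718"},
    {"id": 2, "image_phash": "ffeeddccbbaa0099"},
    {"id": 3, "image_phash": "a1b2c3d4e5f60718"},  # same hash as id 1
]

unique_rows = dedupe_by_phash(rows)
print([row["id"] for row in unique_rows])
```

The labeling step (running an ImageNet-21K classifier over the unique images) is not sketched here, since the thread gives no details about how it was run.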

justHungryMan commented 2 years ago

Hi @Soonhwan-Kwon, thank you for your interest. We are currently preparing the release of coyo-labeled-300M, along with ViT-L performance results and training code that use it. It should be available in 1-2 weeks, so please stay tuned for our updates :) Thank you.

Soonhwan-Kwon commented 2 years ago

Thank you for the great news!

justHungryMan commented 2 years ago

Hi @Soonhwan-Kwon, we just updated COYO-Labeled-300M. Thank you for waiting. :)

Soonhwan-Kwon commented 2 years ago

I always admire the open-source spirit and great research work of kakaobrain. Thank you for the wonderful image classification dataset! I'll download it right now. Thank you!