[ Paper ] [ Website ] [ Dataset (OpenDataLab)] [ Dataset (Hugging face) ] [Demo]
2023.12.22
🎉🎉🎉 We release a [technical report]() for more details.
A 100M debiased LAION subset (OpenDataLab and Hugging Face) and pre-trained models are publicly available.
We trained the K-means model on CLIP ViT-B-32 features from the LAION-400M dataset using faiss, first applying PCA to reduce the feature dimension. The training and inference code is in kmeans.py.
PCA weights | K-means centroids |
---|---|
Download | Download |
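For reference, here is a numpy-only sketch of the PCA-then-K-means pipeline described above. The released kmeans.py uses faiss; the feature dimensions, cluster count, and iteration count below are illustrative assumptions, not the actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_pca, k, iters = 64, 16, 8, 10

# Random stand-in for CLIP ViT-B-32 image features (illustrative only).
feats = rng.standard_normal((1000, d_in)).astype("float32")

# 1) PCA: center the features, then project onto the top components.
mean = feats.mean(axis=0)
_, _, vt = np.linalg.svd(feats - mean, full_matrices=False)
feats_pca = (feats - mean) @ vt[:d_pca].T

# 2) K-means (Lloyd's algorithm) on the reduced features.
centroids = feats_pca[rng.choice(len(feats_pca), k, replace=False)]
for _ in range(iters):
    # Assign each point to its nearest centroid.
    dists = ((feats_pca[:, None, :] - centroids[None]) ** 2).sum(-1)
    labels = dists.argmin(axis=1)
    # Update each centroid as the mean of its assigned points.
    for j in range(k):
        if (labels == j).any():
            centroids[j] = feats_pca[labels == j].mean(axis=0)

# 3) Inference: the cluster id of each feature is its nearest centroid.
print(labels.shape)  # (1000,)
```

In the actual pipeline, faiss replaces both steps (`faiss.PCAMatrix` and `faiss.Kmeans`) for scalability to hundreds of millions of features.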
The generation pipeline for synthetic images (sys_benchmark.py and Arial.ttf) and the N-gram vocabularies we built from the dataset.
LAION-2B Caption 1-gram | LAION-2B Caption 2-gram | LAION-2B Co-Emb Text 1-gram |
---|---|---|
Download | Download | Download |
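As a rough illustration of the pipeline above, the following sketch builds n-gram counts from captions and renders an n-gram as a synthetic text image. The toy captions, whitespace tokenization, canvas size, and font fallback are assumptions for illustration; the released pipeline is in sys_benchmark.py with Arial.ttf.

```python
from collections import Counter
from PIL import Image, ImageDraw, ImageFont

# Toy captions standing in for LAION-2B captions (illustrative only).
captions = ["a photo of a cat", "a photo of a dog"]

def ngrams(tokens, n):
    """Return the list of n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Build 1-gram and 2-gram vocabularies with frequency counts.
vocab = {n: Counter() for n in (1, 2)}
for cap in captions:
    toks = cap.lower().split()  # whitespace tokenization is an assumption
    for n in (1, 2):
        vocab[n].update(ngrams(toks, n))

def render_text(text, size=(224, 224)):
    """Render an n-gram on a white canvas as a synthetic benchmark image."""
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    try:
        font = ImageFont.truetype("Arial.ttf", 32)
    except OSError:
        font = ImageFont.load_default()  # fallback when Arial.ttf is absent
    draw.text((10, size[1] // 2), text, fill="black", font=font)
    return img

top_unigram, _ = vocab[1].most_common(1)[0]
img = render_text(top_unigram)
print(top_unigram, img.size)  # a (224, 224)
```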
Our training code is based on OpenCLIP.
Note that the OCR model is not perfect, so the images in our filtered subset still contain some text content. Therefore, we also benchmark our trained models on the synthetic image benchmark.
100M subset | ViT-B Models |
---|---|
Download | Download |
1-gram Synthetic Benchmark | Ours (100M) | CLIP (WIT-400M) | OpenCLIP (LAION-2B) | DC medium 128M (DC) | DC large 1.28B (DC) |
---|---|---|---|---|---|
Sync. Score (mean) $\downarrow$ | 0.163 | 0.317 | 0.368 | 0.268 | 0.338 |
Sync. Score (std) | 0.0659 | 0.0305 | 0.0427 | 0.0247 | 0.0341 |
DataComp benchmark | Ours (100M) | CLIP (WIT-400M) | OpenCLIP (LAION-2B) | DC medium 128M (DC) | DC large 1.28B (DC) |
---|---|---|---|---|---|
ImageNet | 0.526 | 0.633 | 0.666 | 0.176 | 0.459 |
ImageNet dist. shifts | 0.404 | 0.485 | 0.522 | 0.152 | 0.378 |
VTAB | 0.481 | 0.526 | 0.565 | 0.259 | 0.426 |
Retrieval | 0.421 | 0.501 | 0.560 | 0.219 | 0.419 |
Average | 0.443 | 0.525 | 0.565 | 0.258 | 0.437 |
Thanks to these great works:
```bibtex
@article{lin2023parrot,
  title={Parrot Captions Teach CLIP to Spot Text},
  author={Lin, Yiqi and He, Conghui and Wang, Alex Jinpeng and Wang, Bin and Li, Weijia and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2312.14232},
  year={2023}
}

@article{he2024opendatalab,
  title={Opendatalab: Empowering general artificial intelligence with open datasets},
  author={He, Conghui and Li, Wei and Jin, Zhenjiang and Xu, Chao and Wang, Bin and Lin, Dahua},
  journal={arXiv preprint arXiv:2407.13773},
  year={2024}
}
```