zyf0619sjtu / DreamLIP

[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
https://zyf0619sjtu.github.io/dream-lip/

Releasing long captions on the datasets #1

Closed: vishaal27 closed this issue 7 months ago

vishaal27 commented 7 months ago

Hi,

Thanks for the great work. The paper is a delight to read and the results look very compelling. Are you planning to release the generated long and short captions for the CC-3M, CC-12M, and YFCC-15M datasets? As I understand it, the Merged-30M dataset with long captions is simply a mixture of these three datasets with their long and short captions, is that right? I also noticed that you had both COYO-700M and LAION-400M in the pipeline. Are there plans to release the long captions for those too?

zkcys001 commented 7 months ago

Hello~

Thanks for your interest. We will release the long captions for CC3M generated by LLaVA-1.5, InstructBLIP, and ShareGPT4V this week. For the CC-12M and YFCC-15M datasets, we are still organizing the generated long captions and plan to release them next month (maybe around 2024/4/10)~

Yep, the Merged-30M dataset with long captions is simply a mixture of these three datasets with their long and short captions.

For COYO-700M and LAION-400M, due to limited GPU resources, we only extracted long captions for laion20m and coyo4m using ShareGPT4V, and we would like to release them next month too.

Thanks~

Kecheng

zkcys001 commented 7 months ago

We have released the long captions for CC3M generated by LLaVA-1.5, InstructBLIP, and ShareGPT4V (CSV format) at https://drive.google.com/file/d/19jCNWvy7kA70u-ufQtEJvbKVMG2b8MnP/view?usp=drive_link

If you have any questions, please feel free to contact me~

Best,
Kecheng

vishaal27 commented 7 months ago

Hey, it seems the Drive link is private. Could you please make it public? Thanks for releasing!

zkcys001 commented 7 months ago

Hey, we have made the link public.

vishaal27 commented 7 months ago

Awesome, thanks!

Vibashan commented 7 months ago

Hi @zkcys001,

Thank you for the great work. The Google Drive link you shared is still private.