salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BSD 3-Clause "New" or "Revised" License

pre-training with LAION #36

Closed BlueCat7 closed 2 years ago

BlueCat7 commented 2 years ago

Hi, thanks for your awesome work. I have a question about pre-training with the LAION 115M dataset. I found that you add more LAION data as the epochs increase (https://github.com/salesforce/BLIP/blob/main/data/pretrain_dataset.py#L39). I guess you want to speed up training, am I right? And how many files did you split LAION into for 20 epochs? I think training with the full 115M LAION dataset from the beginning might give better results, but would take more days. Looking forward to your reply, thanks.

LiJunnan1992 commented 2 years ago

Hi, thanks for your question.

As mentioned in our paper (footnote for section 4.1), we split LAION-115M into 5 splits and load one split per epoch. The purpose is indeed to speed up training. I agree that using more data per epoch could further improve results.
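The split-per-epoch scheme described above can be sketched as follows. This is a minimal illustration, not BLIP's actual loader (which lives in data/pretrain_dataset.py, linked above); the file names and helper are hypothetical.

```python
# Sketch: load one of 5 LAION splits per epoch, cycling with epoch % num_splits.
# File names below are hypothetical, for illustration only.
NUM_SPLITS = 5  # LAION-115M divided into 5 splits, ~23M pairs each


def split_for_epoch(epoch: int, num_splits: int = NUM_SPLITS) -> str:
    """Return the annotation file for a given training epoch.

    Cycling with epoch % num_splits means each split is revisited every
    num_splits epochs, so 20 epochs see each split 4 times, while only
    ~1/5 of the corpus is loaded at any one time.
    """
    return f"laion_split_{epoch % num_splits}.json"


# Epochs 0-5 cycle through the five splits and wrap around.
for e in range(6):
    print(e, split_for_epoch(e))
```

The trade-off is the one discussed in this thread: each epoch is cheaper because only a fifth of the data is loaded, at the cost of any single epoch seeing less data.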

BlueCat7 commented 2 years ago

Got it. Thanks a lot.