salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BSD 3-Clause "New" or "Revised" License

Need clear understanding of each checkpoint #190

Open p1k0pan opened 1 year ago

p1k0pan commented 1 year ago

Hi, thank you for your great work. I am a little bit confused about the checkpoints posted in the repository. In the paper's "Pre-training Details" section, the pre-training dataset is 14M images, including COCO, Flickr, etc. That matches the checkpoint at https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_14M.pth, right?

Also, did both model_base_14M and model_base (https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base.pth) use CapFilt?
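For context, this is a minimal sketch of how I'm inspecting the checkpoints locally to try to tell them apart. I'm assuming the file has already been downloaded, and I'm guessing the weights sit under a "model" key (the code falls back to the raw dict if not):

```python
import torch

# Load the downloaded checkpoint onto CPU so no GPU is needed just to inspect it.
ckpt = torch.load("model_base_14M.pth", map_location="cpu")

# Top-level keys -- my guess is something like dict_keys(['model', ...]).
print(ckpt.keys())

# Fall back to the raw dict in case the weights are stored flat.
state_dict = ckpt["model"] if "model" in ckpt else ckpt
print(len(state_dict), "tensors in the state dict")
```

But the checkpoint contents alone don't tell me which pre-training data or whether CapFilt was used, hence the questions above.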

Thank you for your help