PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BSD 3-Clause "New" or "Revised" License
Need to clearly understand each checkpoint #190
Open
p1k0pan opened 1 year ago
Hi, thank you for your great work. I am a little confused about the checkpoints posted in the repository. In the paper's "Pre-training Details" section, the pre-training dataset is 14M images, including COCO, Flickr.... Does that correspond to the checkpoint at https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_14M.pth?
Also, were model_base_14M and model_base (https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base.pth) both trained with CapFilt?
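
For context, this is roughly how I am downloading and inspecting the two checkpoints. It is just a minimal sketch using `torch.hub.load_state_dict_from_url`; the assumption that the weights may be wrapped under a "model" key is mine and may not match how the repo saves them:

```python
import torch

# The two checkpoint URLs from the repository README
urls = {
    "model_base_14M": "https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_14M.pth",
    "model_base": "https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base.pth",
}

for name, url in urls.items():
    # Download (and cache) the checkpoint, keeping tensors on CPU
    ckpt = torch.hub.load_state_dict_from_url(url, map_location="cpu")
    # Assumption: the weights might be nested under a "model" key;
    # otherwise treat the loaded object as the state_dict itself
    state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
    print(f"{name}: {len(state_dict)} entries")
```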
Thank you for your help!