Hi @dyashuni, thanks for your interest. Could you take a look at our LAVIS library? https://github.com/salesforce/LAVIS It supports BLIP pre-training, among other features.
@LiJunnan1992 thank you, I will take a look at LAVIS
Hi @LiJunnan1992! I fine-tuned 3 pretrained models on the COCO caption task using train_caption.py (a rough launch sketch is below). I used 32 GPUs for pretraining.
The metrics I got show that BLIP w/ ViT-B (14M) performs almost the same as BLIP w/ ViT-B and CapFilt-L (129M). That contradicts the published results. How is that possible?
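For reference, a minimal sketch of how such a multi-GPU fine-tuning run could be launched; it assumes the repo's train_caption.py entry point and PyTorch's torch.distributed.run launcher, and the GPU count here is only illustrative.

```python
# Rough sketch of a multi-GPU fine-tuning launch (assumes train_caption.py
# and PyTorch's torch.distributed.run; adjust num_gpus to the machine).
import subprocess

num_gpus = 8  # number of GPUs on this node (illustrative)
subprocess.run(
    [
        "python", "-m", "torch.distributed.run",
        f"--nproc_per_node={num_gpus}",
        "train_caption.py",
    ],
    check=True,
)
```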
Could you reproduce BLIP's fine-tuning result if you use the same setting? "I used 32 GPUs for pretraining." -> I assume you mean "fine-tuning"?
I used your caption_coco.yaml config for fine-tuning, so I used your parameters.
How many GPUs did you use for fine-tuning?
"I used 32 GPUs for pretraining." -> I assume you mean "fine-tuning"? Yes, thank you.
I used 8 GPUs. With 32 GPUs, you should set batch_size=8 so that the total batch size remains 256.
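To make the arithmetic explicit, here is a minimal sketch of how the per-GPU batch size relates to the effective total; it assumes one process per GPU (as launched by torch.distributed.run) and the per-GPU `batch_size` field in configs/caption_coco.yaml.

```python
# Effective-batch-size arithmetic: total = per-GPU batch size * number of GPUs.

def per_gpu_batch_size(target_total: int, num_gpus: int) -> int:
    """Per-GPU batch size that keeps the effective (total) batch size fixed."""
    assert target_total % num_gpus == 0, "total batch size must divide evenly across GPUs"
    return target_total // num_gpus

# Default fine-tuning setup: 8 GPUs x 32 per GPU = 256 samples per step.
print(per_gpu_batch_size(256, 8))   # -> 32
# With 32 GPUs, drop the per-GPU batch size to 8 to keep the same total of 256.
print(per_gpu_batch_size(256, 32))  # -> 8
```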
Thank you! I will try it.
Hi @LiJunnan1992, thank you for the great work!
I'm trying to reproduce the pretraining on the CC + COCO + SBU + VG datasets, but I get higher losses than the ones you reported in https://github.com/salesforce/BLIP/issues/19#issuecomment-1046398252
I didn't balance these datasets. I took the pretrain yaml config from https://github.com/salesforce/BLIP/blob/main/configs/pretrain.yaml and added the new datasets to the training (a rough sketch of this change is below).
Could you please share your yaml config for pretraining on the CC + COCO + SBU + VG datasets?
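For context, a minimal sketch of the kind of config change described above; it assumes pretrain.yaml exposes a `train_file` list of annotation JSON paths, and the file names below are placeholders rather than the repository's actual annotation files.

```python
# Sketch: append extra dataset annotation files to the pre-training config.
# Assumes a `train_file` list in configs/pretrain.yaml; paths are placeholders.
import yaml

with open("configs/pretrain.yaml") as f:
    config = yaml.safe_load(f)

extra_annotations = [
    "annotation/coco_karpathy_train.json",  # COCO (placeholder path)
    "annotation/vg_caption.json",           # Visual Genome (placeholder path)
    "annotation/sbu_caption.json",          # SBU Captions (placeholder path)
    "annotation/cc3m_train.json",           # Conceptual Captions (placeholder path)
]
config["train_file"] = list(config.get("train_file", [])) + extra_annotations

# Write the extended config under a new (hypothetical) name.
with open("configs/pretrain_cc_coco_sbu_vg.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```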