salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Question about the Qformer training #250

Open · TobiasLee opened this issue 1 year ago

TobiasLee commented 1 year ago

Hi, big thanks for your great work and the open-sourced code & weights. I am fine-tuning / continuing training from the released checkpoints and hope you can kindly share some details about the design and training:

  1. What is the average loss of the final checkpoint for the OPT decoder, i.e., the captioning loss? I'd like to use it to check whether my model is converging.

  2. Do you plan to release checkpoints from the first-stage pre-training, since the Q-Former trained in the first stage is model-agnostic?

zzhanghub commented 1 year ago

A checkpoint from the first-stage pre-training would be very helpful. I am also looking forward to its release.

LiJunnan1992 commented 1 year ago

We have released stage-1 checkpoints with both ViT-g and ViT-L.
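
For reference, a minimal sketch of loading the stage-1 model through the LAVIS model zoo. The registry name blip2_feature_extractor and the model types pretrain (ViT-g) / pretrain_vitL (ViT-L) are assumptions here and should be checked against the current model zoo listing; the image path is a placeholder.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the stage-1 Q-Former (trained with the ITC/ITM/ITG objectives, no frozen LLM).
# Assumption: "pretrain" maps to the ViT-g checkpoint and "pretrain_vitL" to ViT-L.
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_feature_extractor",
    model_type="pretrain",
    is_eval=True,
    device=device,
)

raw_image = Image.open("example.jpg").convert("RGB")  # placeholder path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
text = txt_processors["eval"]("a photo of a cat")

# Multimodal Q-Former features: query embeddings attending to both image and text.
features = model.extract_features({"image": image, "text_input": [text]}, mode="multimodal")
print(features.multimodal_embeds.shape)  # (batch, num_query_tokens, hidden_dim)
```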

You can now run BLIP-2 stage-1 pre-training from scratch with bash run_scripts/blip2/train/pretrain_stage1.sh

Thank you.
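
Following up on the original ask about continuing training: one way to use the stage-1 weights is to warm-start a stage-2 (OPT) model from them before fine-tuning. A rough sketch, assuming the stage-2 model is registered as blip2_opt / pretrain_opt2.7b and that BaseModel.load_checkpoint accepts a local path or URL and loads non-strictly; the checkpoint path below is a placeholder.

```python
import torch
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Build the stage-2 captioning model (Q-Former bridging a frozen ViT and a frozen OPT).
# Assumption: this name/model_type pair exists in the current model zoo.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt",
    model_type="pretrain_opt2.7b",
    is_eval=False,
    device=device,
)

# Warm-start the Q-Former and query tokens from the stage-1 checkpoint before
# continuing training. load_checkpoint loads with strict=False, so keys that
# exist in only one of the two stages (e.g. the stage-1 ITC/ITM heads, the OPT
# projection layer) are skipped or left at their initialized values.
ckpt_path = "/path/to/blip2_stage1_pretrained.pth"  # placeholder path
msg = model.load_checkpoint(ckpt_path)
print("missing keys:", msg.missing_keys)
```

If I read the released training configs correctly, the stage-2 pre-training config achieves the same effect by pointing its pretrained field at the stage-1 checkpoint, so editing that field may be simpler than loading weights manually.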