salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence
BSD 3-Clause "New" or "Revised" License

[BLIP2] How to perform stage 1 Vision-Language Representation bootstrapping #237

Open klima7 opened 1 year ago

klima7 commented 1 year ago

I think that stage 1 training, i.e. vision-language representation learning with the three objectives mentioned in the paper, is not yet implemented. Am I right?

The missing `load_pretrained: False` in pretrain_stage1.yaml is only part of the problem: the LLM is still connected, so what runs is really the generative learning from stage 2. Is there currently any way to properly perform stage 1 vision-language representation bootstrapping?
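For context, the three stage-1 objectives in the BLIP-2 paper are image-text contrastive learning (ITC), image-text matching (ITM), and image-grounded text generation (ITG). As a rough, self-contained illustration of the ITC term only, here is a minimal NumPy sketch on random features. This is not LAVIS code; the function name and shapes are made up for the example:

```python
import numpy as np

def _log_softmax(x):
    # Numerically stable row-wise log-softmax.
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def itc_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric image-text contrastive (InfoNCE-style) loss. Illustrative
    sketch only, not the LAVIS/BLIP-2 implementation."""
    # L2-normalize both modalities so dot products are cosine similarities.
    image_feats = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    text_feats = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = image_feats @ text_feats.T / temperature  # (batch, batch) similarity
    idx = np.arange(len(logits))
    # Matched image/text pairs sit on the diagonal; they are the "correct class".
    loss_i2t = -_log_softmax(logits)[idx, idx].mean()   # image -> text direction
    loss_t2i = -_log_softmax(logits.T)[idx, idx].mean() # text -> image direction
    return (loss_i2t + loss_t2i) / 2

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 32))
txt = img + 0.1 * rng.normal(size=(4, 32))  # roughly aligned pairs -> small loss
print(itc_loss(img, txt))
```

In the real model, ITM adds a binary matched/unmatched classifier on fused features and ITG is a captioning loss; both act on Q-Former outputs rather than raw features as here.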

LiJunnan1992 commented 1 year ago

You can now run BLIP-2 stage 1 pre-training with `bash run_scripts/blip2/train/pretrain_stage1.sh`. Thank you.