salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence
BSD 3-Clause "New" or "Revised" License

About InstructBLIP's training details #414

Open zdxff opened 1 year ago

zdxff commented 1 year ago

In the paper, you say "Since the original BLIP-2 models do not include checkpoints for Vicuna, we perform pre-training with Vicuna using the same procedure as BLIP-2". Does this mean that InstructBLIP training starts from the second-stage model? But the second-stage model drops the Q-Former's text decoder. Is the new feed-forward layer randomly initialized, or is it initialized from the first-stage model?

qwqwq1445 commented 1 year ago

You can refer to #344.