jy0205 / LaVIT

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

A question regarding the performance boost of Video-LaVIT over LaVIT #25

Closed chenjy2003 closed 1 month ago

chenjy2003 commented 2 months ago

Thanks for your great work.

I noticed that Video-LaVIT shows a large performance boost over LaVIT on benchmarks such as VQAv2 (from 66.0 to 80.2), GQA (from 46.8 to 63.6), and VizWiz (from 38.5 to 54.0).

However, I could not find an explanation for this in the Video-LaVIT paper. (Sorry if I missed it.)

Could you please explain how you achieved this performance boost? Thanks in advance.

xukunxkxk commented 1 month ago

For LaVIT, we report zero-shot performance. For Video-LaVIT, we report performance after supervised fine-tuning (SFT), using the same instruction dataset and base model as LLaVA-1.5.

chenjy2003 commented 1 month ago

Got it, thanks!