khanrc / honeybee

Official implementation of project Honeybee (CVPR 2024)

Why does the performance increase so little from 7B to 13B? #21

Closed MonolithFoundation closed 5 months ago

MonolithFoundation commented 6 months ago

Why does the performance increase so little from 7B to 13B?

khanrc commented 6 months ago

The gain is about +3pp on MMB, +100 on MME, +4pp on SEED, and +8pp on LLaVA-w, which I don't think is that little. However, since both the 7B and 13B Honeybee models use the same vision encoder (CLIP ViT-L/14), the improvement in VL performance may appear small relative to the increase in LLM size. It is worth noting that this level of improvement is not small when compared with other MLLMs (e.g., LLaVA-1.5).