jefferyZhan / Griffon

Official repo of the Griffon series, including v1 (ECCV 2024), v2, and G
Apache License 2.0

Question on the preliminary findings / experiments with small-sized LLMs (3B and less) #7

Open nicolay-r opened 7 months ago

nicolay-r commented 7 months ago

Dear @jefferyZhan, hope you're doing well! I have a question about your personal views as the authors of the paper presenting the Griffon system. Besides the ablation studies aimed at selecting the best encoder among those chosen for the experiments, have you attempted to build your system on top of even smaller LLMs (3B parameters or less)?

jefferyZhan commented 7 months ago

Hi @nicolay-r, thank you for your interest in our work on Griffon. Earlier this year, there was a clear performance gap on general VQA between open-source smaller LLMs and the widely used 7B/13B LLMs, and the same holds for the more precise tasks in our system. We're also interested in smaller LLMs to broaden the range of applications. We hope to hear further insights from you and the community.

nicolay-r commented 7 months ago

Hi @jefferyZhan, thank you for the update! I understand your choice of 7B/13B LLMs, as most technical reports from Q1 2024 follow the same choice: OmniFusion, MM1-Apple, Ferret V2. I was asking more from the perspective of preliminary experiments with smaller LLMs rather than the final systems. I believe that is how architectural decisions are made before scaling up to the sizes you decided to go with. At the moment, the most accessible findings on experiments with 300M/1B models are in the MM1-Apple technical report (take a look at the ablation studies on data and modality encoders, since the authors use 300M/1B models there).

Are your colleagues or the community aware of any related findings or milestones that could help initiate such studies?

jefferyZhan commented 2 months ago

Sorry for the late reply. Currently, we mainly focus on LLMs with 7B to roughly 30B parameters. Several models such as PaliGemma and LLaVA have demonstrated the success of smaller LLMs on vision-language tasks. Griffon will add a variant with fewer than 2B parameters after the release of the new Griffon version.
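
In the meantime, for anyone who wants to prototype with a smaller backbone before an official sub-2B variant is released, here is a minimal sketch of a LLaVA-style setup built around a sub-2B language model. The model names and the linear projector are illustrative assumptions only, not the Griffon code path:

```python
# Illustrative sketch only: pairs a sub-2B LLM with a CLIP vision encoder
# in a LLaVA-style layout. Model names are examples of openly available
# checkpoints; this is NOT the Griffon implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, CLIPVisionModel

llm_name = "Qwen/Qwen2-1.5B-Instruct"               # example sub-2B language backbone
vision_name = "openai/clip-vit-large-patch14-336"   # example vision encoder

tokenizer = AutoTokenizer.from_pretrained(llm_name)
llm = AutoModelForCausalLM.from_pretrained(llm_name, torch_dtype=torch.float16)
vision_tower = CLIPVisionModel.from_pretrained(vision_name, torch_dtype=torch.float16)

# Simple linear projector mapping vision features into the LLM embedding space,
# as in LLaVA-style architectures; Griffon's actual connector may differ.
projector = torch.nn.Linear(vision_tower.config.hidden_size, llm.config.hidden_size)
```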