huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Add LaVIN model #23846

Open tensorpro opened 1 year ago

tensorpro commented 1 year ago

Model description

LaVIN is a vision-language instructed model that is affordable to train (it was trained in a few hours on 8 A100 GPUs) with good performance on ScienceQA.

I'd like to add LaVIN to HF transformers.

Open source status

Provide useful links for the implementation

The paper, "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models", is by Gen Luo, Yiyi Zhou, Tianhe Ren, Shengxin Chen, Xiaoshuai Sun, and Rongrong Ji.

@luogen1996 has made the code and model weights available at github.com/luogen1996/LaVIN.

Weights are available for the following models:

ScienceQA

| Model | Base weights | Time | Memory | #Params | Acc | Weights |
|---|---|---|---|---|---|---|
| LaVIN-7B | LLaMA | 1.4 hours | 33.9G | 3.8M | 89.37 | google drive |
| LaVIN-7B | Vicuna | 1.4 hours | 33.9G | 3.8M | 89.41 | google drive |
| LaVIN-13B | LLaMA | 2 hours | 55.9G | 5.4M | 90.54 | google drive |

Multimodal ChatBot

| Model | Base weights | Time | Memory | #Params | Acc | Weights |
|---|---|---|---|---|---|---|
| LaVIN-13B | LLaMA | 75 hours | 55.9G | 5.4M | - | google drive |

shauray8 commented 1 year ago

Hi @amyeroberts, I don't think anyone is working on this anymore. If this would add value to HF, I'll start working on it.