Add LaVIN model

Model description

LaVIN is an instruction-tuned vision-language model that is cheap to train (its ScienceQA variants train in 1.4 to 2 hours on 8 A100 GPUs) and reaches 89.37% to 90.54% accuracy on ScienceQA.

I'd like to add LaVIN to HF transformers.

Open source status

[X] The model implementation is available
[X] The model weights are available

Provide useful links for the implementation

The paper "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" is by Gen Luo, Yiyi Zhou, Tianhe Ren, Shengxin Chen, Xiaoshuai Sun, and Rongrong Ji.

@luogen1996 has made the code and model weights available at github.com/luogen1996/LaVIN.

Weights for the following models can be downloaded from the links in the last column:

ScienceQA

Model	Base LLM	Training time	Memory	Trainable params	Acc (%)	Weights
LaVIN-7B	LLaMA	1.4 h	33.9 GB	3.8M	89.37	Google Drive
LaVIN-7B	Vicuna	1.4 h	33.9 GB	3.8M	89.41	Google Drive
LaVIN-13B	LLaMA	2 h	55.9 GB	5.4M	90.54	Google Drive
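To put the "cheap to train" claim in concrete terms, the sketch below derives two figures from the ScienceQA table: the fraction of parameters that are trained, and the total GPU-hours. It rests on assumptions that are not stated in this issue: the base models are taken at their nominal 7B and 13B sizes, "Trainable params" is read as the number of tuned parameters, and every run is assumed to use the same 8 A100 GPUs mentioned in the model description. The `Run` class and `RUNS` list are hypothetical helpers for illustration only.

```python
from dataclasses import dataclass

# Assumption: every ScienceQA run used 8 A100 GPUs (from the model description).
NUM_GPUS = 8


@dataclass
class Run:
    name: str
    base_params: float       # nominal base LLM size in parameters (assumed, not from the table)
    trainable_params: float  # "Trainable params" column
    hours: float             # "Training time" column, assumed wall-clock hours
    accuracy: float          # "Acc (%)" column, ScienceQA accuracy

    def trainable_fraction(self) -> float:
        """Share of the base model's parameters that are actually tuned."""
        return self.trainable_params / self.base_params

    def gpu_hours(self) -> float:
        """Wall-clock hours multiplied by the assumed GPU count."""
        return self.hours * NUM_GPUS


# Figures copied from the ScienceQA table; base sizes are the nominal 7B/13B.
RUNS = [
    Run("LaVIN-7B (LLaMA)", 7e9, 3.8e6, 1.4, 89.37),
    Run("LaVIN-7B (Vicuna)", 7e9, 3.8e6, 1.4, 89.41),
    Run("LaVIN-13B (LLaMA)", 13e9, 5.4e6, 2.0, 90.54),
]

if __name__ == "__main__":
    for run in RUNS:
        print(
            f"{run.name}: {run.trainable_fraction():.4%} trainable, "
            f"{run.gpu_hours():.1f} GPU-hours, {run.accuracy:.2f}% accuracy"
        )
```

Under these assumptions, well under 0.1% of the base model's parameters are tuned, and even the 13B variant needs only 16 GPU-hours on ScienceQA.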

Multimodal ChatBot

Model	Base LLM	Training time	Memory	Trainable params	Acc (%)	Weights
LaVIN-13B	LLaMA	75 h	55.9 GB	5.4M	-	Google Drive

huggingface/transformers issue #23846