Open tensorpro opened 1 year ago
Hi @amyeroberts, I don't think anyone is working on this anymore. If this would add value to HF, I'll start working on it.
Model description
LaVIN is a vision-language instructed model that is affordable to train (it was trained in a few hours on 8 A100 GPUs) and performs well on ScienceQA.
I'd like to add LaVIN to HF transformers.
Open source status
Provide useful links for the implementation
The paper "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" is by Gen Luo, Yiyi Zhou, Tianhe Ren, Shengxin Chen, Xiaoshuai Sun, and Rongrong Ji.
@luogen1996 has made the code and model weights available at github.com/luogen1996/LaVIN.
Weights are available for the following models:
ScienceQA
Multimodal ChatBot