Star us if you think it's helpful. Your support means a lot! ⭐️
TL;DR: LLaVA-MoD is an efficient framework for training small-scale Multimodal Language Models by distilling knowledge from larger models.
First, install Anaconda, then install torch. We recommend torch==2.1.2 with cuda==11.8.
# CUDA 11.8
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
Then install the packages listed in requirements.txt:
pip install -r requirements.txt
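After installation, it can help to confirm that the environment matches the recommended versions. The following is a minimal sketch, not part of the repository: the helper names `matches_recommended` and `check_environment` are hypothetical, and the check simply compares `torch.__version__` and `torch.version.cuda` against the versions recommended above.

```python
import importlib

# Versions recommended in the installation notes above.
RECOMMENDED_TORCH = "2.1.2"
RECOMMENDED_CUDA = "11.8"


def matches_recommended(torch_version, cuda_version):
    """Return True if the given versions equal the recommended ones.

    torch_version may carry a local build suffix such as "+cu118",
    which is stripped before comparing.
    """
    base_version = torch_version.split("+")[0]
    return base_version == RECOMMENDED_TORCH and cuda_version == RECOMMENDED_CUDA


def check_environment():
    """Describe how the installed torch compares with the recommendation."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return "torch is not installed"
    cuda_version = torch.version.cuda  # None on CPU-only builds
    if matches_recommended(str(torch.__version__), cuda_version):
        return "environment matches the recommended versions"
    return (
        f"found torch {torch.__version__} with CUDA {cuda_version}; "
        f"recommended torch {RECOMMENDED_TORCH} with CUDA {RECOMMENDED_CUDA}"
    )
```

Calling `print(check_environment())` in the activated environment reports whether the installed build matches. Other version combinations may also work; this only flags a difference from the recommendation.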
Following LLaVA, we construct the data in the following format:
{
  "id": "000000052846",
  "image": "COCO2017/train/000000052846.jpg",
  "conversations": [
    {
      "from": "human",
      "value": "Where is the cat positioned in the image?\n<image>"
    },
    {
      "from": "gpt",
      "value": "The cat is positioned on top of the back of the couch in the living room."
    },
    {
      "from": "human",
      "value": "What is the cat doing in the image?"
    },
    {
      "from": "gpt",
      "value": "The cat is coming out from some curtains onto the couch and is sitting or standing on top of it."
    }
  ]
}
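Before training, it can be useful to sanity-check records against this format. The sketch below is illustrative and is not the repository's loader: the function name `validate_conversation_record` is hypothetical, and the two rules it checks (turns alternate between `human` and `gpt`, and the `<image>` placeholder appears exactly once in a human turn) are assumptions inferred from the example above.

```python
def validate_conversation_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for key in ("id", "image", "conversations"):
        if key not in record:
            problems.append(f"missing key: {key}")

    turns = record.get("conversations", [])
    if not turns:
        problems.append("no conversation turns")

    # Assumption: turns alternate human -> gpt, starting with human.
    for index, turn in enumerate(turns):
        expected = "human" if index % 2 == 0 else "gpt"
        actual = turn.get("from")
        if actual != expected:
            problems.append(f"turn {index}: expected {expected}, got {actual}")

    # Assumption: a record with an image has exactly one <image> placeholder,
    # located in a human turn.
    placeholder_count = sum(
        turn.get("value", "").count("<image>")
        for turn in turns
        if turn.get("from") == "human"
    )
    if "image" in record and placeholder_count != 1:
        problems.append(f"expected one <image> placeholder, found {placeholder_count}")
    return problems


# A shortened copy of the example record above.
sample = {
    "id": "000000052846",
    "image": "COCO2017/train/000000052846.jpg",
    "conversations": [
        {"from": "human", "value": "Where is the cat positioned in the image?\n<image>"},
        {"from": "gpt", "value": "The cat is positioned on top of the back of the couch in the living room."},
    ],
}
print(validate_conversation_record(sample))
```

Returning a list of problems rather than raising on the first one makes it easier to scan a whole dataset file and report everything that needs fixing in one pass.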
Following RLAIF-V, we construct the data in the following format:
{
  "id": "OmniLMM-12B_OmniLMM-12B_0",
  "image": "RLAIF-V-Dataset_000_OmniLMM-12B_OmniLMM-12B_0.jpg",
  "chosen": [
    {
      "from": "human",
      "value": "<image>\nWho is more likely to use these tools a leather crafter or a paper crafter?"
    },
    {
      "from": "gpt",
      "value": "A leather crafter is more likely to use these tools. The image shows various crafting tools, including scissors and a hole punch, which are commonly used in leatherworking projects. Leather is a material that requires cutting, shaping, and precise hole-punching techniques to create desired designs or patterns. In contrast, paper crafters typically use different types of tools, such as adhesives, decorative papers, or specialized cutting machines like the Silhouette Cameo, for their projects."
    }
  ],
  "rejected": [
    {
      "from": "human",
      "value": "<image>\nWho is more likely to use these tools a leather crafter or a paper crafter?"
    },
    {
      "from": "gpt",
      "value": "A leather crafter is more likely to use these tools as they consist of a hole punch, scissors, and a knife. These items are typically used in crafting projects involving fabric or leather materials for various designs and patterns. Paper crafters may also benefit from some of these tools, but their primary focus would be on paper-related projects, which might require different types of tools such as paper cutters or scrapbooking supplies."
    }
  ]
}
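In the example above, the `chosen` and `rejected` lists repeat the same human prompt and differ only in the `gpt` answer. The sketch below shows one way to split such a record into a (prompt, chosen answer, rejected answer) triple. It is an illustration under stated assumptions, not the repository's data pipeline: the function name `to_preference_triple` is hypothetical, and it assumes every record has exactly one human turn followed by one gpt turn in each list.

```python
def to_preference_triple(record):
    """Split one preference record into (prompt, chosen_answer, rejected_answer)."""

    def split_turns(turns):
        # Assumption: each list is one human turn followed by one gpt turn.
        if len(turns) != 2 or turns[0]["from"] != "human" or turns[1]["from"] != "gpt":
            raise ValueError("expected one human turn followed by one gpt turn")
        return turns[0]["value"], turns[1]["value"]

    chosen_prompt, chosen_answer = split_turns(record["chosen"])
    rejected_prompt, rejected_answer = split_turns(record["rejected"])
    # Assumption: both answers respond to the identical prompt.
    if chosen_prompt != rejected_prompt:
        raise ValueError("chosen and rejected must answer the same prompt")
    return chosen_prompt, chosen_answer, rejected_answer


# A shortened record with the same structure as the example above.
sample_pair = {
    "id": "OmniLMM-12B_OmniLMM-12B_0",
    "image": "RLAIF-V-Dataset_000_OmniLMM-12B_OmniLMM-12B_0.jpg",
    "chosen": [
        {"from": "human", "value": "<image>\nWho is more likely to use these tools a leather crafter or a paper crafter?"},
        {"from": "gpt", "value": "A leather crafter is more likely to use these tools."},
    ],
    "rejected": [
        {"from": "human", "value": "<image>\nWho is more likely to use these tools a leather crafter or a paper crafter?"},
        {"from": "gpt", "value": "A leather crafter is more likely to use these tools as they consist of a hole punch, scissors, and a knife."},
    ],
}
prompt, chosen_answer, rejected_answer = to_preference_triple(sample_pair)
```

Raising on a prompt mismatch surfaces malformed pairs early, since a preference comparison is only meaningful when both answers respond to the same question.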
Full details on training and evaluation can be found in TRAIN_EVAL.md.
For instructions on inference, please refer to INFERENCE.md.
If you find our project useful for your research and applications, please star it and cite the paper using this BibTeX:
@article{shu2024llavamod,
  title={LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation},
  author={Shu, Fangxun and Liao, Yue and Zhuo, Le and Xu, Chenning and Zhang, Lei and Zhang, Guanghao and Shi, Haonan and Chen, Long and Zhong, Tao and He, Wanggui and Fu, Siming and others},
  journal={arXiv preprint arXiv:2408.15881},
  year={2024}
}
Our project is built upon MoE-LLaVA and LLaVA. We are deeply grateful for the excellent codebase they provide. Additionally, we express our appreciation to MobileVLM and RLAIF-V for their meticulously processed datasets. Their contributions have been of immeasurable value in shaping our work.
Our project is released under the Apache 2.0 license.