* Equal contributions
This repository enhances the capabilities of the LLaVA 1.5 model by incorporating the latest LLMs released this week 🔥: Phi-3 Mini Instruct 3.8B and LLaMA-3 Instruct 8B.
The following table provides an overview of the available models in our zoo. For each model, you can find links to its Hugging Face page.
Model Name | Hugging Face Link | Summary |
---|---|---|
LLaVA-Phi-3-mini-4k-instruct-pretrain | Hugging Face | Pretrained on LCS-558K. |
LLaVA-Phi-3-mini-4k-instruct-lora | Hugging Face | LoRA weights fine-tuned on LLaVA-Instruct-665K. |
LLaVA-Phi-3-mini-4k-instruct | Hugging Face | Merged LoRA weights in HuggingFace format. |
LLaVA-Phi-3-mini-4k-instruct-FT | Hugging Face | Fully fine-tuned model weights in HuggingFace format. |
Model Name | Hugging Face Link | Summary |
---|---|---|
LLaVA-Meta-Llama-3-8B-Instruct-pretrain | Hugging Face | Pretrained on LCS-558K. |
LLaVA-Meta-Llama-3-8B-Instruct-lora | Hugging Face | LoRA weights fine-tuned on LLaVA-Instruct-665K. |
LLaVA-Meta-Llama-3-8B-Instruct | Hugging Face | Merged weights in HuggingFace format. |
LLaVA-Meta-Llama-3-8B-Instruct-FT | Hugging Face | Fully fine-tuned model weights in HuggingFace format. |
LLaVA-Meta-Llama-3-8B-Instruct-FT-S2 | Hugging Face | Fully fine-tuned S2 model weights in HuggingFace format. |
git clone https://github.com/mbzuai-oryx/LLaVA-pp.git
cd LLaVA-pp
git submodule update --init --recursive
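The copy steps later in this README write into `LLaVA/llava/...`, which only exists once the submodule has been populated. The following is a minimal sketch of a pre-flight check, assuming it is run from the `LLaVA-pp` root. The function name `check_layout` is hypothetical and not part of the repository; the directory names are taken from the commands in this README.

```shell
# Hypothetical helper (not part of LLaVA++): verify that the directories
# used by the copy steps in this README are present.
check_layout() {
    layout_root="${1:-.}"
    missing=0
    for d in LLaVA/llava/train LLaVA/llava/model/language_model Phi-3-V LLaMA-3-V scripts; do
        if [ ! -d "$layout_root/$d" ]; then
            echo "missing: $d" >&2
            missing=1
        fi
    done
    # Non-zero exit status if any directory is absent
    return "$missing"
}
```

Calling `check_layout` with no argument checks the current directory. A failure on the `LLaVA/llava/...` entries would suggest that the submodule was not initialized.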
Packages you need to update from LLaVA:
pip install git+https://github.com/huggingface/transformers@a98c41798cf6ed99e1ff17e3792d6e06a2ff2ff3
To integrate Phi-3-V with LLaVA, follow these steps to update the codebase:
# Copy necessary files
cp Phi-3-V/train.py LLaVA/llava/train/train.py
cp Phi-3-V/llava_phi3.py LLaVA/llava/model/language_model/llava_phi3.py
cp Phi-3-V/builder.py LLaVA/llava/model/builder.py
cp Phi-3-V/model__init__.py LLaVA/llava/model/__init__.py
cp Phi-3-V/main__init__.py LLaVA/llava/__init__.py
cp Phi-3-V/conversation.py LLaVA/llava/conversation.py
# Training commands
cp scripts/Phi3-V_pretrain.sh LLaVA/Phi3-V_pretrain.sh
cp scripts/Phi3-V_finetune_lora.sh LLaVA/Phi3-V_finetune_lora.sh
# Pre-train (run from the LLaVA-pp root)
cd LLaVA
bash Phi3-V_pretrain.sh
# Fine-tune with LoRA (run from the LLaVA-pp root)
cd LLaVA
bash Phi3-V_finetune_lora.sh
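The copy commands above overwrite files inside the LLaVA checkout without warning. As a hedged alternative, the sketch below wraps `cp` so that it fails loudly when a source file or destination directory is missing, and keeps a `.bak` copy of any file it replaces. The helper name `copy_checked` and the `.bak` suffix are assumptions for illustration, not part of LLaVA++.

```shell
# Hypothetical helper (not part of LLaVA++): copy $1 to $2 only if the source
# file and the destination directory exist, backing up a replaced file once.
copy_checked() {
    if [ ! -f "$1" ]; then
        echo "missing source: $1" >&2
        return 1
    fi
    if [ ! -d "$(dirname "$2")" ]; then
        echo "missing destination directory: $(dirname "$2")" >&2
        return 1
    fi
    # Keep the first original so it can be restored later
    if [ -f "$2" ] && [ ! -f "$2.bak" ]; then
        cp "$2" "$2.bak"
    fi
    cp "$1" "$2"
}
```

For example, `copy_checked Phi-3-V/train.py LLaVA/llava/train/train.py` could stand in for the corresponding `cp` line, leaving the original at `LLaVA/llava/train/train.py.bak`.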
To integrate LLaMA-3-V with LLaVA, follow these steps to update the codebase:
# Copy necessary files
cp LLaMA-3-V/train.py LLaVA/llava/train/train.py
cp LLaMA-3-V/conversation.py LLaVA/llava/conversation.py
cp LLaMA-3-V/builder.py LLaVA/llava/model/builder.py
cp LLaMA-3-V/llava_llama.py LLaVA/llava/model/language_model/llava_llama.py
# Training commands
cp scripts/LLaMA3-V_pretrain.sh LLaVA/LLaMA3-V_pretrain.sh
cp scripts/LLaMA3-V_finetune_lora.sh LLaVA/LLaMA3-V_finetune_lora.sh
# Pre-train (run from the LLaVA-pp root)
cd LLaVA
bash LLaMA3-V_pretrain.sh
# Fine-tune with LoRA (run from the LLaVA-pp root)
cd LLaVA
bash LLaMA3-V_finetune_lora.sh
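Both integrations overwrite some of the same files (`train.py`, `conversation.py`, `builder.py`), so applying one after the other may leave a mix of the two in the LLaVA checkout. One possible way to start from a clean state before switching is to discard local changes to tracked files with standard Git commands. This is a sketch under that assumption, not a step documented by the authors, and the function name `restore_tracked_files` is hypothetical.

```shell
# Hypothetical helper (not part of LLaVA++): discard local modifications to
# tracked files in a Git checkout. The directory defaults to LLaVA.
restore_tracked_files() {
    repo_dir="${1:-LLaVA}"
    # List the changes that are about to be discarded
    git -C "$repo_dir" status --short
    # Restore tracked files to their committed state
    git -C "$repo_dir" checkout -- .
}
```

Files that the checkout does not track are left in place, so anything newly copied in would need to be removed by hand. Any uncommitted edits of your own to tracked files are lost, so use this with care.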
We are thankful to LLaVA, lmms-eval and S2-Wrapper for releasing their models and code as open-source contributions.
If you face any issues or have any questions, please feel free to create an issue or reach out at hanoona.bangalath@mbzuai.ac.ae and muhammad.maaz@mbzuai.ac.ae.
@misc{hanoona2024LLaVA++,
title={LLaVA++: Extending Visual Capabilities with LLaMA-3 and Phi-3},
author={Rasheed, Hanoona and Maaz, Muhammad and Khan, Salman and Khan, Fahad S.},
url={https://github.com/mbzuai-oryx/LLaVA-pp},
year={2024}
}