SkalskiP opened this issue 2 months ago
Hi @SkalskiP,
Thank you for your interest in our work. The run_llava.py script from the official LLaVA repository, together with the edited files we provide in our repository, can be used for inference.
However, we are planning to release some straightforward inference scripts soon. Stay tuned!
Results for the image below:
Question: Describe the image in detail.
Model Response: The image features a group of five women standing on a street, each holding a colorful umbrella. They are all dressed in different outfits, showcasing a variety of styles. The women are positioned in a line, with one woman on the left side, another in the middle, and the remaining three on the right side of the street. In the background, there are a few cars parked on the left side of the street, and a stop sign is visible near the left edge of the image. The scene captures a fun and lively atmosphere as the women enjoy their time together under their umbrellas.
Hi @mmaaz60, thanks for your excellent work! Could you provide an inference script based on checkpoints or weights trained by LLaMA3-V_finetune_lora.sh?
Hi @At1a8,
We appreciate your interest in our work. Please note that we also provide the merged weights obtained by merging the LoRA weights with the base LLM. For example, for LLaMA-3 the merged LoRA weights are available at LLaVA-Meta-Llama-3-8B-Instruct.
Further, the weights obtained using full fine-tuning are available at LLaVA-Meta-Llama-3-8B-Instruct-FT.
We notice that, for LLaMA-3-V, the fully fine-tuned model works better than the LoRA fine-tuned model.
The same inference pipeline as in the Google Colab can be used for the LLaMA-3-V models as well; however, you have to copy the LLaMA-3-V files instead of the Phi-3-V ones and download the LLaMA-3-V model.
We hope this helps. Please let us know if you have any questions. Thank you!
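As a sketch, single-image inference can follow the `eval_model` interface from the official LLaVA `run_llava.py`. The model ID, image path, and generation settings below are illustrative assumptions and may need adjusting for your setup (e.g. the conversation template for LLaMA-3):

```python
# Sketch only: follows the eval_model interface from the official LLaVA
# repository. Model ID, image path, and settings are assumptions.
from llava.eval.run_llava import eval_model
from llava.mm_utils import get_model_name_from_path

model_path = "MBZUAI/LLaVA-Meta-Llama-3-8B-Instruct"  # assumed HF model ID

args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "Describe the image in detail.",
    "conv_mode": None,  # set the LLaMA-3 template name if auto-detection fails
    "image_file": "example.jpg",  # placeholder image path
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)  # prints the model's answer to stdout
```

Note that this requires the model weights to be downloaded, so it is only a starting point, not a verified recipe.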
Thanks for your reply. We trained our models on a custom dataset, and we want to merge the LLaMA-3 base weights with the LoRA weights trained by your code.
How can we do that? Could you please give a code example?
Thanks so much.
Hi @At1a8,
Thanks for the clarification, you can use the following script to merge LoRA weights after training.
```python
import argparse

from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path


def merge_lora(args):
    model_name = get_model_name_from_path(args.model_path)
    # Loading a LoRA checkpoint together with its base model merges the
    # LoRA weights into the base weights internally.
    tokenizer, model, image_processor, context_len = load_pretrained_model(
        args.model_path, args.model_base, model_name, device_map='cpu'
    )
    # Save the merged model and tokenizer as a standalone checkpoint.
    model.save_pretrained(args.save_model_path)
    tokenizer.save_pretrained(args.save_model_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", type=str, required=True)
    parser.add_argument("--model-base", type=str, required=True)
    parser.add_argument("--save-model-path", type=str, required=True)
    args = parser.parse_args()
    merge_lora(args)
```
Here, `--model-path` is the LoRA weights path, `--model-base` is the base model path (in your case, Meta LLaMA-3-8B), and `--save-model-path` is the path where the merged checkpoints will be saved.
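For reference, a typical invocation of the script above (saved here as merge_lora_weights.py; all paths below are illustrative, not from this thread) might look like:

```shell
python merge_lora_weights.py \
    --model-path ./checkpoints/llava-llama3-8b-lora \
    --model-base meta-llama/Meta-Llama-3-8B-Instruct \
    --save-model-path ./checkpoints/llava-llama3-8b-merged
```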
I hope this helps. Please let me know if you face any issues. Good luck!
Hi @At1a8,
We have just added the merge_lora_weights.py script to help merge the LoRA weights. Please let us know if you have any questions. Good luck!
@mmaaz60 thanks a lot! I'll make sure to play with it ;)
We trained with the following script to get checkpoints:
```shell
#!/bin/bash
deepspeed --include localhost:4,5,6,7 llava/train/train_mem4Drive.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path Undi95/Meta-Llama-3-8B-Instruct-hf \
    --version llama3 \
    --data_path ./../v1_full_llama.json \
    --image_folder ./../vlm_dataset \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --pretrain_mm_mlp_adapter ./checkpoints/LLaVA-Meta-Llama-3-8B-Instruct-pretrain/mm_projector.bin \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir ./checkpoints/llava-v1.5-llama3-8b-task-lora1 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 8192 \
    --gradient_checkpointing True \
    --dataloader_num_workers 8 \
    --lazy_preprocess True \
    --report_to none # wandb
```
We then used the merge script mentioned above, producing the following logs:
```
(llava) fangyuan@xcdloss220176:/group/ossdphi_algo_scratch_02/fangyuan/LLaVA/LLaVA$ python3 ./scripts/merge_lora_weights.py --model-base Undi95/Meta-Llama-3-8B-Instruct-hf --model-path ./checkpoints/llava-v1.5-llama3-8b-task-lora1 --save-model-path ./../runs/llava_llama3_test1
[2024-04-30 10:44:24,172] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading LLaVA from base model...
Loading checkpoint shards: 100%|████████████████| 4/4 [00:51<00:00, 12.95s/it]
Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint at Undi95/Meta-Llama-3-8B-Instruct-hf and are newly initialized: ['model.mm_projector.0.bias', 'model.mm_projector.0.weight', 'model.mm_projector.2.bias', 'model.mm_projector.2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Adding pad token as '<pad>'
Loading additional LLaVA weights...
Loading LoRA weights...
Merging LoRA weights...
Model is loaded...
/group/ossdphi_algo_scratch_02/fangyuan/anaconda3/envs/llava/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
```
We still cannot access meta-llama/Meta-Llama-3-8B-Instruct, so we use Undi95/Meta-Llama-3-8B-Instruct-hf instead, and we encountered this warning:
Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint at Undi95/Meta-Llama-3-8B-Instruct-hf and are newly initialized: ['model.mm_projector.0.bias', 'model.mm_projector.0.weight', 'model.mm_projector.2.bias', 'model.mm_projector.2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Why does this warning appear, and is there a way to resolve it? Looking forward to your suggestions.
Hi @At1a8,
This warning is normal. During merging, we first load the base LLM checkpoint into our Visual-LLM class; the base checkpoint does not contain projector weights, so those are newly initialized at that point. Later, we load the LoRA and additional weights, which contain the trained projector weights and overwrite the newly initialized ones.
In summary, this warning is normal; you can ignore it and proceed. Good luck!
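As a toy illustration of this loading order (the dictionary keys and string values below are made up for illustration, not the real checkpoint format):

```python
# Toy sketch of the merge-time loading order (not the real LLaVA loader).

# 1. The base LLM checkpoint has no mm_projector keys...
base_checkpoint = {"model.layers.0.weight": "base"}

# 2. ...so when the Visual-LLM class is built, the projector weights are
#    randomly initialized, which triggers the "newly initialized" warning.
model_state = dict(base_checkpoint)
model_state["model.mm_projector.0.weight"] = "randomly-initialized"

# 3. The LoRA checkpoint's non-LoRA trainables include the trained
#    projector, which overwrites the randomly initialized weights.
non_lora_trainables = {"model.mm_projector.0.weight": "trained"}
model_state.update(non_lora_trainables)

print(model_state["model.mm_projector.0.weight"])  # → trained
```

The warning fires at step 2, but step 3 replaces the affected weights, which is why it can be safely ignored here.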
Hi 👋🏻 Do you have any inference examples that I could use?