SkalskiP opened this issue 2 months ago
Hi @SkalskiP,
Thank you for your interest in our work. The run_llava.py script from the official LLaVA repository, together with the edited files we provide in our repository, can be used for inference.
However, we are planning to release some straightforward inference scripts soon. Stay tuned!
Results for the image below:
Question: Describe the image in detail.
Model Response: The image features a group of five women standing on a street, each holding a colorful umbrella. They are all dressed in different outfits, showcasing a variety of styles. The women are positioned in a line, with one woman on the left side, another in the middle, and the remaining three on the right side of the street. In the background, there are a few cars parked on the left side of the street, and a stop sign is visible near the left edge of the image. The scene captures a fun and lively atmosphere as the women enjoy their time together under their umbrellas.
Hi @mmaaz60, thanks for your excellent work! Could you provide an inference script based on checkpoints or weights trained by LLaMA3-V_finetune_lora.sh?
Hi @At1a8,
We appreciate your interest in our work. Please note that we also provide the merged weights obtained by merging the LoRA weights with the base LLM. For example, for LLaMA-3 the merged LoRA weights are available at LLaVA-Meta-Llama-3-8B-Instruct.
Further, the weights obtained using full fine-tuning are available at LLaVA-Meta-Llama-3-8B-Instruct-FT.
We notice that, for LLaMA-3-V, the fully fine-tuned model works better than the LoRA fine-tuned model.
The same inference pipeline as in the Google Colab can be used for the LLaMA-3-V models as well; however, you have to copy the LLaMA-3-V files instead of the Phi-3-V ones and download the LLaMA-3-V model.
We hope this helps. Please let us know if you have any questions. Thank you!
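As a sketch, single-image inference can follow the `eval_model` interface from the official LLaVA `run_llava.py`. The model ID, image path, and generation settings below are illustrative assumptions and may need adjusting for your setup (e.g. the conversation template for LLaMA-3):

```python
# Sketch only: follows the eval_model interface from the official LLaVA
# repository. Model ID, image path, and settings are assumptions.
from llava.eval.run_llava import eval_model
from llava.mm_utils import get_model_name_from_path

model_path = "MBZUAI/LLaVA-Meta-Llama-3-8B-Instruct"  # assumed HF model ID

args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "Describe the image in detail.",
    "conv_mode": None,  # set the LLaMA-3 template name if auto-detection fails
    "image_file": "example.jpg",  # placeholder image path
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)  # prints the model's answer to stdout
```

Note that this requires the model weights to be downloaded, so it is only a starting point, not a verified recipe.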
Thanks for your reply. We trained our models on a custom dataset, and we want to merge the LLaMA-3 base weights with the LoRA weights trained by your code.
How can we do that? Could you please give a code example?
Thanks so much.
Hi @At1a8,
Thanks for the clarification, you can use the following script to merge LoRA weights after training.
```python
import argparse

from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path


def merge_lora(args):
    model_name = get_model_name_from_path(args.model_path)
    # Loading a LoRA checkpoint together with its base model merges the
    # LoRA weights into the base weights internally.
    tokenizer, model, image_processor, context_len = load_pretrained_model(
        args.model_path, args.model_base, model_name, device_map='cpu'
    )
    # Save the merged model and tokenizer as a standalone checkpoint.
    model.save_pretrained(args.save_model_path)
    tokenizer.save_pretrained(args.save_model_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", type=str, required=True)
    parser.add_argument("--model-base", type=str, required=True)
    parser.add_argument("--save-model-path", type=str, required=True)
    args = parser.parse_args()
    merge_lora(args)
```
Here, `--model-path` is the LoRA weights path, `--model-base` is the base model path (in your case, Meta LLaMA-3-8B), and `--save-model-path` is the path where the merged checkpoints will be saved.
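For reference, a typical invocation of the script above (saved here as merge_lora_weights.py; all paths below are illustrative, not from this thread) might look like:

```shell
python merge_lora_weights.py \
    --model-path ./checkpoints/llava-llama3-8b-lora \
    --model-base meta-llama/Meta-Llama-3-8B-Instruct \
    --save-model-path ./checkpoints/llava-llama3-8b-merged
```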
I hope this helps. Please let me know if you face any issues. Good luck!
Hi @At1a8,
We have just added the merge_lora_weights.py script to help merge the LoRA weights. Please let us know if you have any questions. Good luck!
@mmaaz60 thanks a lot! I'll make sure to play with it ;)
We trained with the following script to get checkpoints:
```shell
#!/bin/bash
deepspeed --include localhost:4,5,6,7 llava/train/train_mem4Drive.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path Undi95/Meta-Llama-3-8B-Instruct-hf \
    --version llama3 \
    --data_path ./../v1_full_llama.json \
    --image_folder ./../vlm_dataset \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --pretrain_mm_mlp_adapter ./checkpoints/LLaVA-Meta-Llama-3-8B-Instruct-pretrain/mm_projector.bin \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir ./checkpoints/llava-v1.5-llama3-8b-task-lora1 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 8192 \
    --gradient_checkpointing True \
    --dataloader_num_workers 8 \
    --lazy_preprocess True \
    --report_to none # wandb
```
We then used the merge script mentioned above, producing the following logs:
```
(llava) fangyuan@xcdloss220176:/group/ossdphi_algo_scratch_02/fangyuan/LLaVA/LLaVA$ python3 ./scripts/merge_lora_weights.py --model-base Undi95/Meta-Llama-3-8B-Instruct-hf --model-path ./checkpoints/llava-v1.5-llama3-8b-task-lora1 --save-model-path ./../runs/llava_llama3_test1
[2024-04-30 10:44:24,172] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading LLaVA from base model...
Loading checkpoint shards: 100%|████████████████| 4/4 [00:51<00:00, 12.95s/it]
Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint at Undi95/Meta-Llama-3-8B-Instruct-hf and are newly initialized: ['model.mm_projector.0.bias', 'model.mm_projector.0.weight', 'model.mm_projector.2.bias', 'model.mm_projector.2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Adding pad token as '<pad>'
Loading additional LLaVA weights...
Loading LoRA weights...
Merging LoRA weights...
Model is loaded...
/group/ossdphi_algo_scratch_02/fangyuan/anaconda3/envs/llava/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
```
We still cannot access meta-llama/Meta-Llama-3-8B-Instruct, so we use Undi95/Meta-Llama-3-8B-Instruct-hf instead, and we encountered this warning:
Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint at Undi95/Meta-Llama-3-8B-Instruct-hf and are newly initialized: ['model.mm_projector.0.bias', 'model.mm_projector.0.weight', 'model.mm_projector.2.bias', 'model.mm_projector.2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Why does this warning appear, and is there a way to resolve it? Looking forward to your suggestions.
Hi @At1a8,
This warning is normal. During merging, we first load the base LLM checkpoint into our Visual-LLM class; the base checkpoint does not contain projector weights, so those are newly initialized at that point. Later, we load the LoRA and additional weights, which contain the trained projector weights and overwrite the newly initialized ones.
In summary, this warning is normal; you can ignore it and proceed. Good luck!
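As a toy illustration of this loading order (the dictionary keys and string values below are made up for illustration, not the real checkpoint format):

```python
# Toy sketch of the merge-time loading order (not the real LLaVA loader).

# 1. The base LLM checkpoint has no mm_projector keys...
base_checkpoint = {"model.layers.0.weight": "base"}

# 2. ...so when the Visual-LLM class is built, the projector weights are
#    randomly initialized, which triggers the "newly initialized" warning.
model_state = dict(base_checkpoint)
model_state["model.mm_projector.0.weight"] = "randomly-initialized"

# 3. The LoRA checkpoint's non-LoRA trainables include the trained
#    projector, which overwrites the randomly initialized weights.
non_lora_trainables = {"model.mm_projector.0.weight": "trained"}
model_state.update(non_lora_trainables)

print(model_state["model.mm_projector.0.weight"])  # → trained
```

The warning fires at step 2, but step 3 replaces the affected weights, which is why it can be safely ignored here.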
Hi 👋🏻 Do you have any inference examples that I could use?