haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

How to fine-tune the LLaVA-7b model? #138

Open yunh-w opened 1 year ago

yunh-w commented 1 year ago

Question

Hi, thanks for your great work!

I used the following command to fine-tune the LLaVA-7b model:

```bash
$PYTHON --nnodes=1 --nproc_per_node=8 --master_port=25001 \
    llava/train/train_mem.py \
    --model_name_or_path LLaMA-7b-convert \
    --data_path $data_path \
    --image_folder $image_folder \
    --vision_tower $vision_tower \
    --pretrain_mm_mlp_adapter LLaVA-7b-pretrain-projector-v0-CC3M-595K-original_caption.bin \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end True \
    --bf16 True \
    --output_dir ./checkpoints/llava-7B_new \
    --num_train_epochs 5 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 5 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to wandb
```

But I get three weight files, while your released LLaVA-7b weights have only two, and I get an error when I load these fine-tuned weights. How should I fine-tune LLaVA-7b? Thanks so much!

OSError: Unable to load weights from pytorch checkpoint file for 'LLaVA-main/checkpoints/llava-7B_new/checkpoint-5/pytorch_model-00003-of-00003.bin' at 'LLaVA-main/checkpoints/llava-7B_new/checkpoint-5/pytorch_model-00003-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

I found that the third shard was not saved completely. Saving hit an OOM, but training did not stop. Thanks.

Chen-Song commented 1 year ago

Me too! After I fine-tune 7B, I get three .bin files, while you release two. The files I get from fine-tuning are all very large: the total_size in "pytorch_model.bin.index.json" is 26970595328, while the released one is only 13485301760.

[screenshot: listing of the fine-tuned checkpoint files and their sizes]

haotian-liu commented 1 year ago

Hi @Chen-Song, you may notice that the size of your trained model is roughly 2x the size of the released checkpoints. This is because transformers saves the model weights with float32. When I release the weights, I convert them to float16 to save storage space / bandwidth.
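(For reference, a minimal sketch, not part of the LLaVA codebase, of how one could confirm the stored dtype of a shard; the shard path below is a placeholder based on the traceback above and will differ per run.)

```python
# Minimal sketch: inspect which dtypes a saved checkpoint shard actually contains.
# The shard path is a placeholder; point it at one of your own fine-tuned shards.
from collections import Counter
import torch

shard = "checkpoints/llava-7B_new/checkpoint-5/pytorch_model-00001-of-00003.bin"
state_dict = torch.load(shard, map_location="cpu")

print(Counter(t.dtype for t in state_dict.values()))
# Mostly torch.float32 -> roughly 2x the size of the released fp16 checkpoints.
```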

@yunh-w Can you share the size of your trained model weights with ls -lt like @Chen-Song does? Thanks.

codybum commented 1 year ago

@haotian-liu What is the process to convert float32 to float16? I have a 13B fine-tuned model that is 50G.

haotian-liu commented 1 year ago

@codybum You can use this script for compressing the model. Please make sure to use two different paths for the model instead of overwriting the fp32 model, and only delete the fp32 source model after verifying that the converted model works properly. Thanks.
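(For illustration only: a minimal sketch of the same idea, converting each shard's tensors to fp16 at the state_dict level. It is not the linked script, the paths are placeholders, and writing to a separate output directory follows the advice above; prefer the official script for LLaVA checkpoints.)

```python
# Minimal sketch (NOT the linked script): compress fp32 shards to fp16 on disk.
# Paths are placeholders; write to a NEW directory and keep the fp32 source until verified.
import glob
import os
import shutil
import torch

src = "checkpoints/llava-7B_new"        # fp32 fine-tuned model (placeholder path)
dst = "checkpoints/llava-7B_new-fp16"   # separate output directory
os.makedirs(dst, exist_ok=True)

# Convert every weight shard: floating-point tensors -> fp16, everything else untouched.
for shard in glob.glob(os.path.join(src, "pytorch_model*.bin")):
    state_dict = torch.load(shard, map_location="cpu")
    state_dict = {k: (v.half() if v.is_floating_point() else v) for k, v in state_dict.items()}
    torch.save(state_dict, os.path.join(dst, os.path.basename(shard)))

# Copy config, tokenizer files, and the shard index unchanged
# (the index's "total_size" field will still report the fp32 size; it is metadata only).
for name in os.listdir(src):
    path = os.path.join(src, name)
    if os.path.isfile(path) and not name.endswith(".bin"):
        shutil.copy(path, os.path.join(dst, name))
```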

anonymous-atom commented 1 year ago

How can we fine-tune it on custom data, and what is the format of the dataset to feed in?

codybum commented 1 year ago

@anonymous-atom Here is an example dataset: https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/detail_23k.json

You just need to make your data conform to this format (see the sketch below). You can then use the training scripts, substituting your dataset as the training set.
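(For reference, a minimal sketch of the record layout used by the linked detail_23k.json; the id, image path, and conversation text are made-up placeholders, and the image path is resolved relative to --image_folder.)

```python
# Minimal sketch of the record layout used by detail_23k.json / LLaVA-Instruct-150K.
# All values are illustrative placeholders, not real entries from the dataset.
import json

record = {
    "id": "000000123456",          # any unique identifier
    "image": "000000123456.jpg",   # path relative to --image_folder
    "conversations": [
        # The <image> token marks where the image is injected into the prompt.
        {"from": "human", "value": "<image>\nDescribe this image in detail."},
        {"from": "gpt", "value": "A red city bus is parked beside a row of bicycles ..."},
    ],
}

# The training file is a JSON list of such records, passed via --data_path.
with open("my_custom_data.json", "w") as f:
    json.dump([record], f, indent=2)
```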

rahulrajpv commented 11 months ago

Hey, can anyone please share the full code for fine-tuning LLaVA?

ali7919 commented 10 months ago

@yunh-w Hi, what hardware did you use?