Me too! After I fine-tuned the 7B model, I got three .bin files, but the released checkpoint has only two. The files from my fine-tuning are all very large: the `total_size` in `pytorch_model.bin.index.json` is 26970595328, while the released one is only 13485301760.
Hi @Chen-Song, you may notice that the size of your trained model is roughly 2x the size of the released checkpoints. This is because `transformers` saves the model weights in `float32`. When I release the weights, I convert them to `float16` to save storage space / bandwidth.
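For reference, a quick way to check which precision a local checkpoint was saved in is to look at the index metadata and the tensor dtypes. A minimal sketch, assuming the standard `transformers` sharded layout; the file names below are placeholders based on the checkpoints mentioned in this thread:

```python
# Quick check of the saved precision: the index file records the total byte
# size, and the shard tensors carry the dtype. File names below are
# placeholders based on this thread's checkpoints.
import json
import torch

ckpt = "checkpoints/llava-7B_new/checkpoint-5"

with open(f"{ckpt}/pytorch_model.bin.index.json") as f:
    # ~26970595328 bytes for an fp32 7B model, ~13485301760 for fp16
    print(json.load(f)["metadata"]["total_size"])

state = torch.load(f"{ckpt}/pytorch_model-00001-of-00003.bin", map_location="cpu")
print(next(iter(state.values())).dtype)  # torch.float32 before conversion
```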
@yunh-w Can you share the sizes of your trained model weights with `ls -lt`, as @Chen-Song did? Thanks.
@haotian-liu What is the process to convert float32 to float16? I have a 13B fine-tuned model that is 50G.
@codybum You can use this script to compress the model. Please make sure to set two different paths instead of overwriting the fp32 model, and only delete the fp32 source model after verifying that the converted model works properly. Thanks.
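For anyone who cannot locate the script, here is a minimal sketch of the kind of fp32 -> fp16 conversion it performs, assuming a plain `transformers` checkpoint; the repo's actual script may load its custom LLaVA model class rather than `AutoModelForCausalLM`, and the paths below are placeholders:

```python
# Minimal sketch of an fp32 -> fp16 conversion, assuming a plain
# transformers checkpoint. The repo's compression script may load its
# custom LLaVA model class instead of AutoModelForCausalLM; paths are
# placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "checkpoints/llava-13b-finetune"       # fp32 source (~50 GB)
dst = "checkpoints/llava-13b-finetune-fp16"  # separate output dir; do not overwrite src

model = AutoModelForCausalLM.from_pretrained(src, torch_dtype=torch.float32)
model = model.half()          # cast every weight tensor to float16
model.save_pretrained(dst)    # re-shards and rewrites pytorch_model.bin.index.json

tokenizer = AutoTokenizer.from_pretrained(src)
tokenizer.save_pretrained(dst)
```

Writing to a separate directory keeps the fp32 source intact until the fp16 copy has been loaded and verified.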
How can we fine-tune it on custom data, and what format should the dataset be in?
@anonymous-atom Here is an example dataset: https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/detail_23k.json
You just need to format your data so that it matches this set (see the sketch below). You can then use the provided training scripts, substituting your dataset as the training set.
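As a concrete illustration, the records in that file look roughly like the sketch below; the field values are made up, so verify the exact layout against the linked `detail_23k.json`:

```python
# Sketch of the record layout used by detail_23k.json (verify against the
# linked file; the concrete values below are made up for illustration).
import json

records = [
    {
        "id": "000000001234",
        # Path is resolved relative to the --image_folder training argument.
        "image": "coco/train2017/000000001234.jpg",
        "conversations": [
            # The <image> placeholder marks where the image is inserted.
            {"from": "human", "value": "<image>\nDescribe the image in detail."},
            {"from": "gpt", "value": "The image shows ..."},
        ],
    },
]

# Write your own data in the same structure and pass the file via --data_path.
with open("my_custom_train.json", "w") as f:
    json.dump(records, f, indent=2)
```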
Hey, can anyone please share the full code for fine-tuning LLaVA?
@yunh-w Hi, what hardware did you use?
Question
Hi, thanks for your great work!
I use the following command to fine-tune the LLaVA-7b model.
```shell
$PYTHON --nnodes=1 --nproc_per_node=8 --master_port=25001 \
    llava/train/train_mem.py \
    --model_name_or_path LLaMA-7b-convert \
    --data_path $data_path \
    --image_folder $image_folder \
    --vision_tower $vision_tower \
    --pretrain_mm_mlp_adapter LLaVA-7b-pretrain-projector-v0-CC3M-595K-original_caption.bin \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end True \
    --bf16 True \
    --output_dir ./checkpoints/llava-7B_new \
    --num_train_epochs 5 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 5 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to wandb
```
But I get three weight shards, while your released LLaVA-7b weights come in two, and I get an error when I load these fine-tuned weights. How should I fine-tune LLaVA-7b? Thanks so much!
```
OSError: Unable to load weights from pytorch checkpoint file for 'LLaVA-main/checkpoints/llava-7B_new/checkpoint-5/pytorch_model-00003-of-00003.bin' at 'LLaVA-main/checkpoints/llava-7B_new/checkpoint-5/pytorch_model-00003-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
```
I found that the third shard was not saved completely: an OOM occurred while saving, but training did not stop. Thanks.
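If it helps others debug similar saving failures, a simple check is to try loading each shard listed in the index; a shard truncated by an OOM will usually fail to deserialize. A rough sketch, with paths taken from this thread as placeholders:

```python
# Rough sanity check for a sharded checkpoint: try to deserialize every
# shard listed in the index. A shard truncated by an OOM during saving
# will typically fail here. Paths are placeholders from this thread.
import json
import os
import torch

ckpt = "LLaVA-main/checkpoints/llava-7B_new/checkpoint-5"

with open(os.path.join(ckpt, "pytorch_model.bin.index.json")) as f:
    index = json.load(f)

for shard in sorted(set(index["weight_map"].values())):
    path = os.path.join(ckpt, shard)
    size = os.path.getsize(path)
    try:
        torch.load(path, map_location="cpu")
        print(f"OK   {shard} ({size} bytes)")
    except Exception as exc:
        print(f"FAIL {shard} ({size} bytes): {exc}")
```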