hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

How to train the mm_proj and the LLM part with lora of Qwen2-VL #5512

Open leoozy opened 2 months ago

leoozy commented 2 months ago

Reminder

System Info

I have installed all the requirements for Qwen2-VL.

Reproduction

Hello, I want to train the vision adapter (mm_proj) and the LLM part with LoRA. Do I set `train_mm_proj_only: true` like this:

```yaml
### model
model_name_or_path: Qwen/Qwen2-VL-72B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: mllm_demo,identity
# video: mllm_video_demo
template: qwen2_vl
cutoff_len: 8900
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/qwen2_vl-7b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.05
bf16: true
ddp_timeout: 180000000
visual_inputs: true
deepspeed: examples/deepspeed/ds_z3_config.json
train_mm_proj_only: true

### lora
lora_alpha: 512
lora_dropout: 0.1
lora_rank: 256

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```
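As a side note on the `lora_rank: 256` setting, here is a minimal sketch (not LLaMA-Factory code; the 8192 hidden size is an illustrative assumption, not necessarily the Qwen2-VL-72B value) of how many trainable parameters LoRA adds per linear layer at that rank:

```python
# Rough sketch: trainable parameters LoRA adds to one square linear layer.
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA learns a low-rank update B @ A for a frozen weight W (d_out x d_in),
    # where A is (rank x d_in) and B is (d_out x rank).
    return rank * d_in + d_out * rank

# Illustrative hidden size only (not taken from the Qwen2-VL config).
hidden = 8192
base = hidden * hidden                    # frozen weight parameters
added = lora_params(hidden, hidden, 256)  # rank from the config above
print(added, added / base)                # → 4194304 0.0625
```

At rank 256, each square projection gains ~6% of its base parameter count as trainable adapter weights, so a high rank across `lora_target: all` is noticeably heavier than the more common rank 8–64 setups.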

Expected behavior

No response

Others

No response

fabriceyhc commented 1 month ago

I'm glad this is being reopened (original #5553), because I'm having the same issue with 8 × 80 GB A100s (640 GB total) and Qwen2-VL-72B, using both LoRA (should take ~180 GB) and QLoRA (8-bit, should take ~100 GB). Both OOM, which is unexpected given that it's only a 72B model.
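For reference, the back-of-envelope weight-memory arithmetic behind estimates like those (a rough sketch only; it ignores optimizer state, activations, vision-token expansion, and framework overhead, which is likely where the unexpected extra usage comes from):

```python
# Rough sketch: memory just to hold the frozen base weights of a 72B model.
GB = 1024**3

def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Base-weight footprint only; excludes adapters, optimizer, activations."""
    return n_params * bytes_per_param / GB

base_72b = 72e9
print(weight_memory_gb(base_72b, 2))  # bf16 LoRA base: ~134 GB
print(weight_memory_gb(base_72b, 1))  # 8-bit QLoRA base: ~67 GB
```

Those base numbers fit comfortably in 640 GB, which suggests the OOM comes from activations (long `cutoff_len`, image tokens) or from sharding/offload settings rather than from the weights themselves.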

thunder95 commented 1 month ago

Same issue with Qwen/Qwen2-VL-2B-Instruct, only a 2B model, using QLoRA 4-bit. When I tried FSDP, it got stuck there.

mehamednews commented 1 month ago

Same issue (even the 7B required 4 A100 GPUs). I'm only having this problem when fine-tuning Qwen2-VL.

VincentVanNF commented 3 weeks ago

4 × 80 GB A100s work for me when training Qwen2-VL-72B with LoRA. I set `cutoff_len: 4096` in my experiment, and my dataset has about 66k samples. As in the issue I opened (#5553), I tried full SFT of Qwen2-VL-72B: it takes 16 × 80 GB A100s to train just a few test examples, but it OOMs on the 66k samples, so training the full 66k dataset would probably take 32 × 80 GB A100s. Maybe your `cutoff_len` and dataset are much larger than mine, which would cause the OOM.
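To make the `cutoff_len` point concrete, here is a rough sketch of how per-step activation memory scales with sequence length (the hidden size, layer count, and activation multiplier are illustrative assumptions, not Qwen2-VL-72B specs, and ZeRO-3 sharding or gradient checkpointing change the absolute numbers):

```python
# Rough sketch: transformer activation memory grows linearly with sequence
# length, so raising cutoff_len from 4096 to 8900 costs ~2.2x more here.
GB = 1024**3

def activation_gb(seq_len: int, hidden: int = 8192, layers: int = 80,
                  bytes_per_val: int = 2, acts_per_layer: int = 14) -> float:
    # acts_per_layer is a coarse multiplier for the intermediate tensors
    # each transformer block keeps around for the backward pass.
    return seq_len * hidden * layers * acts_per_layer * bytes_per_val / GB

print(activation_gb(4096))  # cutoff_len that fit on 4 x 80 GB A100s
print(activation_gb(8900))  # cutoff_len in the original config
```

Whatever the exact constants, the ratio between the two settings is fixed at 8900 / 4096 ≈ 2.17×, which is consistent with the same hardware fitting one config and OOMing on the other.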

xingenju commented 1 week ago

Any update on LoRA or QLoRA training for Qwen2-VL-72B?