leoozy opened this issue 2 months ago
I'm glad this is being reopened (original #5553), because I'm having the same issue with 8 x 80GB A100s (640GB total) and Qwen2-VL-72B, using both LoRA (which should take ~180GB) and 8-bit QLoRA (which should take ~100GB). Both OOM, which is unexpected for a model that is only 72B.
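Back-of-envelope (my own rough estimate, not measured): 72B parameters in bf16 are about 72e9 × 2 bytes ≈ 144GB of frozen weights, and LoRA adds only a small number of trainable parameters plus their optimizer states on top, which is how I get to ~180GB. Sharded across 8 GPUs, that should fit comfortably in 640GB.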
Same issue with Qwen/Qwen2-VL-2B-Instruct, which is only a 2B model, using 4-bit QLoRA. When I tried FSDP, it got stuck.
Same issue (even the 7B model required 4 A100 GPUs). I only hit this problem when fine-tuning Qwen2-VL.
4 x 80GB A100s work for me when training Qwen2-VL-72B with LoRA. I set cutoff_len: 4096 in my experiment, and my dataset has about 66k samples. As in the issue I opened (#5553), I also tried full SFT of Qwen2-VL-72B: it takes 16 x 80GB A100s just for a few test examples, and it OOMs on the full 66k-sample dataset, so training all of it would probably take 32 x 80GB A100s. Maybe your cutoff_len and dataset are much larger than mine, which would cause the OOM.
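For anyone comparing configs, here is a minimal sketch of the memory-relevant settings I mean, in the standard LLaMA-Factory YAML layout (illustrative values, not a verified recipe):

```yaml
### sketch: memory-relevant settings for Qwen2-VL-72B LoRA (illustrative)
model_name_or_path: Qwen/Qwen2-VL-72B-Instruct
stage: sft
finetuning_type: lora
lora_target: all
template: qwen2_vl
cutoff_len: 4096                 # shorter sequences cut activation memory
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
bf16: true
deepspeed: examples/deepspeed/ds_z3_config.json  # ZeRO-3 shards params and optimizer states across GPUs
```

Shorter sequences reduce activation memory roughly in proportion to cutoff_len, and ZeRO-3 spreads the frozen weights over all GPUs, which together make the biggest difference.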
Any update on LoRA or QLoRA training for Qwen2-VL-72B?
Reminder
System Info
I have installed all the requirements for Qwen2-VL.
Reproduction
Hello, I want to train the vision adapter and the LLM part with LoRA. Do I set `train_mm_proj_only` to `true` like this?

```yaml
### model
model_name_or_path: Qwen/Qwen2-VL-72B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: mllm_demo,identity  # video: mllm_video_demo
template: qwen2_vl
cutoff_len: 8900
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/qwen2_vl-7b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.05
bf16: true
ddp_timeout: 180000000
visual_inputs: true
deepspeed: examples/deepspeed/ds_z3_config.json
train_mm_proj_only: true

### lora
lora_alpha: 512
lora_dropout: 0.1
lora_rank: 256

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```
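For completeness: assuming the config above is saved as, say, `qwen2vl_72b_lora_sft.yaml` (a hypothetical filename), I launch training with `llamafactory-cli train qwen2vl_72b_lora_sft.yaml`; multi-GPU runs may need the `FORCE_TORCHRUN=1` environment variable depending on your LLaMA-Factory version.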
Expected behavior
No response
Others
No response