hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

qwen2-vl fine-tuning: with freeze_vision_tower=false set, training fails with an error after running for a while #5680

Open xuyue1112 opened 1 month ago

xuyue1112 commented 1 month ago

Reminder

System Info

transformers is pinned to commit 21fac7abba2a37fae86106f87fcf9974fd1e3830.

Reproduction

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
freeze_vision_tower: false

### dataset
dataset: xxx
eval_dataset: xxx
template: qwen2_vl
cutoff_len: 8192
overwrite_cache: true
preprocessing_num_workers: 120

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
learning_rate: 1.0e-4
num_train_epochs: 10.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
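For reference, a minimal sketch (not LLaMA-Factory's actual implementation) of what freeze_vision_tower roughly amounts to on the transformers Qwen2-VL model: whether the vision tower's parameters receive gradients. The checkpoint id and the `visual` attribute name follow the Qwen2-VL implementation in transformers at the commit above and are assumptions here, not something taken from this issue.

import torch
from transformers import Qwen2VLForConditionalGeneration

# "Qwen/Qwen2-VL-7B-Instruct" is used only as an example checkpoint.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype=torch.bfloat16
)

# Mirror the YAML flag: with freeze_vision_tower=false, the vision tower is trainable.
freeze_vision_tower = False
for param in model.visual.parameters():
    param.requires_grad = not freeze_vision_tower

With finetuning_type: lora and lora_target: all, unfreezing the tower presumably also makes its linear layers candidates for LoRA adapters, so the set of modules updated during SFT changes.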

Expected behavior

The goal is to include the vision encoder in SFT to improve model quality. After 1000+ steps, the error above occurs. With the same data and all other settings unchanged, SFT runs normally if freeze_vision_tower=false is not set.
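One way to sanity-check the setup before a long run is to count trainable parameters per top-level module and confirm the vision tower is actually unfrozen. A hedged diagnostic sketch; the module prefixes (e.g. "visual", "model", "lm_head") follow the Qwen2-VL naming in transformers and are an assumption, not something reported in this issue:

from collections import Counter

def trainable_by_prefix(model):
    # Sum trainable parameter counts grouped by top-level module name.
    counts = Counter()
    for name, param in model.named_parameters():
        if param.requires_grad:
            counts[name.split(".")[0]] += param.numel()
    return dict(counts)

# Example: print(trainable_by_prefix(model)) after the freeze/unfreeze step in the
# sketch above; with freeze_vision_tower=false the "visual" entry should be non-zero.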

Others

thusinh1969 commented 3 weeks ago

Same issue here, with many images per sample at different resolutions (1280, ...). The final fine-tuning resolution is 1024.

thusinh1969 commented 3 weeks ago

@hiyouga can you please have a look? This issue is annoying, and any debugging help would be appreciated, as your code is tough to follow.

Thanks, Steve

Michael4933 commented 2 weeks ago

Same here!!!

Well, it seems that LLaMA-Factory simply doesn't support it: https://github.com/hiyouga/LLaMA-Factory/issues/5657