modelscope / ms-swift

Use PEFT or Full-parameter to finetune 300+ LLMs or 80+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

🎉Support for finetuning of Qwen2-VL-Chat series models #1857


tastelikefeet commented 2 weeks ago

🎉Fine-tuning (VQA/OCR/grounding/video) of the Qwen2-VL-Chat series models is now supported. Please check the documentation below for details:

English

https://github.com/modelscope/ms-swift/blob/main/docs/source_en/Multi-Modal/qwen2-vl-best-practice.md

Chinese

https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md
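For reference, a minimal LoRA fine-tuning invocation, sketched from the flags used in the commands elsewhere in this thread (the dataset file is a placeholder; see the best-practice docs above for complete, tested commands):

```shell
# Minimal LoRA SFT sketch for Qwen2-VL-7B-Instruct; dataset.json is a placeholder.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen2-vl-7b-instruct \
    --model_id_or_path qwen/Qwen2-VL-7B-Instruct \
    --sft_type lora \
    --dataset dataset.json
```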

Halflifefa commented 2 weeks ago

Qwen2-VL needs transformers>=4.45.0.dev0, but swift needs transformers<4.45.0. How can I fix this?

tastelikefeet commented 2 weeks ago

> Qwen2-VL needs transformers>=4.45.0.dev0, but swift needs transformers<4.45.0. How can I fix this?

After installing swift, run `pip install git+https://github.com/huggingface/transformers.git`.
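Spelled out, the install order looks like this (a sketch assuming the PyPI package name `ms-swift`; installing transformers from source afterwards deliberately overrides swift's `transformers<4.45.0` pin):

```shell
# Install swift first, then override its transformers pin with the dev build.
pip install ms-swift
pip install git+https://github.com/huggingface/transformers.git
```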

VietDunghacker commented 2 weeks ago

Qwen2-VL seems not to be compatible with FlashAttention. When I add `--use_flash_attn True`, I encounter this error (CUDA_LAUNCH_BLOCKING was enabled to print the exact trace):

```
[rank0]:   File "***/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 182, in apply_multimodal_rotary_pos_emb
[rank0]:     cos = cos[position_ids]
[rank0]: RuntimeError: CUDA error: device-side assert triggered
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

EDIT: There seems to be a problem with the multimodal rotary position embedding introduced by Qwen2-VL. Even with Flash Attention turned off, I still encounter this error.
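(For reproduction: `CUDA_LAUNCH_BLOCKING=1` forces synchronous kernel launches so the stack trace points at the actual failure site. A sketch of such an invocation, with flags taken from this thread:)

```shell
# Synchronous CUDA launches give an accurate Python stack trace for device-side asserts.
CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen2-vl-7b-instruct \
    --sft_type lora \
    --use_flash_attn true \
    --dataset dataset.json
```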

tastelikefeet commented 2 weeks ago

> Qwen2-VL seems not to be compatible with FlashAttention. When I add `--use_flash_attn True`, I encounter this error (CUDA_LAUNCH_BLOCKING was enabled to print the exact trace): [traceback quoted above]
>
> EDIT: There seems to be a problem with the multimodal rotary position embedding introduced by Qwen2-VL. Even with Flash Attention turned off, I still encounter this error.

I do think this may be a bug in the Qwen2-VL code. I tried to fix it in modeling_qwen2_vl.py (the patch was posted as a screenshot in the original comment, not preserved here). This works.

VietDunghacker commented 2 weeks ago

It did work, thanks.

wade30822 commented 2 weeks ago

An error occurs when I fine-tune with LoRA on a V100:

```
RuntimeError: CUDA error: too many resources requested for launch
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

Betty-J commented 2 weeks ago

Could you provide the complete modeling_qwen2_vl.py file? I encountered an error while fine-tuning qwen2-vl-2b-instruct.
File "./miniconda3/envs/swift/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 296, in forward [rank1]: attn_weights = torch.matmul(q, k.transpose(1, 2)) / math.sqrt(self.head_dim) AttributeError: 'VisionAttention' object has no attribute 'head_dim'

Jintao-Huang commented 2 weeks ago

Added an example of single-card A10 fine-tuning: https://github.com/modelscope/ms-swift/blob/main/docs/source_en/Multi-Modal/qwen2-vl-best-practice.md#image-ocr-fine-tuning

KirbytroNic0528 commented 2 weeks ago

When I fine-tune on an A40, GPU memory grows without bound and I eventually hit CUDA out of memory. This is the command I used:

```shell
CUDA_VISIBLE_DEVICES=3 swift sft \
    --model_type qwen2-vl-7b-instruct \
    --model_id_or_path qwen/Qwen2-VL-7B-Instruct \
    --sft_type lora \
    --dataset dataset.json
```

Jintao-Huang commented 2 weeks ago

https://github.com/modelscope/ms-swift/issues/1860

You can save memory by lowering the image resolution with SIZE_FACTOR=8 and MAX_PIXELS=602112.

See here: https://swift.readthedocs.io/zh-cn/latest/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.html#ocr
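Applied to the command above, that would look roughly like this (a sketch; per the linked doc, `SIZE_FACTOR` and `MAX_PIXELS` are read as environment variables):

```shell
# Cap image resolution to bound activation memory; values from the comment above.
SIZE_FACTOR=8 MAX_PIXELS=602112 CUDA_VISIBLE_DEVICES=3 swift sft \
    --model_type qwen2-vl-7b-instruct \
    --model_id_or_path qwen/Qwen2-VL-7B-Instruct \
    --sft_type lora \
    --dataset dataset.json
```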

Jintao-Huang commented 2 weeks ago

Full-parameter fine-tuning & freeze_vit are now supported; see:

https://github.com/modelscope/ms-swift/issues/1879
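A hedged sketch of what that combination might look like (the `--freeze_vit` flag and its usage are an assumption here; see #1879 for the authoritative flags):

```shell
# Full-parameter SFT with the vision tower frozen (assumed flag: --freeze_vit).
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type qwen2-vl-7b-instruct \
    --sft_type full \
    --freeze_vit true \
    --dataset dataset.json
```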