modelscope / ms-swift

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

Error with Qwen2VL When Images Are Missing #2452

Open LukeForeverYoung opened 5 days ago

LukeForeverYoung commented 5 days ago

Describe the bug (what the bug is and how to reproduce it, ideally with screenshots):

I encountered an issue while training Qwen2VL with Flash Attention enabled. When the training data contains samples without images, the following error occurs:

RuntimeError: cu_seqlens_q must be on CUDA

It appears that media_inputs['image_grid_thw'] needs to be moved to the appropriate device when there are no images in the sample.

https://github.com/modelscope/ms-swift/blob/78b6f781550aa19281c26ba5b41120156032f349/swift/llm/utils/template.py#L1655C17-L1655C99

Adding a call to media_inputs['image_grid_thw'].to(device) resolves the issue.
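A minimal sketch of the suggested adjustment. This is not the actual ms-swift code; the helper name `move_grid_to_device` and the shape of `media_inputs` are assumptions for illustration. The idea is simply that `image_grid_thw` must be placed on the same device as the other model inputs before Flash Attention builds `cu_seqlens_q`:

```python
import torch

def move_grid_to_device(media_inputs: dict, device: torch.device) -> dict:
    # Hypothetical helper, not from the ms-swift repository.
    # image_grid_thw holds one (t, h, w) grid per image; when the sample
    # has no images it may be left on the CPU, triggering
    # "RuntimeError: cu_seqlens_q must be on CUDA" under Flash Attention.
    grid = media_inputs.get('image_grid_thw')
    if grid is not None:
        media_inputs['image_grid_thw'] = grid.to(device)
    return media_inputs
```

In the training loop this would be called with the model's device (e.g. `model.device`) before the inputs are passed to the forward pass.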

Your hardware and system info (CUDA version, OS, GPU model, torch version):


Jintao-Huang commented 3 days ago

What versions of accelerate and transformers are you using?

LukeForeverYoung commented 3 days ago

> What versions of accelerate and transformers are you using?

accelerate 0.34.2, transformers 4.45.0.dev0