Describe the bug
What the bug is and how to reproduce it, preferably with screenshots
I encountered an issue while training Qwen2VL with Flash Attention enabled. When the training data contains samples without images, the following error occurs:
RuntimeError: cu_seqlens_q must be on CUDA
It appears that media_inputs['image_grid_thw'] needs to be moved to the appropriate device when there are no images in the sample.
https://github.com/modelscope/ms-swift/blob/78b6f781550aa19281c26ba5b41120156032f349/swift/llm/utils/template.py#L1655C17-L1655C99
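A minimal sketch of the kind of device move described above, assuming the media inputs are a plain dict whose tensor values expose `.to(device)` as `torch.Tensor` does. The helper name `move_tensors_to_device` is hypothetical and not part of ms-swift; it only illustrates that the result of `.to` has to be stored back, since the call returns a tensor rather than modifying the original in place.

```python
from typing import Any, Dict


def move_tensors_to_device(inputs: Dict[str, Any], device: Any) -> Dict[str, Any]:
    """Return a copy of `inputs` with every tensor-like value moved to `device`.

    Hypothetical helper, not part of ms-swift. A value is treated as
    tensor-like if it exposes a callable `.to(...)`, as torch.Tensor does.
    """
    moved = {}
    for key, value in inputs.items():
        to = getattr(value, "to", None)
        # `.to(device)` returns a tensor instead of changing the original
        # in place, so the returned value is what gets stored.
        moved[key] = to(device) if callable(to) else value
    return moved
```

Under these assumptions, a call such as `media_inputs = move_tensors_to_device(media_inputs, device)` would cover `image_grid_thw` along with any other tensor in the dict, whether or not the sample contains images.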
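Adding a device move at the linked line resolves the issue. Since `.to(device)` returns a tensor rather than modifying it in place, the result needs to be assigned back:
media_inputs['image_grid_thw'] = media_inputs['image_grid_thw'].to(device)
Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here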
Additional context
Add any other context about the problem here