Open VietDunghacker opened 1 month ago
What version of transformers is it?
I am using version 4.45.2, and it works fine.
I tried different versions of transformers, including 4.45.2, and the bug still occurred. For more context, I fine-tuned Qwen2VL with LoRA, targeting all possible linear layers (including the vision model). The environment is 1x A100 80GB. When I rolled back the code to commit b654118003a963ef55b088aad44f834b54a6a641, it ran smoothly. I believe something is wrong with the post_encode method for Qwen2VL, as training time somehow doubled after I pulled the latest commit fixing a bug in that method.
I also encountered this issue, and I believe the cause of the problem might lie in the following lines of code:
https://github.com/modelscope/ms-swift/blob/acd17e5a7d6a1f0073a48af164b1cf9ad5a1a561/swift/llm/utils/template.py#L1651-L1655
After I used the code below to move media_inputs['image_grid_thw'] to the device, the issue no longer occurred.
```python
device = input_ids.device
# Move both vision inputs to the same device as input_ids
pixel_values = media_inputs['pixel_values'].to(device)
image_grid_thw = media_inputs['image_grid_thw'].to(device)
# Cast pixel values to the dtype expected by the vision tower
pixel_values = pixel_values.type(model.visual.get_dtype())
image_embeds = model.visual(pixel_values, grid_thw=image_grid_thw)
```
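The pattern in the snippet above can be generalized: move every tensor in the media inputs onto the same device as input_ids, and cast pixel_values to the vision tower's dtype. The sketch below is a minimal, hypothetical illustration of that pattern using only PyTorch; align_media_inputs is a made-up helper name, not part of ms-swift, and the demonstration runs on CPU so no GPU is required.

```python
# Hypothetical helper illustrating the device/dtype alignment pattern.
# Not part of ms-swift; names here are illustrative assumptions.
import torch

def align_media_inputs(media_inputs: dict, device: torch.device,
                       visual_dtype: torch.dtype) -> dict:
    """Move every tensor to the target device; cast pixel_values to visual_dtype."""
    out = {}
    for key, value in media_inputs.items():
        if isinstance(value, torch.Tensor):
            value = value.to(device)
            if key == "pixel_values":
                value = value.to(visual_dtype)
        out[key] = value
    return out

# CPU-only demonstration (a real run would pass input_ids.device and
# model.visual.get_dtype() instead):
inputs = {
    "pixel_values": torch.randn(4, 3, dtype=torch.float32),
    "image_grid_thw": torch.tensor([[1, 2, 2]]),
}
aligned = align_media_inputs(inputs, torch.device("cpu"), torch.float16)
print(aligned["pixel_values"].dtype)     # torch.float16
print(aligned["image_grid_thw"].device)  # cpu
```

Aligning all media tensors in one place avoids the mismatch where pixel_values is moved but image_grid_thw is left on the original device, which is what the lines linked above appear to miss.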
Thank you, I will check it.
Please tell me the version of accelerate.
accelerate: 1.1.1
Describe the bug
When using Flash Attention (--use-flash-attention true) to train the Qwen2VL model with mixed data (both image and text samples), the code yields the following error
When I disabled flash-attention, the code ran smoothly. I also noticed that when I removed the text-only data and enabled flash-attention, the code did not raise the error. I believe this issue was mentioned in https://github.com/modelscope/ms-swift/issues/2147 and was fixed recently, but have you tested it with Flash Attention?
Your hardware and system info
torch: 2.4.0
flash-attention: 2.6.3