File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1607, in forward
outputs = self.model(
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1144, in forward
layer_outputs = decoder_layer(
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 900, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 798, in forward
query_states, key_states = apply_multimodal_rotary_pos_emb(
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 183, in apply_multimodal_rotary_pos_emb
sin = sin[position_ids]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
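The message above suggests CUDA_LAUNCH_BLOCKING=1, since asynchronous kernel launches can make the reported frame misleading. A minimal sketch of relaunching the server with that variable set follows. The command and GPU selection are copied from this report; the helper names `debug_env` and `launch_debug_server` are hypothetical and only for illustration.

```python
import os
import subprocess

# Deploy command as given in this report; only the environment differs.
DEPLOY_CMD = [
    "swift", "deploy",
    "--model_type", "qwen2-vl-7b-instruct",
    "--model_id_or_path",
    "/root/ms-swift/train/qwen2-vl-7b-instruct/v1-20240906-145640/checkpoint-66-merged",
    "--port", "20002",
]


def debug_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = "1"  # same GPU selection as in the report
    env["CUDA_LAUNCH_BLOCKING"] = "1"  # surface kernel errors at the launching call
    return env


def launch_debug_server():
    """Start the server in the foreground with the debug environment (hypothetical helper)."""
    return subprocess.run(DEPLOY_CMD, env=debug_env(), check=False)
```

Calling `launch_debug_server()` and repeating the failing requests should yield a traceback whose last frame is the operation that actually asserted. Synchronous launches slow inference, so this is for debugging only.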
Describe the bug
Deploy the fine-tuned model with the following command:
CUDA_VISIBLE_DEVICES=1 swift deploy --model_type qwen2-vl-7b-instruct --model_id_or_path /root/ms-swift/train/qwen2-vl-7b-instruct/v1-20240906-145640/checkpoint-66-merged --port 20002
Send the following request body. With no concurrency it works normally:
{ "model": "qwen2-vl-7b-instruct", "messages": [ { "role": "user", "content": "解析用户上传的图片内容,以json格式进行输出" } ], "temperature": 0, "images": [ "/root/datas/image/image_1.png" ], "stream": false }
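To exercise the concurrent case, the same body can be posted from two threads at roughly the same time. This is a minimal sketch, not the reporter's client: the URL path `/v1/chat/completions` and the host are assumptions (only the port 20002 comes from the deploy command), and the function names are hypothetical.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: an OpenAI-style route on localhost; only the port is from the report.
URL = "http://127.0.0.1:20002/v1/chat/completions"


def build_payload():
    """Request body copied from the report."""
    return {
        "model": "qwen2-vl-7b-instruct",
        "messages": [
            {"role": "user", "content": "解析用户上传的图片内容,以json格式进行输出"}
        ],
        "temperature": 0,
        "images": ["/root/datas/image/image_1.png"],
        "stream": False,
    }


def post_json(url, payload, timeout=300):
    """POST one JSON body and return the response text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def send_concurrently(n=2, send=post_json, url=URL):
    """Submit n identical requests from n threads; return a result or exception per request."""
    def one(_):
        try:
            return send(url, build_payload())
        except Exception as exc:  # keep every request's outcome visible
            return exc

    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(one, range(n)))
```

Running `send_concurrently(2)` against the deployed server should be enough to check whether two overlapping requests trigger the failure, while `send_concurrently(1)` gives the non-concurrent baseline.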
When two requests arrive at the same time, the server fails with the following error.
Error message
The traceback is identical to the one at the top of this report and ends in:
sin = sin[position_ids]
RuntimeError: CUDA error: device-side assert triggered
Your hardware and system info
Two A800 GPUs; the second card is selected for deployment with CUDA_VISIBLE_DEVICES=1. torch==2.4.0, CUDA=12.5