Deploying qwen2-vl with swift deploy: concurrent requests fail with an error

Describe the bug

The fine-tuned (merged) model is deployed with the following command:

CUDA_VISIBLE_DEVICES=1 swift deploy --model_type qwen2-vl-7b-instruct --model_id_or_path /root/ms-swift/train/qwen2-vl-7b-instruct/v1-20240906-145640/checkpoint-66-merged --port 20002

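Requests with the following body work normally as long as they are not sent concurrently (the prompt asks the model to parse the uploaded image and output its content as JSON):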

{
  "model": "qwen2-vl-7b-instruct",
  "messages": [
    { "role": "user", "content": "解析用户上传的图片内容，以json格式进行输出" }
  ],
  "temperature": 0,
  "images": [ "/root/datas/image/image_1.png" ],
  "stream": false
}
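To reproduce the concurrent case, the following minimal Python sketch sends the same body several times in parallel threads using only the standard library. It assumes the server started above exposes an OpenAI-compatible endpoint at `http://127.0.0.1:20002/v1/chat/completions`. The port comes from the deploy command, but the path is an assumption. The helper names (`build_payload`, `send_concurrently`) are hypothetical.

```python
import json
import threading
import urllib.request

# Assumed endpoint: port 20002 is from the deploy command, the path is an assumption.
URL = "http://127.0.0.1:20002/v1/chat/completions"


def build_payload() -> dict:
    # Same body as the single-request case above.
    return {
        "model": "qwen2-vl-7b-instruct",
        "messages": [{"role": "user", "content": "解析用户上传的图片内容，以json格式进行输出"}],
        "temperature": 0,
        "images": ["/root/datas/image/image_1.png"],
        "stream": False,
    }


def post(url: str, payload: dict, results: list, idx: int) -> None:
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=300) as resp:
            results[idx] = (resp.status, resp.read().decode("utf-8"))
    except Exception as exc:  # record the failure instead of killing the thread
        results[idx] = ("error", repr(exc))


def send_concurrently(url: str, n: int = 2) -> list:
    results = [None] * n
    threads = [
        threading.Thread(target=post, args=(url, build_payload(), results, i))
        for i in range(n)
    ]
    # Start every thread before joining any, so the requests overlap in time.
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for status, body in send_concurrently(URL, n=2):
        print(status, body[:200])
```

With `n=1` this should match the working single-request behaviour. With `n=2` it exercises the failing case.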

When two requests are sent at the same time, the following error is raised:

Error message:

    return self._call_impl(*args, **kwargs)
  File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1607, in forward
    outputs = self.model(
  File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1144, in forward
    layer_outputs = decoder_layer(
  File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 900, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 798, in forward
    query_states, key_states = apply_multimodal_rotary_pos_emb(
  File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 183, in apply_multimodal_rotary_pos_emb
    sin = sin[position_ids]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
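The failing line indexes the rotary `sin` table with `position_ids`. An out-of-range index on a CUDA tensor surfaces as exactly this kind of device-side assert.

Single requests work, so one hypothesis (not confirmed by this report) is that two generations running at once on the same model instance interfere with each other's position state. Under that assumption, a stopgap is to allow only one generation at a time. The sketch below wraps a hypothetical `generate_fn` in a `threading.Lock`. Here `generate_fn` stands in for whatever per-request inference call the server makes; it is not an ms-swift API.

```python
import threading
import time
from typing import Any, Callable


class SerializedGenerator:
    """Run a generation callable under a lock so at most one call is active.

    `generate_fn` is a hypothetical per-request inference function; this is a
    workaround sketch, not the ms-swift deploy implementation.
    """

    def __init__(self, generate_fn: Callable[..., Any]) -> None:
        self._generate_fn = generate_fn
        self._lock = threading.Lock()

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Concurrent callers block here until the previous call has finished.
        with self._lock:
            return self._generate_fn(*args, **kwargs)


def demo(n_requests: int = 4) -> int:
    """Fire n_requests threads at a fake generator and return peak concurrency."""
    state = {"active": 0, "peak": 0}
    counter_lock = threading.Lock()

    def fake_generate(prompt: str) -> str:
        with counter_lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.02)  # stand-in for GPU work
        with counter_lock:
            state["active"] -= 1
        return prompt.upper()

    gen = SerializedGenerator(fake_generate)
    threads = [threading.Thread(target=gen, args=(f"req{i}",)) for i in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    print("peak concurrent generations:", demo())
```

The trade-off is throughput: requests queue behind each other instead of running in parallel. A proper fix would have to come from the serving or modeling code.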

Your hardware and system info

2× A800 GPUs; the model is deployed on the second card via CUDA_VISIBLE_DEVICES=1. torch==2.4.0, CUDA 12.5.

modelscope / ms-swift