modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (Qwen2.5, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

V100 GPUs, Ubuntu 22.04, qwen2-vl-2b model: the single-GPU test script runs fine, but two-, three-, and four-GPU runs fail. #2087

Open Digital2Slave opened 2 hours ago

Digital2Slave commented 2 hours ago

Following https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md, I set up the environment on a host with four 16 GB V100 GPUs and ran the single-sample inference test script. It only runs correctly on a single GPU; with two, three, or four GPUs it fails.

Environment setup

$ mkvirtualenv aivl -p /usr/bin/python3.10
(aivl) $ pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118
(aivl) $ git clone https://github.com/modelscope/ms-swift.git
(aivl) $ cd ms-swift
(aivl) $ pip install -e .[llm]
#!< https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md 
(aivl) $ pip install git+https://github.com/huggingface/transformers.git
(aivl) $ pip install pyav qwen_vl_utils

#!< https://github.com/modelscope/ms-swift/issues/2064 
# qwen2-vl 
# https://github.com/QwenLM/Qwen2-VL/issues/96
(aivl) $ pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830
# vLLM acceleration (quote the constraint so the shell does not treat >= as a redirect)
(aivl) $ pip install "vllm>=0.6.1"
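
Before running the test script below, it can be worth sanity-checking the CUDA setup, since V100 (compute capability 7.0) supports float16 but not bfloat16, which is why the script loads the model in torch.float16. A minimal check, not part of the original report:

# check_gpus.py - quick sanity check of the multi-GPU setup (illustrative)
import torch

print(f'torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}')
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    # V100 is sm_70: float16 is supported, bfloat16 is not
    print(f'cuda:{i} {torch.cuda.get_device_name(i)} (sm_{major}{minor})')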

Test script qwen2_vl_2b.py

import os
#!< Set the environment variable CUDA_VISIBLE_DEVICES to 0; 0,1; 0,1,2; or 0,1,2,3 in turn
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

#!< -------------- modified here ----------------
os.environ['SIZE_FACTOR'] = '8'
os.environ['MAX_PIXELS'] = '602112'
# ---------------------------------------

from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.qwen2_vl_2b_instruct
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

#!< ------------------------- modified here: torch.float16 -------------------------
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

query = """<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')

# streaming
query = '距离最远的城市是哪?'  # "Which city is the farthest?"
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')

"""
template_type: qwen2-vl
query: <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?
response: 根据图片中的路标,距离各城市的距离如下:

- 马踏:14公里
- 阳江:62公里
- 广州:293公里
query: 距离最远的城市是哪?
response: 距离最远的城市是广州,距离为293公里。
history: [['<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?', '根据图片中的路标,距离各城市的距离如下:\n\n- 马踏:14公里\n- 阳江:62公里\n- 广州:293公里'], ['距离最远的城市是哪?', '距离最远的城市是广州,距离为293公里。']]
"""

Single-GPU test result

Set os.environ['CUDA_VISIBLE_DEVICES'] in the test script to 0.

$ python3 qwen2_vl_2b.py
[INFO:swift] Successfully registered `/home/ps/Github/swift/swift/llm/data/dataset_info.json`
[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get `ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'`
template_type: qwen2-vl
[INFO:swift] Downloading the model from ModelScope Hub, model_id: qwen/Qwen2-VL-2B-Instruct
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /home/ps/.cache/modelscope/hub/qwen/Qwen2-VL-2B-Instruct
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[INFO:swift] model_kwargs: {'device_map': 'auto'}
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.08s/it]
[INFO:swift] model.max_model_len: 32768
[INFO:swift] Global seed set to 42
[INFO:swift] Using environment variable `SIZE_FACTOR`, Setting size_factor: 8.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: `RESIZED_HEIGHT`.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: `RESIZED_WIDTH`.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Using environment variable `MAX_PIXELS`, Setting max_pixels: 602112.
query: <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?
response: 这张图片显示了从马踏到阳江的距离是14公里,从阳江到广州的距离是62公里,从广州到马踏的距离是293公里。
query: 距离最远的城市是哪?
response: 距离最远的城市是广州,从马踏到广州的距离是293公里。
history: [['<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?', '这张图片显示了从马踏到阳江的距离是14公里,从阳江到广州的距离是62公里,从广州到马踏的距离是293公里。'], ['距离最远的城市是哪?', '距离最远的城市是广州,从马踏到广州的距离是293公里。']]

Two-GPU test result

Set os.environ['CUDA_VISIBLE_DEVICES'] in the test script to 0,1.

$ python3 qwen2_vl_2b.py
[INFO:swift] Successfully registered `/home/ps/Github/swift/swift/llm/data/dataset_info.json`
[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get `ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'`
template_type: qwen2-vl
[INFO:swift] Downloading the model from ModelScope Hub, model_id: qwen/Qwen2-VL-2B-Instruct
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /home/ps/.cache/modelscope/hub/qwen/Qwen2-VL-2B-Instruct
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[INFO:swift] model_kwargs: {'device_map': 'auto'}
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.10s/it]
[INFO:swift] model.max_model_len: 32768
[INFO:swift] Global seed set to 42
[INFO:swift] Using environment variable `SIZE_FACTOR`, Setting size_factor: 8.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: `RESIZED_HEIGHT`.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: `RESIZED_WIDTH`.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Using environment variable `MAX_PIXELS`, Setting max_pixels: 602112.
Traceback (most recent call last):
  File "/home/ps/Github/AiVl/scripts/qwen2_vl_2b.py", line 24, in <module>
    response, history = inference(model, template, query)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ps/Github/swift/swift/llm/utils/utils.py", line 864, in inference
    generate_ids = model.generate(streamer=streamer, generation_config=generation_config, **inputs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/generation/utils.py", line 2053, in generate
    result = self._sample(
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/generation/utils.py", line 3040, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
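
The error is raised inside torch.multinomial, i.e. the sampled probability distribution already contains inf/nan, which with fp16 multi-GPU dispatch usually means the logits themselves are corrupted. A hedged diagnostic (not a fix) is to switch to greedy decoding, which bypasses the sampler and shows whether the forward pass itself produces garbage:

# Diagnostic only: greedy decoding avoids torch.multinomial, so corrupted
# logits show up as garbled text instead of a RuntimeError
model.generation_config.do_sample = False
response, history = inference(model, template, query)
print(f'greedy response: {response}')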

Three- and four-GPU test results

Set os.environ['CUDA_VISIBLE_DEVICES'] in the test script to 0,1,2 and 0,1,2,3 respectively.

$ python3 qwen2_vl_2b.py
[INFO:swift] Successfully registered `/home/ps/Github/swift/swift/llm/data/dataset_info.json`
[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get `ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'`
template_type: qwen2-vl
[INFO:swift] Downloading the model from ModelScope Hub, model_id: qwen/Qwen2-VL-2B-Instruct
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /home/ps/.cache/modelscope/hub/qwen/Qwen2-VL-2B-Instruct
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[INFO:swift] model_kwargs: {'device_map': 'auto'}
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.07s/it]
[INFO:swift] model.max_model_len: 32768
[INFO:swift] Global seed set to 42
[INFO:swift] Using environment variable `SIZE_FACTOR`, Setting size_factor: 8.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: `RESIZED_HEIGHT`.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: `RESIZED_WIDTH`.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Using environment variable `MAX_PIXELS`, Setting max_pixels: 602112.
../aten/src/ATen/native/cuda/Indexing.cu:1231: indexSelectSmallIndex: block: [4,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
......
Traceback (most recent call last):
  File "/home/ps/Github/AiVl/scripts/qwen2_vl_2b.py", line 24, in <module>
    response, history = inference(model, template, query)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ps/Github/swift/swift/llm/utils/utils.py", line 864, in inference
    generate_ids = model.generate(streamer=streamer, generation_config=generation_config, **inputs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/generation/utils.py", line 2053, in generate
    result = self._sample(
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/generation/utils.py", line 3003, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1680, in forward
    inputs_embeds = self.model.embed_tokens(input_ids)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 164, in forward
    return F.embedding(
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/functional.py", line 2267, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
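
Because CUDA kernels launch asynchronously, the frame blamed in this traceback (embed_tokens) may not be the op that actually tripped the assert. Two standard debugging steps, assuming nothing beyond stock PyTorch/transformers:

# 1) Make kernel launches synchronous so the assert points at the real op.
#    Must be set before torch initializes CUDA (i.e. at the top of the script).
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

# 2) The `srcIndex < srcSelectDimSize` assert in indexSelect typically means an
#    input id is out of range for the embedding table; the table size to compare
#    the maximum input id against is:
embed = model.get_input_embeddings()
print('embedding rows:', embed.num_embeddings)  # max valid input id is num_embeddings - 1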

Is there a solution to this problem?
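
For what it's worth, the 2B model in fp16 fits comfortably on a single 16 GB V100, so one interim workaround is to pin the whole model to one device even when several GPUs are visible, using the standard transformers device_map convention — a sketch, not a confirmed fix:

# Workaround sketch: place every submodule on cuda:0 instead of sharding
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': {'': 0}})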

Reference links

Digital2Slave commented 2 hours ago

三卡四卡测试结果.txt (attachment: three- and four-GPU test logs)

Digital2Slave commented 2 hours ago

@Jintao-Huang could you take a look when you have time? Thank you!