hiyouga / LLaMA-Factory

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

With the latest code, running api.py (or webui.py) fails, and the error is always: ImportError: cannot import name 'MultiModalData' from 'vllm.sequence' (/usr/local/lib/python3.10/dist-packages/vllm/sequence.py) #3645

Closed · camposs1979 closed 4 months ago

camposs1979 commented 4 months ago

Reminder

Reproduction

(base) root@I19c2837ff800901ccf:/hy-tmp/LLaMA-Factory-main/src# CUDA_VISIBLE_DEVICES=0,1,2,3 python3.10 api.py \
    --model_name_or_path ../model/qwen/Qwen1.5-72B-Chat \
    --adapter_name_or_path ../saves/qwen/lora/sft/checkpoint-500 \
    --template qwen \
    --finetuning_type lora \
    --use_fast_tokenizer True \
    --repetition_penalty 1.03 \
    --cutoff_len 8192 \
    --flash_attn

/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
Traceback (most recent call last):
  File "/hy-tmp/LLaMA-Factory-main/src/api.py", line 5, in <module>
    from llmtuner.api.app import create_app
  File "/hy-tmp/LLaMA-Factory-main/src/llmtuner/api/app.py", line 5, in <module>
    from ..chat import ChatModel
  File "/hy-tmp/LLaMA-Factory-main/src/llmtuner/chat/__init__.py", line 2, in <module>
    from .chat_model import ChatModel
  File "/hy-tmp/LLaMA-Factory-main/src/llmtuner/chat/chat_model.py", line 8, in <module>
    from .vllm_engine import VllmEngine
  File "/hy-tmp/LLaMA-Factory-main/src/llmtuner/chat/vllm_engine.py", line 14, in <module>
    from vllm.sequence import MultiModalData
ImportError: cannot import name 'MultiModalData' from 'vllm.sequence' (/usr/local/lib/python3.10/dist-packages/vllm/sequence.py)

Expected behavior

Run the Qwen1.5-72B-Chat model on multiple GPUs via webui.py. What is very strange is that the script does not include --infer_backend vllm, yet the error still occurs at the vllm import. Note: the output above was printed in no-GPU mode; when actually running on 4 GPUs the error is almost identical, just without the "compiled without GPU support" warning (GPUs are present in that mode).
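
For reference, the failing import can be reproduced in isolation with a minimal probe (a sketch; the import path is taken from the traceback above, and the installed vllm 0.3.3 is from the pip list below):

import vllm

try:
    from vllm.sequence import MultiModalData  # noqa: F401
    print(f"vllm {vllm.__version__} exposes vllm.sequence.MultiModalData")
except ImportError as err:
    # vllm 0.3.3 predates this symbol, hence the ImportError above
    print(f"vllm {vllm.__version__} does not expose it: {err}")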

System Info

(base) root@I19c2837ff800901ccf:/hy-tmp/LLaMA-Factory-main/src# python3.10 -m pip list

Package Version
accelerate 0.28.0
addict 2.4.0
aiofiles 23.2.1
aiohttp 3.9.3
aiosignal 1.3.1
aliyun-python-sdk-core 2.15.0
aliyun-python-sdk-kms 2.16.2
altair 5.2.0
annotated-types 0.6.0
anyio 4.3.0
async-timeout 4.0.3
attrs 23.2.0
auto_gptq 0.7.1
bitsandbytes 0.43.0
certifi 2019.11.28
cffi 1.16.0
chardet 3.0.4
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
contourpy 1.2.0
crcmod 1.7
cryptography 42.0.5
cupy-cuda12x 12.1.0
cycler 0.12.1
datasets 2.18.0
dbus-python 1.2.16
deepspeed 0.14.0
dill 0.3.8
diskcache 5.6.3
distro 1.4.0
distro-info 0.23ubuntu1
docstring_parser 0.16
einops 0.7.0
exceptiongroup 1.2.0
fastapi 0.110.0
fastrlock 0.8.2
ffmpy 0.3.2
filelock 3.13.3
fire 0.6.0
fonttools 4.50.0
frozenlist 1.4.1
fsspec 2024.2.0
galore-torch 1.0
gast 0.5.4
gekko 1.0.7
gradio 4.10.0
gradio_client 0.7.3
h11 0.14.0
hjson 3.1.0
httpcore 1.0.4
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.22.0
idna 2.8
importlib_metadata 7.1.0
importlib_resources 6.4.0
interegular 0.3.3
Jinja2 3.1.3
jmespath 0.10.0
joblib 1.3.2
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
lark 1.1.9
llvmlite 0.42.0
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.8.3
mdurl 0.1.2
modelscope 1.13.3
mpmath 1.3.0
msgpack 1.0.8
multidict 6.0.5
multiprocess 0.70.16
nest-asyncio 1.6.0
networkx 3.2.1
ninja 1.11.1.1
numba 0.59.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.4.99
nvidia-nvtx-cu12 12.1.105
orjson 3.9.15
oss2 2.18.4
outlines 0.0.37
packaging 24.0
pandas 2.2.1
peft 0.10.0
pillow 10.2.0
pip 24.0
platformdirs 4.2.0
prometheus_client 0.20.0
protobuf 5.26.0
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 15.0.2
pyarrow-hotfix 0.6
pycparser 2.21
pycryptodome 3.20.0
pydantic 2.6.4
pydantic_core 2.16.3
pydub 0.25.1
Pygments 2.17.2
PyGObject 3.36.0
pynvml 11.5.0
pyparsing 3.1.2
python-apt 2.0.1+ubuntu0.20.4.1
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
ray 2.10.0
referencing 0.34.0
regex 2023.12.25
requests 2.31.0
requests-unixsocket 0.2.0
rich 13.7.1
rouge 1.0.1
rpds-py 0.18.0
safetensors 0.4.2
scipy 1.12.0
semantic-version 2.10.0
sentencepiece 0.2.0
setuptools 69.2.0
shellingham 1.5.4
shtab 1.7.1
simplejson 3.19.2
six 1.14.0
sniffio 1.3.1
sortedcontainers 2.4.0
sse-starlette 2.0.0
ssh-import-id 5.10
starlette 0.36.3
sympy 1.12
termcolor 2.4.0
tokenizers 0.15.2
tomli 2.0.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.1.2
tqdm 4.66.2
transformers 4.39.1
triton 2.1.0
trl 0.8.1
typer 0.12.3
typing_extensions 4.10.0
tyro 0.7.3
tzdata 2024.1
unattended-upgrades 0.1
urllib3 2.2.1
uvicorn 0.29.0
uvloop 0.19.0
vllm 0.3.3
watchfiles 0.21.0
websockets 11.0.3
wheel 0.34.2
xformers 0.0.23.post1
xxhash 3.4.1
yapf 0.40.2
yarl 1.9.4
zipp 3.18.1

Others

No response

hiyouga commented 4 months ago

The project requires vllm 0.4.0 or newer.
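
A quick local check against that floor, as a sketch (packaging is available per the pip list above; 0.4.0 is the minimum stated in this comment):

import vllm
from packaging.version import Version

# Fail fast if the installed vllm predates LLaMA-Factory's stated minimum
assert Version(vllm.__version__) >= Version("0.4.0"), (
    f"vllm {vllm.__version__} is older than the required 0.4.0"
)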

camposs1979 commented 4 months ago

> The project requires vllm 0.4.0 or newer.

Thanks for the pointer, much appreciated.

Mr-Otaku-Lin commented 3 months ago

> The project requires vllm 0.4.0 or newer.

After installing with pip install vllm, I get RuntimeError: No suitable kernel. h_in=8 h_out=18944 dtype=Float out_dtype=BFloat16. This shows up when I use the API to run inference on a Qwen2-7B model trained with bf16 LoRA.

I then tracked down the vllm source, added f(in_T, out_T, W_T, narrow, 18944) \, and compiled from source, only to hit cannot import name 'MultiModalData' from 'vllm.sequence'. Was training in bf16 wrong from the start? Please advise, thanks.
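
As an aside, the 18944 in that RuntimeError matches Qwen2-7B's FFN intermediate_size, which is likely why the prebuilt LoRA kernels reject it. A hedged sketch to confirm where the number comes from (assumes transformers is installed and uses the Hugging Face id Qwen/Qwen2-7B):

from transformers import AutoConfig

# Qwen2-7B's FFN width; compare with h_out=18944 in the "No suitable kernel" error
config = AutoConfig.from_pretrained("Qwen/Qwen2-7B")
print(config.intermediate_size)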

Mewral commented 3 months ago

vllm 0.5.0 with Python 3.10 raises this error.

1402564807 commented 3 months ago

vllm 0.5.0 with Python 3.11 raises it as well.

webwlsong commented 3 months ago

vllm 0.5.0 with Python 3.9.18 raises it as well.

Copilot-X commented 3 months ago

Downgrading vllm from 0.5.0 to 0.4.3 fixes it; tested it myself!
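
If you apply this downgrade (for example with pip install "vllm==0.4.3"), a one-line check that the pin took effect, as a sketch:

from importlib.metadata import version

# Should print 0.4.3 after the downgrade suggested above
print(version("vllm"))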

verigle commented 1 month ago

> Downgrading vllm from 0.5.0 to 0.4.3 fixes it; tested it myself!

Does 0.4.3 support Qwen2? And if a 0.5.x version has to be used, is there a workaround?

lebronjamesking commented 1 month ago

Hi there, I'm struggling with the vllm MultiModalData import as well. I have tried vllm from 0.4.3 through 0.5.4; none of them work. I am using Python 3.10 and llama-factory 0.8.0.

dong-liuliu commented 3 weeks ago

> Does 0.4.3 support Qwen2? And if a 0.5.x version has to be used, is there a workaround?

The 0.5.x releases no longer need MultiModalData.

The "multi_modal_data" argument to llm.generate can be passed like this:

from io import BytesIO

import requests
from PIL import Image

# Fetch an example image to send along with the prompt
url = "https://h2o-release.s3.amazonaws.com/h2ogpt/bigben.jpg"
image = Image.open(BytesIO(requests.get(url).content))

# llm is an already-constructed vllm.LLM for a multimodal model;
# prompt is the text prompt for that model
outputs = llm.generate(
    {
        "prompt": prompt,
        "multi_modal_data": {
            "image": image
        },
    }
)

yang-chenyu104 commented 2 weeks ago

Earlier versions also hit this problem when training on DCU.

yang-chenyu104 commented 2 weeks ago

After updating to the latest version, 0.5, it no longer occurs.