With the latest code, running api.py (or webui.py) fails, and the error is the same in both cases: ImportError: cannot import name 'MultiModalData' from 'vllm.sequence' (/usr/local/lib/python3.10/dist-packages/vllm/sequence.py) #3645

(base) root@I19c2837ff800901ccf:/hy-tmp/LLaMA-Factory-main/src# CUDA_VISIBLE_DEVICES=0,1,2,3 python3.10 api.py \
--model_name_or_path ../model/qwen/Qwen1.5-72B-Chat \
--adapter_name_or_path ../saves/qwen/lora/sft/checkpoint-500 \
--template qwen \
--finetuning_type lora \
--use_fast_tokenizer True \
--repetition_penalty 1.03 \
--cutoff_len 8192 \

/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
Traceback (most recent call last):
  File "/hy-tmp/LLaMA-Factory-main/src/api.py", line 5, in <module>
    from llmtuner.api.app import create_app
  File "/hy-tmp/LLaMA-Factory-main/src/llmtuner/api/app.py", line 5, in <module>
    from ..chat import ChatModel
  File "/hy-tmp/LLaMA-Factory-main/src/llmtuner/chat/__init__.py", line 2, in <module>
    from .chat_model import ChatModel
  File "/hy-tmp/LLaMA-Factory-main/src/llmtuner/chat/chat_model.py", line 8, in <module>
    from .vllm_engine import VllmEngine
  File "/hy-tmp/LLaMA-Factory-main/src/llmtuner/chat/vllm_engine.py", line 14, in <module>
    from vllm.sequence import MultiModalData
ImportError: cannot import name 'MultiModalData' from 'vllm.sequence' (/usr/local/lib/python3.10/dist-packages/vllm/sequence.py)

Expected behavior

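Run the Qwen1.5-72B-Chat model on multiple GPUs through webui.py. What is very strange is that the script does not pass --infer_backend vllm \ at all, yet the error is raised in the vLLM code. Note: the output above was printed in no-GPU mode. When actually running on 4 GPUs the error is almost the same, only without the "without GPU support" part (GPUs are available in that mode).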
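Judging only from the traceback above, chat_model.py imports VllmEngine at module load time, and vllm_engine.py in turn imports MultiModalData at its top level, so the import runs whichever backend is selected. A minimal sketch of a guarded import that defers the failure is shown here; `optional_attr` is a hypothetical helper for illustration and is not part of LLaMA-Factory.

```python
import importlib
from typing import Any, Optional


def optional_attr(module_name: str, attr: str) -> Optional[Any]:
    """Return module_name.attr, or None if the module or the attribute is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is absent or fails to import.
        return None
    # The package imported, but this release may not export the name.
    return getattr(module, attr, None)


# Hypothetical usage: probe for the symbol instead of importing it unconditionally.
MultiModalData = optional_attr("vllm.sequence", "MultiModalData")
if MultiModalData is None:
    print("vllm.sequence.MultiModalData is unavailable in this environment")
```

With a guard like this, a mismatched vLLM install would only matter once the vLLM backend is actually requested.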

System Info

(base) root@I19c2837ff800901ccf:/hy-tmp/LLaMA-Factory-main/src# python3.10 -m pip list
Package Version
accelerate 0.28.0
addict 2.4.0
aiofiles 23.2.1
aiohttp 3.9.3
aiosignal 1.3.1
aliyun-python-sdk-core 2.15.0
aliyun-python-sdk-kms 2.16.2
altair 5.2.0
annotated-types 0.6.0
anyio 4.3.0
async-timeout 4.0.3
attrs 23.2.0
auto_gptq 0.7.1
bitsandbytes 0.43.0
certifi 2019.11.28
cffi 1.16.0
chardet 3.0.4
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
contourpy 1.2.0
crcmod 1.7
cryptography 42.0.5
cupy-cuda12x 12.1.0
cycler 0.12.1
datasets 2.18.0
dbus-python 1.2.16
deepspeed 0.14.0
dill 0.3.8
diskcache 5.6.3
distro 1.4.0
distro-info 0.23ubuntu1
docstring_parser 0.16
einops 0.7.0
exceptiongroup 1.2.0
fastapi 0.110.0
fastrlock 0.8.2
ffmpy 0.3.2
filelock 3.13.3
fire 0.6.0
fonttools 4.50.0
frozenlist 1.4.1
fsspec 2024.2.0
galore-torch 1.0
gast 0.5.4
gekko 1.0.7
gradio 4.10.0
gradio_client 0.7.3
h11 0.14.0
hjson 3.1.0
httpcore 1.0.4
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.22.0
idna 2.8
importlib_metadata 7.1.0
importlib_resources 6.4.0
interegular 0.3.3
Jinja2 3.1.3
jmespath 0.10.0
joblib 1.3.2
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
lark 1.1.9
llvmlite 0.42.0
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.8.3
mdurl 0.1.2
modelscope 1.13.3
mpmath 1.3.0
msgpack 1.0.8
multidict 6.0.5
multiprocess 0.70.16
nest-asyncio 1.6.0
networkx 3.2.1
ninja
numba 0.59.1
numpy 1.26.4
nvidia-cublas-cu12
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12
nvidia-cufft-cu12
nvidia-curand-cu12
nvidia-cusolver-cu12
nvidia-cusparse-cu12
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.4.99
nvidia-nvtx-cu12 12.1.105
orjson 3.9.15
oss2 2.18.4
outlines 0.0.37
packaging 24.0
pandas 2.2.1
peft 0.10.0
pillow 10.2.0
pip 24.0
platformdirs 4.2.0
prometheus_client 0.20.0
protobuf 5.26.0
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 15.0.2
pyarrow-hotfix 0.6
pycparser 2.21
pycryptodome 3.20.0
pydantic 2.6.4
pydantic_core 2.16.3
pydub 0.25.1
Pygments 2.17.2
PyGObject 3.36.0
pynvml 11.5.0
pyparsing 3.1.2
python-apt 2.0.1+ubuntu0.20.4.1
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
ray 2.10.0
referencing 0.34.0
regex 2023.12.25
requests 2.31.0
requests-unixsocket 0.2.0
rich 13.7.1
rouge 1.0.1
rpds-py 0.18.0
safetensors 0.4.2
scipy 1.12.0
semantic-version 2.10.0
sentencepiece 0.2.0
setuptools 69.2.0
shellingham 1.5.4
shtab 1.7.1
simplejson 3.19.2
six 1.14.0
sniffio 1.3.1
sortedcontainers 2.4.0
sse-starlette 2.0.0
ssh-import-id 5.10
starlette 0.36.3
sympy 1.12
termcolor 2.4.0
tokenizers 0.15.2
tomli 2.0.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.1.2
tqdm 4.66.2
transformers 4.39.1
triton 2.1.0
trl 0.8.1
typer 0.12.3
typing_extensions 4.10.0
tyro 0.7.3
tzdata 2024.1
unattended-upgrades 0.1
urllib3 2.2.1
uvicorn 0.29.0
uvloop 0.19.0
vllm 0.3.3
watchfiles 0.21.0
websockets 11.0.3
wheel 0.34.2
xformers 0.0.23.post1
xxhash 3.4.1
yapf 0.40.2
yarl 1.9.4
zipp 3.18.1


hiyouga commented 4 months ago

The project requires vllm 0.4.0 or later.
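For reference, the pip list in the report shows vllm 0.3.3, which is below that minimum. A quick way to check an environment might look like the sketch below; `release_tuple` and `meets_minimum` are hypothetical helper names, and the comparison only looks at the leading numeric parts of the version string, so it is a rough check rather than a full version parser.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(ver: str) -> tuple:
    """Turn '0.4.0.post1' into (0, 4, 0); stops at the first non-numeric part."""
    parts = []
    for piece in ver.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "0.4.0") -> bool:
    """Compare numerically, so that 0.10.0 counts as newer than 0.4.0."""
    return release_tuple(installed) >= release_tuple(minimum)


try:
    installed = version("vllm")
    print(f"vllm {installed}: meets minimum = {meets_minimum(installed)}")
except PackageNotFoundError:
    print("vllm is not installed in this environment")
```

If the check fails, upgrading with pip (for example pip install -U vllm) is the usual next step. Note that later comments in this thread report the same import error on 0.5.x, so meeting the minimum does not by itself rule out a mismatch with newer releases.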

camposs1979 commented 4 months ago

The project requires vllm 0.4.0 or later.


Mr-Otaku-Lin commented 3 months ago

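The project requires vllm 0.4.0 or later.

After installing with pip install vllm, I get RuntimeError: No suitable kernel. h_in=8 h_out=18944 dtype=Float out_dtype=BFloat16. This error appears when I run API inference on a qwen2-7B model that I fine-tuned with LoRA in bf16.

I then located the vllm source and added f(in_T, out_T, W_T, narrow, 18944) \. After building from source, the error became cannot import name 'MultiModalData' from 'vllm.sequence'. Was training in bf16 a mistake from the start? Please advise, thanks.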

Mewral commented 3 months ago

vllm 0.5.0 with Python 3.10 raises this error.

1402564807 commented 3 months ago

vllm 0.5.0 with Python 3.11 raises this error too.

webwlsong commented 3 months ago

vllm 0.5.0 with Python 3.9.18 raises this error too.

Does 0.4.3 support qwen2? If I have to use a 0.5.x release, is there a workaround?

lebronjamesking commented 1 month ago

Hi there, I'm struggling with the vllm MultiModalData import as well. I have tried every vllm version from 0.4.3 through 0.5.4, and none of them work. I am using Python 3.10 and llama-factory 0.8.0.

dong-liuliu commented 3 weeks ago

Does 0.4.3 support qwen2? If I have to use a 0.5.x release, is there a workaround?

The 0.5.x releases no longer need MultiModalData.


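from io import BytesIO
import requests
from PIL import Image

# llm and prompt are assumed to be defined earlier; they are not shown in this comment.
url = "https://h2o-release.s3.amazonaws.com/h2ogpt/bigben.jpg"
image = Image.open(BytesIO(requests.get(url).content))

outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {
        "image": image
    },
})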