vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: model does not support LoRA but is listed in supported models #3543

Open xiaobo-Chen opened 6 months ago

xiaobo-Chen commented 6 months ago

Your current environment

Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.27

Python version: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-150-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 11.4.120
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090

Nvidia driver version: 535.113.01

How would you like to use vllm

I tried to run the following command:

    python -m vllm.entrypoints.openai.api_server \
        --model /home/T3090U1/CZ/model/Qwen1.5-7B-Chat/ \
        --enable-lora \
        --lora-modules sql-lora=/home/T3090U1/CZ/model/output_sft_qwen_0320/

Error message:

    ValueError: Model Qwen2ForCausalLM does not support LoRA, but LoRA is enabled. Support for this model may be added in the future.

However, the vLLM documentation at https://docs.vllm.ai/en/latest/models/supported_models.html says that Qwen2ForCausalLM supports LoRA.

Am I using the wrong command, or is LoRA not supported for Qwen2ForCausalLM?

xiaobo-Chen commented 6 months ago

The version of vLLM I am using is the latest, 0.3.3.

jeejeelee commented 6 months ago

Qwen2 supports LoRA. This error is raised by lora_config; you can check your local Qwen architecture.
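
A quick way to check the local model's architecture is to read the "architectures" field of its config.json; this is a minimal sketch, assuming a standard Hugging Face model layout and the path from the original command:

    import json
    import os

    # Path to the local model directory (taken from the command above).
    model_dir = "/home/T3090U1/CZ/model/Qwen1.5-7B-Chat/"

    with open(os.path.join(model_dir, "config.json")) as f:
        config = json.load(f)

    # vLLM maps this architecture name to its internal model class;
    # LoRA only works for architectures registered as LoRA-capable.
    print(config["architectures"])  # e.g. ['Qwen2ForCausalLM']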

jeejeelee commented 6 months ago

The version of vLLM I am using is the latest, 0.3.3.

In version 0.3.3, Qwen2 indeed does not support LoRA.

simon-mo commented 6 months ago

The documentation by default points to the main branch. In the upcoming release, or if you build from source, you can use Qwen2 with LoRA.
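
For reference, once you are on a build with Qwen2 LoRA support, the adapter registered above as sql-lora can be queried through the OpenAI-compatible server by passing its name as the model. A minimal sketch, assuming the openai Python client, the default server address, and an illustrative prompt:

    from openai import OpenAI

    # Point the OpenAI client at the local vLLM server (default port 8000).
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    completion = client.completions.create(
        model="sql-lora",  # the LoRA adapter name from --lora-modules
        prompt="Write a SQL query that counts users by country.",
        max_tokens=64,
    )
    print(completion.choices[0].text)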

zlh1992 commented 6 months ago

Can anyone show an example of Qwen1.5 inference with LoRA using vLLM?

The version of vLLM I am using is the latest, 0.3.3.

In version 0.3.3, Qwen2 indeed does not support LoRA.

jeejeelee commented 6 months ago

Can anyone show an example of Qwen1.5 inference with LoRA using vLLM?

The version of vLLM I am using is the latest, 0.3.3.

In version 0.3.3, Qwen2 indeed does not support LoRA.

Please refer to the multilora_inference example.
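
In the same spirit as that example, here is a minimal offline-inference sketch, assuming a vLLM version with Qwen2 LoRA support and the local paths from the original report:

    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    # The base model must be loaded with LoRA support enabled.
    llm = LLM(model="/home/T3090U1/CZ/model/Qwen1.5-7B-Chat/", enable_lora=True)

    sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

    # Attach the fine-tuned adapter per request: (name, unique int id, local path).
    lora_request = LoRARequest(
        "sql-lora", 1, "/home/T3090U1/CZ/model/output_sft_qwen_0320/"
    )

    outputs = llm.generate(
        ["Write a SQL query that counts users by country."],
        sampling_params,
        lora_request=lora_request,
    )
    print(outputs[0].outputs[0].text)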

jeejeelee commented 6 months ago

The documentation by default points to the main branch. In the upcoming release, or if you build from source, you can use Qwen2 with LoRA.

@simon-mo Hi, the main branch now supports LoRA for ChatGLM3 and Baichuan, but the documentation does not show this yet. How should it be added to the documentation?

simon-mo commented 6 months ago

Please send a PR editing this file: https://github.com/vllm-project/vllm/blob/main/docs/source/models/supported_models.rst