Qwen-7B-Chat runs properly.
Same error here.
Qwen1.5 supports a larger max sequence length (32768), so it consumes more GPU memory by default. Decrease the max seq len when starting, or use a GPU with more memory.
Just set this parameter when starting up: `--max-model-len 8192`.
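The same limit can also be set through the Python API. A minimal sketch, assuming the vLLM `LLM` class used in the traceback below (the model path is a placeholder, not a confirmed location):

```python
# Sketch: cap the context window so the KV cache fits in GPU memory.
# max_model_len=8192 mirrors the --max-model-len 8192 server flag;
# Qwen1.5's default is 32768.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen1.5-7B-Chat",  # placeholder; use your local checkpoint path
    max_model_len=8192,            # lower than the 32768 default
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```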
```
Traceback (most recent call last):
  File "/home/orbbec/VLM/qwen/vllm_test.py", line 11, in <module>
    llm = LLM(model="/home/orbbec/VLM/qwen/model/qwen1.5/Qwen1.5-7B-Chat",
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 356, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 114, in __init__
    self._init_cache()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 331, in _init_cache
    raise ValueError(
ValueError: The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (1984). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
```
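As the error message suggests, the other knob is `gpu_memory_utilization`. A hedged sketch combining both remedies with the path from the traceback (the values are illustrative, not tuned):

```python
# Sketch of the two remedies named in the ValueError, assuming the same
# vLLM LLM API as in the traceback above.
from vllm import LLM

llm = LLM(
    model="/home/orbbec/VLM/qwen/model/qwen1.5/Qwen1.5-7B-Chat",
    gpu_memory_utilization=0.95,  # default is 0.9; raising it gives the KV cache more room
    max_model_len=8192,           # or lower the context window below the 32768 default
)
```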
I saw on the official Qianwen (Qwen) website that version 0.30.0 is supported. I tried it and got this error. What might have caused it?