xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

ValueError: [address=127.0.0.1:37657, pid=3125985] User-specified max_model_len (4096) is greater than the derived max_model_len (seq_length=2048 or model_max_length=None in model's config.json). This may lead to incorrect model outputs or CUDA errors. Make sure the value is correct and within the model context size. #1715

Open · wuliaodeashuai opened this issue 4 months ago

wuliaodeashuai commented 4 months ago

When running qwen14B-chat with vLLM, it always fails with this error:

ValueError: [address=127.0.0.1:37657, pid=3125985] User-specified max_model_len (4096) is greater than the derived max_model_len (seq_length=2048 or model_max_length=None in model's config.json). This may lead to incorrect model outputs or CUDA errors. Make sure the value is correct and within the model context size.

2024-06-25 16:32:00,563 xinference.api.restful_api 3118895 ERROR [address=127.0.0.1:37657, pid=3125985] User-specified max_model_len (4096) is greater than the derived max_model_len (seq_length=2048 or model_max_length=None in model's config.json). This may lead to incorrect model outputs or CUDA errors. Make sure the value is correct and within the model context size.
Traceback (most recent call last):
  File "/www/NLP/inference/xinference/api/restful_api.py", line 770, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "/home/ai/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/ai/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/ai/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/ai/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/home/ai/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/www/NLP/inference/xinference/core/supervisor.py", line 837, in launch_builtin_model
    await _launch_model()
  File "/www/NLP/inference/xinference/core/supervisor.py", line 801, in _launch_model
    await _launch_one_model(rep_model_uid)
  File "/www/NLP/inference/xinference/core/supervisor.py", line 782, in _launch_one_model
    await worker_ref.launch_builtin_model(
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
    result = await result
  File "/www/NLP/inference/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/www/NLP/inference/xinference/core/worker.py", line 665, in launch_builtin_model
    await model_ref.load()
  File "/home/ai/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/ai/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/ai/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/ai/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/home/ai/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/www/NLP/inference/xinference/core/model.py", line 278, in load
    self._model.load()
  File "/www/NLP/inference/xinference/model/llm/vllm/core.py", line 230, in load
    self._engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/home/ai/anaconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 371, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/home/ai/anaconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 630, in create_engine_config
    model_config = ModelConfig(
  File "/home/ai/anaconda3/envs/xinference/lib/python3.10/site-packages/vllm/config.py", line 141, in __init__
    self.max_model_len = _get_and_verify_max_len(
  File "/home/ai/anaconda3/envs/xinference/lib/python3.10/site-packages/vllm/config.py", line 1317, in _get_and_verify_max_len
    raise ValueError(
ValueError: [address=127.0.0.1:37657, pid=3125985] User-specified max_model_len (4096) is greater than the derived max_model_len (seq_length=2048 or model_max_length=None in model's config.json). This may lead to incorrect model outputs or CUDA errors. Make sure the value is correct and within the model context size.

However, the model runs fine when I use Transformers, with model_max_length set to 2048; Qwen-14B's upper limit is 8192. Could anyone explain what is going on? I also hit this problem when running qwen14B-int4 with Transformers.
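The check that fails is vLLM's _get_and_verify_max_len (the last frame in the traceback): vLLM derives a maximum context length from keys in the model's config.json (for Qwen-14B-Chat that is seq_length=2048) and rejects any user-specified max_model_len above it. Transformers only enforces the model_max_length=2048 set explicitly, which is presumably why the same model loads there. Below is a simplified sketch of that derivation, not vLLM's actual code; the exact key list and precedence vary across vLLM versions:

```python
def derive_max_model_len(hf_config: dict, user_max_model_len: int | None) -> int:
    """Simplified illustration of vLLM's _get_and_verify_max_len check."""
    # Keys scanned in config.json to infer the context window (abridged list).
    keys = ("max_position_embeddings", "seq_length", "model_max_length")
    candidates = [hf_config[k] for k in keys if hf_config.get(k) is not None]
    derived = min(candidates) if candidates else None

    if derived is None:
        # No context-length hint in the config; fall back to the requested value.
        return user_max_model_len or 2048
    if user_max_model_len is not None and user_max_model_len > derived:
        # Qwen-14B-Chat ships seq_length=2048, so requesting 4096 lands here.
        raise ValueError(
            f"User-specified max_model_len ({user_max_model_len}) is greater "
            f"than the derived max_model_len ({derived})."
        )
    return user_max_model_len or derived
```

With Qwen-14B-Chat's config, derive_max_model_len({"seq_length": 2048}, 4096) raises exactly this error, while 2048 or lower passes.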

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 7 days with no activity.

CharlesHAO77 commented 2 months ago

I'm running into this problem as well. How can it be solved?
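One workaround to try, not confirmed in this thread: request a max_model_len that stays within the 2048 declared in the model's config.json when launching. The sketch below assumes the Xinference Python client forwards extra keyword arguments such as max_model_len to the vLLM engine; the endpoint URL and model_engine value are placeholders to adapt to your deployment:

```python
from xinference.client import Client

client = Client("http://127.0.0.1:9997")  # adjust to your Xinference endpoint

# Keep the requested context length within the 2048 declared in config.json,
# so vLLM's derived max_model_len check passes.
model_uid = client.launch_model(
    model_name="qwen-chat",
    model_engine="vllm",
    model_size_in_billions=14,
    model_format="pytorch",
    max_model_len=2048,  # assumption: extra kwargs are passed through to vLLM
)
```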

Valdanitooooo commented 2 months ago

https://huggingface.co/Qwen/Qwen-14B-Chat-Int4/blob/main/config.json#L40

The model's default configuration is 2048. Try editing that setting in the config file inside the model weights directory.
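For reference, the seq_length key in that config.json is what vLLM reads as the derived limit (2048 by default, per the error above). A minimal sketch of the edit follows; the path is a hypothetical placeholder for wherever the weights were downloaded, and 4096 simply mirrors the max_model_len from the error. Whether Qwen-14B stays accurate above its declared 2048 is not confirmed in this thread, so lowering max_model_len instead is the conservative alternative:

```python
import json
from pathlib import Path

# Placeholder path: point this at your local copy of Qwen-14B-Chat(-Int4).
config_path = Path("/path/to/Qwen-14B-Chat-Int4/config.json")

config = json.loads(config_path.read_text())
print("current seq_length:", config.get("seq_length"))  # 2048 by default

# Raise the declared context length so the derived max_model_len is at least
# the 4096 being requested, then relaunch the model in Xinference.
config["seq_length"] = 4096
config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False))
```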