vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Embeddings error: python -m vllm.entrypoints.openai.api_server --trust-remote-code --model gte_Qwen2-7B-instruct --seed 48 --max-model-len 1000 --tensor-parallel-size 2 --gpu-memory-utilization 1 --dtype float16 #6015

Open · 2679326161or opened 1 month ago

2679326161or commented 1 month ago

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

ERROR 07-01 08:12:10 async_llm_engine.py:52] Engine background task failed
ERROR 07-01 08:12:10 async_llm_engine.py:52] Traceback (most recent call last):
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
ERROR 07-01 08:12:10 async_llm_engine.py:52]     return_value = task.result()
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 532, in run_engine_loop
ERROR 07-01 08:12:10 async_llm_engine.py:52]     has_requests_in_progress = await asyncio.wait_for(
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
ERROR 07-01 08:12:10 async_llm_engine.py:52]     return fut.result()
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
ERROR 07-01 08:12:10 async_llm_engine.py:52]     request_outputs = await self.engine.step_async()
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
ERROR 07-01 08:12:10 async_llm_engine.py:52]     output = await self.model_executor.execute_model_async(
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 166, in execute_model_async
ERROR 07-01 08:12:10 async_llm_engine.py:52]     return await self._driver_execute_model_async(execute_model_req)
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 149, in _driver_execute_model_async
ERROR 07-01 08:12:10 async_llm_engine.py:52]     return await self.driver_exec_model(execute_model_req)
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
ERROR 07-01 08:12:10 async_llm_engine.py:52]     result = self.fn(*self.args, **self.kwargs)
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 07-01 08:12:10 async_llm_engine.py:52]     return func(*args, **kwargs)
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 280, in execute_model
ERROR 07-01 08:12:10 async_llm_engine.py:52]     output = self.model_runner.execute_model(seq_group_metadata_list,
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 07-01 08:12:10 async_llm_engine.py:52]     return func(*args, **kwargs)
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 735, in execute_model
ERROR 07-01 08:12:10 async_llm_engine.py:52]     ) = self.prepare_input_tensors(seq_group_metadata_list)
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 682, in prepare_input_tensors
ERROR 07-01 08:12:10 async_llm_engine.py:52]     sampling_metadata = SamplingMetadata.prepare(
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 116, in prepare
ERROR 07-01 08:12:10 async_llm_engine.py:52]     ) = _prepare_seq_groups(seq_group_metadata_list, seq_lens, query_lens,
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 208, in _prepare_seq_groups
ERROR 07-01 08:12:10 async_llm_engine.py:52]     if sampling_params.seed is not None:
ERROR 07-01 08:12:10 async_llm_engine.py:52] AttributeError: 'NoneType' object has no attribute 'seed'
Exception in callback functools.partial(<function _log_task_completion at 0x7f40f9a39630>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f40b87a3160>>)
handle: <Handle functools.partial(<function _log_task_completion at 0x7f40f9a39630>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f40b87a3160>>)>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
    return_value = task.result()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 532, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
  File "/opt/conda/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
    request_outputs = await self.engine.step_async()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
    output = await self.model_executor.execute_model_async(
  File "/opt/conda/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 166, in execute_model_async
    return await self._driver_execute_model_async(execute_model_req)
  File "/opt/conda/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 149, in _driver_execute_model_async
    return await self.driver_exec_model(execute_model_req)
  File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 280, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 735, in execute_model
    ) = self.prepare_input_tensors(seq_group_metadata_list)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 682, in prepare_input_tensors
    sampling_metadata = SamplingMetadata.prepare(
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 116, in prepare
    ) = _prepare_seq_groups(seq_group_metadata_list, seq_lens, query_lens,
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 208, in _prepare_seq_groups
    if sampling_params.seed is not None:
AttributeError: 'NoneType' object has no attribute 'seed'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
INFO 07-01 08:12:10 async_llm_engine.py:167] Aborted request cmpl-13a5e1f614ab4afe99ca9ccc99097603-0.
INFO: 192.168.30.254:63180 - "POST /v1/embeddings HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 132, in create_embedding
    generator = await openai_serving_embedding.create_embedding(
  File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_embedding.py", line 124, in create_embedding
    async for i, res in result_generator:
  File "/opt/conda/lib/python3.10/site-packages/vllm/utils.py", line 250, in consumer
    raise e
  File "/opt/conda/lib/python3.10/site-packages/vllm/utils.py", line 241, in consumer
    raise item
  File "/opt/conda/lib/python3.10/site-packages/vllm/utils.py", line 225, in producer
    async for item in iterator:
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 747, in encode
    async for output in self._process_request(
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 780, in _process_request
    raise e
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 776, in _process_request
    async for request_output in stream:
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 89, in __anext__
    raise result
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
    return_value = task.result()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 532, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
  File "/opt/conda/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
    request_outputs = await self.engine.step_async()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
    output = await self.model_executor.execute_model_async(
  File "/opt/conda/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 166, in execute_model_async
    return await self._driver_execute_model_async(execute_model_req)
  File "/opt/conda/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 149, in _driver_execute_model_async
    return await self.driver_exec_model(execute_model_req)
  File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 280, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 735, in execute_model
    ) = self.prepare_input_tensors(seq_group_metadata_list)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 682, in prepare_input_tensors
    sampling_metadata = SamplingMetadata.prepare(
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 116, in prepare
    ) = _prepare_seq_groups(seq_group_metadata_list, seq_lens, query_lens,
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 208, in _prepare_seq_groups
    if sampling_params.seed is not None:
AttributeError: 'NoneType' object has no attribute 'seed'
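For context, the crash is triggered by any call to the /v1/embeddings endpoint of the server launched with the command in the title. Below is a minimal reproduction sketch using the OpenAI Python client; the base URL, API key, and input text are assumptions (they are not part of the original report), and the model name simply mirrors the launch command.

```python
# Hypothetical minimal reproduction against the server started above
# (assumed host/port and API key; model name taken from the launch command).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# On vLLM builds affected by this issue, the model is loaded for generation,
# so this embeddings request kills the engine loop with
# AttributeError: 'NoneType' object has no attribute 'seed'.
response = client.embeddings.create(
    model="gte_Qwen2-7B-instruct",
    input="hello world",
)
print(len(response.data[0].embedding))
```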

youkaichao commented 1 month ago

Please provide more details: follow the issue template to report your environment, and show how you use vLLM.

Junyi-99 commented 1 month ago

Same error here.

I triggered this exception by adding an OpenAI-API-compatible embedding model in Dify.


I was using the generation model Llama-3-8b instead of an embedding model.

The problem was solved when I switched to an embedding model.

Junyi-99 commented 1 month ago

@2679326161or If you want to use an embedding model, try: https://huggingface.co/intfloat/e5-mistral-7b-instruct.

Models such as Llama-3-8b and Mistral-7B-Instruct-v0.3 are generation models rather than embedding models.
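For illustration, here is a sketch of serving the suggested embedding model and querying it through the OpenAI-compatible endpoint. The launch flags, host/port, API key, and inputs below are assumptions, not taken from this thread; check the vLLM docs for the options supported by your version.

```python
# Sketch only: first serve the embedding model (run in a shell), e.g.
#   python -m vllm.entrypoints.openai.api_server \
#       --model intfloat/e5-mistral-7b-instruct --dtype float16
# then request embeddings from the /v1/embeddings endpoint
# (assumed host/port and API key).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",
    input=["first sentence", "second sentence"],
)
# One vector per input; the vector length depends on the model's hidden size.
print(len(response.data), len(response.data[0].embedding))
```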

LJLQ commented 1 month ago

I ran into the same problem: the service errors out as soon as an embedding request comes in, and afterwards even chat requests no longer return normally.

QwertyJack commented 1 week ago

Related to #7502 and fixed by #7504.