vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Embeddings error: python -m vllm.entrypoints.openai.api_server --trust-remote-code --model gte_Qwen2-7B-instruct --seed 48 --max-model-len 1000 --tensor-parallel-size 2 --gpu-memory-utilization 1 --dtype float16 #6015

Open · 2679326161or opened this issue 4 months ago

2679326161or commented 4 months ago

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

ERROR 07-01 08:12:10 async_llm_engine.py:52] Engine background task failed
ERROR 07-01 08:12:10 async_llm_engine.py:52] Traceback (most recent call last):
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
ERROR 07-01 08:12:10 async_llm_engine.py:52]     return_value = task.result()
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 532, in run_engine_loop
ERROR 07-01 08:12:10 async_llm_engine.py:52]     has_requests_in_progress = await asyncio.wait_for(
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
ERROR 07-01 08:12:10 async_llm_engine.py:52]     return fut.result()
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
ERROR 07-01 08:12:10 async_llm_engine.py:52]     request_outputs = await self.engine.step_async()
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
ERROR 07-01 08:12:10 async_llm_engine.py:52]     output = await self.model_executor.execute_model_async(
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 166, in execute_model_async
ERROR 07-01 08:12:10 async_llm_engine.py:52]     return await self._driver_execute_model_async(execute_model_req)
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 149, in _driver_execute_model_async
ERROR 07-01 08:12:10 async_llm_engine.py:52]     return await self.driver_exec_model(execute_model_req)
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
ERROR 07-01 08:12:10 async_llm_engine.py:52]     result = self.fn(*self.args, **self.kwargs)
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 07-01 08:12:10 async_llm_engine.py:52]     return func(*args, **kwargs)
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 280, in execute_model
ERROR 07-01 08:12:10 async_llm_engine.py:52]     output = self.model_runner.execute_model(seq_group_metadata_list,
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 07-01 08:12:10 async_llm_engine.py:52]     return func(*args, **kwargs)
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 735, in execute_model
ERROR 07-01 08:12:10 async_llm_engine.py:52]     ) = self.prepare_input_tensors(seq_group_metadata_list)
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 682, in prepare_input_tensors
ERROR 07-01 08:12:10 async_llm_engine.py:52]     sampling_metadata = SamplingMetadata.prepare(
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 116, in prepare
ERROR 07-01 08:12:10 async_llm_engine.py:52]     ) = _prepare_seq_groups(seq_group_metadata_list, seq_lens, query_lens,
ERROR 07-01 08:12:10 async_llm_engine.py:52]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 208, in _prepare_seq_groups
ERROR 07-01 08:12:10 async_llm_engine.py:52]     if sampling_params.seed is not None:
ERROR 07-01 08:12:10 async_llm_engine.py:52] AttributeError: 'NoneType' object has no attribute 'seed'

Exception in callback functools.partial(<function _log_task_completion at 0x7f40f9a39630>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f40b87a3160>>)
handle: <Handle functools.partial(<function _log_task_completion at 0x7f40f9a39630>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f40b87a3160>>)>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
    return_value = task.result()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 532, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
  File "/opt/conda/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
    request_outputs = await self.engine.step_async()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
    output = await self.model_executor.execute_model_async(
  File "/opt/conda/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 166, in execute_model_async
    return await self._driver_execute_model_async(execute_model_req)
  File "/opt/conda/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 149, in _driver_execute_model_async
    return await self.driver_exec_model(execute_model_req)
  File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 280, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 735, in execute_model
    ) = self.prepare_input_tensors(seq_group_metadata_list)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 682, in prepare_input_tensors
    sampling_metadata = SamplingMetadata.prepare(
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 116, in prepare
    ) = _prepare_seq_groups(seq_group_metadata_list, seq_lens, query_lens,
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 208, in _prepare_seq_groups
    if sampling_params.seed is not None:
AttributeError: 'NoneType' object has no attribute 'seed'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.

INFO 07-01 08:12:10 async_llm_engine.py:167] Aborted request cmpl-13a5e1f614ab4afe99ca9ccc99097603-0.
INFO:     192.168.30.254:63180 - "POST /v1/embeddings HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 132, in create_embedding
    generator = await openai_serving_embedding.create_embedding(
  File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_embedding.py", line 124, in create_embedding
    async for i, res in result_generator:
  File "/opt/conda/lib/python3.10/site-packages/vllm/utils.py", line 250, in consumer
    raise e
  File "/opt/conda/lib/python3.10/site-packages/vllm/utils.py", line 241, in consumer
    raise item
  File "/opt/conda/lib/python3.10/site-packages/vllm/utils.py", line 225, in producer
    async for item in iterator:
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 747, in encode
    async for output in self._process_request(
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 780, in _process_request
    raise e
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 776, in _process_request
    async for request_output in stream:
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 89, in __anext__
    raise result
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
    return_value = task.result()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 532, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
  File "/opt/conda/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
    request_outputs = await self.engine.step_async()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
    output = await self.model_executor.execute_model_async(
  File "/opt/conda/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 166, in execute_model_async
    return await self._driver_execute_model_async(execute_model_req)
  File "/opt/conda/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 149, in _driver_execute_model_async
    return await self.driver_exec_model(execute_model_req)
  File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 280, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 735, in execute_model
    ) = self.prepare_input_tensors(seq_group_metadata_list)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 682, in prepare_input_tensors
    sampling_metadata = SamplingMetadata.prepare(
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 116, in prepare
    ) = _prepare_seq_groups(seq_group_metadata_list, seq_lens, query_lens,
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 208, in _prepare_seq_groups
    if sampling_params.seed is not None:
AttributeError: 'NoneType' object has no attribute 'seed'
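
For context, the 500 above is returned for any call to the /v1/embeddings route while the server is treating the loaded checkpoint as a generation model. A minimal client-side reproduction sketch follows; the base URL, API key, and client library are assumptions (the vLLM OpenAI-compatible server defaults to http://localhost:8000, and "EMPTY" is the usual placeholder key), and the model name is taken from the launch command in the title:

```python
# Reproduction sketch (assumed host/port/key, not part of the original report).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# When the loaded model is served through the generation path, this request
# comes back as HTTP 500 and the AttributeError above appears in the server log.
resp = client.embeddings.create(
    model="gte_Qwen2-7B-instruct",  # name passed to --model when starting the server
    input=["hello world"],
)
print(resp.data[0].embedding[:8])
```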

youkaichao commented 4 months ago

Please provide more details: follow the issue template to report your environment, and show how you use vLLM.

Junyi-99 commented 4 months ago

same error here.

I triggered this exception by adding an OpenAI-API-compatible embedding model in Dify.


I was using the generation model Llama-3-8b instead of an embedding model.

The problem was solved when I switched to an embedding model.

Junyi-99 commented 4 months ago

@2679326161or If you want to use an embedding model, try: https://huggingface.co/intfloat/e5-mistral-7b-instruct.

Models such as Llama-3-8b and Mistral-7B-Instruct-v0.3 are generation models rather than embedding models.
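
For completeness, here is a sketch of the suggested setup. Everything below is an assumption rather than something posted in the thread: default host/port, an openai>=1.0 Python client, and the intfloat/e5-mistral-7b-instruct checkpoint served as an embedding model.

```python
# Serve an embedding model first, e.g.:
#   python -m vllm.entrypoints.openai.api_server \
#       --model intfloat/e5-mistral-7b-instruct --dtype float16
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",
    input=["query: what is the capital of France?"],
)
# Embedding dimension (4096 for this Mistral-7B-based model).
print(len(resp.data[0].embedding))
```

With an embedding model loaded, the request should be handled by the embedding path rather than the sampler, so the sampling_params.seed access that crashes above is not expected to be reached.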

LJLQ commented 4 months ago

I ran into the same problem: the server throws this error as soon as an embedding request comes in, and afterwards even chat requests no longer return normally.

QwertyJack commented 3 months ago

Related to #7502 and fixed by #7504.

github-actions[bot] commented 23 hours ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!