vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: TypeError: inputs must be a string, TextPrompt, or TokensPrompt #9050

Open johnathanchiu opened 1 month ago

johnathanchiu commented 1 month ago

Your current environment

I am using torchserve to spin up the vLLM instance (https://github.com/pytorch/serve?tab=readme-ov-file#-quick-start-llm-deployment-with-docker).

Model Input Dumps

No response

🐛 Describe the bug

Here's the stacktrace I see:

2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     result = task.result()
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 661, in engine_step
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     await self.engine.add_request_async(**new_request)
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 419, in add_request_async
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     preprocessed_inputs = await self.input_preprocessor.preprocess_async(
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/vllm/inputs/preprocess.py", line 528, in preprocess_async
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     return await self._process_decoder_only_prompt_async(
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/vllm/inputs/preprocess.py", line 468, in _process_decoder_only_prompt_async
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     prompt_comps = await self._extract_prompt_components_async(
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/vllm/inputs/preprocess.py", line 263, in _extract_prompt_components_async
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     parsed = parse_singleton_prompt(inputs)
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/vllm/inputs/parse.py", line 95, in parse_singleton_prompt
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     raise TypeError("inputs must be a string, TextPrompt, or TokensPrompt")
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - TypeError: inputs must be a string, TextPrompt, or TokensPrompt
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - The above exception was the direct cause of the following exception:
2024-10-03T17:37:44,150 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -
2024-10-03T17:37:44,151 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-10-03T17:37:44,151 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.9/asyncio/events.py", line 80, in _run
2024-10-03T17:37:44,151 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     self._context.run(self._callback, *self._args)
2024-10-03T17:37:44,151 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 60, in _log_task_completion
2024-10-03T17:37:44,151 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     raise AsyncEngineDeadError(
2024-10-03T17:37:44,151 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.

On the client side I am running this script:

from openai import OpenAI

model_name = "meta-llama/Meta-Llama-3.1-70B-Instruct"
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8080/predictions/model/1.0/v1/completions"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
    ],
)

Before submitting a new issue...

DarkLight1337 commented 1 month ago

I am using torchserve to spin up the vLLM instance (https://github.com/pytorch/serve?tab=readme-ov-file#-quick-start-llm-deployment-with-docker).

The recommended way to serve vLLM models is to use the vllm serve CLI. I don't think we directly support external ways of serving vLLM out of the box; it's up to the external library maintainers to fix compatibility issues. You should open an issue in the TorchServe repo instead.
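For reference, here is a minimal sketch of the recommended setup, reusing the model name from the script above. Port 8000 is vLLM's default for vllm serve, so adjust the URL if your deployment differs:

# Start the OpenAI-compatible server first, e.g.:
#   vllm serve meta-llama/Meta-Llama-3.1-70B-Instruct
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",  # vLLM's default port; adjust to your setup
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[
        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
    ],
)
print(response.choices[0].message.content)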