run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: "POST /generate HTTP/1.1" 404 Not Found #13041

Open union-cmd opened 5 months ago

union-cmd commented 5 months ago

Bug Description

I use this command to bring the vLLM server up:

python -m vllm.entrypoints.openai.api_server --model /hy-tmp/hub/models--Phind--Phind-CodeLlama-34B-v2/snapshots/949f61e203f91b412efe8f679c798f09f0ff4b0c --tokenizer /hy-tmp/hub/models--Phind--Phind-CodeLlama-34B-v2/snapshots/949f61e203f91b412efe8f679c798f09f0ff4b0c --gpu-memory-utilization 0.9

and use this LlamaIndex script to connect to it, but I got a 404 error:

[screenshots: the LlamaIndex script and the 404 response]

Version

new

Steps to Reproduce

I use this command to bring the vLLM server up:

python -m vllm.entrypoints.openai.api_server --model /hy-tmp/hub/models--Phind--Phind-CodeLlama-34B-v2/snapshots/949f61e203f91b412efe8f679c798f09f0ff4b0c --tokenizer /hy-tmp/hub/models--Phind--Phind-CodeLlama-34B-v2/snapshots/949f61e203f91b412efe8f679c798f09f0ff4b0c --gpu-memory-utilization 0.9

and use this LlamaIndex script to connect to it, but I got a 404 error:

[screenshots: the LlamaIndex script and the 404 response]

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 5 months ago

The 404 Not Found error you're encountering when connecting to the vLLM server from the LlamaIndex script is most likely because the server does not expose a /generate route, or because there is a mismatch in the URL you're using to connect. Here's how you can address this issue:

  1. Verify the Server URL and Route: Ensure that the server is running and accessible at the URL you've provided when initializing VllmServer. The server must expose a /generate route that accepts POST requests. If the server is running on a different port, or if there is a typo in the URL, correct it in your script.

  2. Check Server Configuration: The server should be configured to handle requests at the /generate endpoint. This involves checking the server's code or configuration to ensure that it's set up to respond to requests at this path.

  3. Review Script Initialization: The default URL used to initialize the VllmServer object in your script is "http://localhost:8000". If your server is hosted elsewhere or on a different port, you'll need to update this URL accordingly.

  4. Expected Server Response: The server, when correctly set up and called, should return a CompletionResponse object that includes the input prompt and a response text. If you're not receiving this, it's an indication that either the server isn't running as expected or the request isn't being routed correctly.

Ensure that the server's deployment and configuration match what your script expects, particularly regarding the URL and the endpoint. If the server is correctly running and configured to handle requests at the /generate endpoint, but you're still facing issues, it might be worth reviewing the server logs for more detailed error messages or misconfigurations.
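
For illustration only, here is a minimal sketch of the two usual ways to wire this up. The OpenAI-compatible entrypoint started in the bug report (vllm.entrypoints.openai.api_server) serves /v1/completions and /v1/chat/completions rather than /generate, while VllmServer posts to /generate, which is what vLLM's plain demo entrypoint (vllm.entrypoints.api_server) serves. The class names and constructor arguments below come from the llama-index-llms-vllm and llama-index-llms-openai-like packages and from the command in the bug report; verify them against your installed versions.

```python
# Option A: start vLLM's demo server, which actually exposes POST /generate,
# and keep using VllmServer on the client side:
#   python -m vllm.entrypoints.api_server --model <snapshot path> --gpu-memory-utilization 0.9
from llama_index.llms.vllm import VllmServer

llm = VllmServer(
    api_url="http://localhost:8000/generate",  # route served by vllm.entrypoints.api_server
    max_new_tokens=256,
)
print(llm.complete("def fibonacci(n):").text)

# Option B: keep the OpenAI-compatible server from the bug report and use
# OpenAILike, which targets the /v1 endpoints instead of /generate.
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    # must match the name vLLM registered, i.e. the --model path,
    # unless --served-model-name was passed
    model="/hy-tmp/hub/models--Phind--Phind-CodeLlama-34B-v2/snapshots/949f61e203f91b412efe8f679c798f09f0ff4b0c",
    api_base="http://localhost:8000/v1",
    api_key="dummy",  # vLLM ignores the key unless it was started with --api-key
    is_chat_model=False,
)
print(llm.complete("def fibonacci(n):").text)
```
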

#### Details

**Sources**

- [llama-index-integrations/llms/llama-index-llms-vllm/llama_index/llms/vllm/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-vllm/llama_index/llms/vllm/base.py)
- [llama-index-integrations/llms/llama-index-llms-vllm/tests/test_integration.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-vllm/tests/test_integration.py)
- [llama-index-integrations/llms/llama-index-llms-vllm/pyproject.toml](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-vllm/pyproject.toml)
- [llama-index-legacy/llama_index/legacy/llms/vllm.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/llms/vllm.py)
**About Dosu**

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

buaaflyaway commented 3 months ago

The actual API URL should be "http://localhost:8000/v1/completions", but I got a 400 Bad Request error.
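
That 400 is likely a payload mismatch rather than a wrong URL: the vllm client in LlamaIndex appears to post a bare prompt plus sampling parameters in the /generate format, while /v1/completions expects the OpenAI schema (at minimum a "model" and a "prompt" field). As a rough check, a well-formed request sent directly shows whether the endpoint itself is healthy; the port and model path below are taken from the original command and may need adjusting.

```python
# Sanity-check the OpenAI-compatible endpoint with a well-formed request;
# a 200 here suggests the server is fine and the 400 comes from the shape
# of the payload the LlamaIndex vllm client sends.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "/hy-tmp/hub/models--Phind--Phind-CodeLlama-34B-v2/snapshots/949f61e203f91b412efe8f679c798f09f0ff4b0c",
        "prompt": "def fibonacci(n):",
        "max_tokens": 64,
    },
    timeout=120,
)
print(resp.status_code)
print(resp.json())  # on a 400 the body usually names the offending field
```
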

doscherda commented 1 month ago

On your client side, look in llms/vllm/utils.py:

```python
def get_response(response: requests.Response) -> List[str]:
    data = json.loads(response.content)
    return data["text"]
```

Add an extra print for debugging:

```python
def get_response(response: requests.Response) -> List[str]:
    data = json.loads(response.content)
    print("RESPONSE DATA IS: ", data)
    return data["text"]
```

The message returned from the server should help debug the problem.
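
If editing the installed utils.py is awkward, probing the server directly gives the same information, namely which routes it actually serves. The paths below are assumptions based on vLLM's two entrypoints (the OpenAI-compatible server exposes /v1/models, the demo server exposes /generate), so verify them against your vLLM version; whichever request succeeds tells you which client class to use.

```python
# Probe the running server to see which entrypoint it is.
import requests

BASE = "http://localhost:8000"

# GET /v1/models should succeed on vllm.entrypoints.openai.api_server
try:
    r = requests.get(f"{BASE}/v1/models", timeout=30)
    print("/v1/models ->", r.status_code, r.text[:200])
except requests.RequestException as exc:
    print("/v1/models ->", exc)

# POST /generate should succeed on vllm.entrypoints.api_server
try:
    r = requests.post(f"{BASE}/generate", json={"prompt": "hello", "max_tokens": 8}, timeout=30)
    print("/generate ->", r.status_code, r.text[:200])
except requests.RequestException as exc:
    print("/generate ->", exc)
```
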

Please post your results.

doscherda commented 1 month ago

I think this is related to https://github.com/run-llama/llama_index/issues/12955