union-cmd opened 5 months ago
The 404 Not Found error you're encountering when connecting to the vLLM server from your LlamaIndex script most likely means that the server has no route defined for `/generate`, or that there is a mismatch in the URL you're using to connect. Here's how you can address this issue:
1. **Verify the server URL and route:** Ensure that the server is running and reachable at the URL you've provided in the `VllmServer` initialization, and that it exposes a `/generate` route accepting POST requests. If the server is running on a different port, or there's a typo in the URL, correct this in your script.
2. **Check the server configuration:** The server should be configured to handle requests at the `/generate` endpoint. Check the server's code or launch configuration to confirm it responds at this path.
3. **Review the script initialization:** The default URL used to initialize the `VllmServer` object in your script is `"http://localhost:8000"`. If your server is hosted elsewhere or on a different port, update this URL accordingly.
4. **Expected server response:** When correctly set up and called, the server should return a `CompletionResponse` object that includes the input prompt and a response text. If you're not receiving this, either the server isn't running as expected or the request isn't being routed correctly.

Ensure that the server's deployment and configuration match what your script expects, particularly the URL and the endpoint. If the server is running and configured to handle requests at the `/generate` endpoint but you're still facing issues, review the server logs for more detailed error messages or misconfigurations.
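The checklist above can be sketched as a small probe. This is a hedged example, not library code: the paths reflect the two common ways a vLLM server is launched (the legacy API server serves `/generate`, while the OpenAI-compatible server serves `/v1/completions`), the port is the assumed default, and `"my-model"` is a placeholder.

```python
# Sketch: build candidate requests for the two common vLLM server variants.
# Assumptions: default port 8000; "my-model" is a placeholder model name.
from typing import Dict, List, Tuple

def candidate_requests(base_url: str, prompt: str, model: str) -> List[Tuple[str, Dict]]:
    """Return (url, json_payload) pairs to try against the server."""
    return [
        # OpenAI-compatible server: requires a "model" field in the body.
        (f"{base_url}/v1/completions",
         {"model": model, "prompt": prompt, "max_tokens": 16}),
        # Legacy simple API server: plain prompt payload on /generate.
        (f"{base_url}/generate",
         {"prompt": prompt, "max_tokens": 16}),
    ]

for url, payload in candidate_requests("http://localhost:8000", "Hello", "my-model"):
    print(url, sorted(payload))
```

POSTing each payload (e.g. with `requests.post(url, json=payload)`) and comparing status codes tells you which variant is running: a 404 on `/generate` together with a 200 on `/v1/completions` points to the OpenAI-compatible server.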
The real API URL should be `"http://localhost:8000/v1/completions"`. But now I get a 400 Bad Request error.
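A 400 on `/v1/completions` usually means the route exists but the request body was rejected; the OpenAI-compatible endpoint requires a `model` field matching what the server was launched with. A minimal sketch of a valid payload (the model name below is a placeholder, not taken from this issue):

```python
import json

# Hypothetical payload for the OpenAI-compatible completions endpoint.
# "facebook/opt-125m" is a placeholder; it must match the server's --model.
payload = {
    "model": "facebook/opt-125m",
    "prompt": "San Francisco is a",
    "max_tokens": 16,
    "temperature": 0.0,
}

body = json.dumps(payload)
print(body)
```

Sending this with `requests.post("http://localhost:8000/v1/completions", json=payload)` should return 200 with a `choices` list, provided the model name matches the one the server actually loaded.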
On your client side, look at `get_response` in `llms/vllm/utils.py`:

```python
def get_response(response: requests.Response) -> List[str]:
    data = json.loads(response.content)
    return data["text"]
```

Add an extra print for debugging:

```python
def get_response(response: requests.Response) -> List[str]:
    data = json.loads(response.content)
    print("RESPONSE DATA IS: ", data)
    return data["text"]
```
The message returned from the server should help debug the problem.
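As a side note, a more defensive variant of that helper (a sketch, not the shipped code) surfaces the server's error body instead of failing with a bare `KeyError`:

```python
import json
from typing import List

import requests

def get_response(response: requests.Response) -> List[str]:
    data = json.loads(response.content)
    if "text" not in data:
        # Error responses (400/404) typically carry a message, not "text".
        raise ValueError(f"Unexpected server response: {data!r}")
    return data["text"]
```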
Please post your results.
I think this is related to https://github.com/run-llama/llama_index/issues/12955
Bug Description
I use this command to bring the vLLM server up
and use this LlamaIndex script to connect to it, but I get a 404 error
Version
new
Steps to Reproduce
I use this command to bring the vLLM server up
and use this LlamaIndex script to connect to it, but I get a 404 error
Relevant Logs/Tracebacks
No response