Closed: N0THSA closed this issue 1 year ago
Looks like a port issue. Did you configure something to redirect the request to port 5000 for the API?
I am also getting 404 even though I modified the "host" variable to ensure it matches the webui's port. E.g., my webui host is localhost:7860, and in the example code I have HOST = 'localhost:7860'. I print response.status_code and get 404. If I use port 5000 instead (HOST = 'localhost:5000'), I get a connection-refused error.
How are you starting the webui?
You have to explicitly start the API extension (see https://github.com/oobabooga/text-generation-webui/issues/3219#issuecomment-1659134031).
TheBloke has a runpod template specifically for using the API: TheBloke Local LLMs One-Click UI and API
@jllllll here?
> Looks like a port issue. Did you specify somehow to redirect the request to the port 5000 on the API?
Not a port issue. Confirmed.
I started the webui using the --api flag, of course, made sure nothing was being blocked, and made sure I can connect to /api/v1. /api didn't work.
I don't know how Runpod works (I have a server with two RTX 4090s at work). Personally, I use the one-click installer and run it with the options below. --listen makes the server accept requests from external IPs (in my case it is not really needed, since I then use Ngrok to reverse-proxy the API endpoint, but it lets me access the UI and APIs from the local network). You could do something similar to test the API.
webui.py --extensions api --loader <the model loader> --model <the model you want to load> --verbose --listen &
# Add your AuthToken
ngrok config add-authtoken <your_auth_token>
ngrok http --domain=<my-ngrok-domain.ngrok-free.app> 5000
Note: it would be better to use screen and start each service in a screen session.
I installed the Python Ngrok client using python3 -m pip install pyngrok (mainly because you don't need root access going that route) and configured an Ngrok account, generated the AuthToken, and added it (https://dashboard.ngrok.com/get-started/your-authtoken). For the domain, I do not remember how I generated it, but you will find yours at https://dashboard.ngrok.com/cloud-edge/domains. To deter phishing portals, Ngrok requires you to add a header to your requests:
# This is the code I use to do my API request; it needs to be adapted
# before being used in your test client.
import requests
from requests.auth import HTTPBasicAuth

REQUEST_TIMEOUT = 60  # seconds; pick a value that suits your setup

def api_request(self, request: dict) -> requests.Response:
    """Send a request to OobaBooga.

    Args:
        request (dict): the request.

    Returns:
        requests.Response: the response.
    """
    request_params = {
        # url = "http://127.0.0.1:5000/api/v1/generate"
        # or url = "https://domain.com:443/api/v1/generate"
        "url": self.url,
        "json": request,
        # Ngrok serves an interstitial page unless this header is set
        "headers": {"ngrok-skip-browser-warning": "true"},
        "timeout": REQUEST_TIMEOUT,
    }
    # When starting Ngrok you can add basic auth with this flag:
    #   --basic-auth 'username:password'
    if self.basic_auth:
        request_params.update(
            auth=HTTPBasicAuth(self.username, self.password)
        )
    return requests.post(**request_params)
That way, you can test the webui API endpoint without configuring any port forwarding. If you try to open the Ngrok URL in a browser, you will get a 404 error; you won't see anything there, but the server will still receive the requests (here I started the webui on my laptop under Windows, but the behaviour is the same on Linux).
I currently do not have any Runpod tokens, but I will buy some as soon as possible to test this. Honestly, I think it might be because I forgot the "--listen" parameter, and I'm trying to connect from an external machine.
I tried it with the --listen parameter and many other variations. I think it might just be the Docker version, since that's what I'm using, and I'm pretty sure that's what Runpod uses.
Running the API on localhost, I get a response for the `generate` endpoint:
http://localhost:5000/api/v1/generate
{
  "prompt": "Hey can you hear me?",
  "max_new_tokens": "64",
  "auto_max_new_tokens": "False",
  "history": {
    "internal": [],
    "visible": []
  },
  "mode": "instruct",
  "character": "Example",
  "instruction_template": "Vicuna-v1.1",
  "your_name": "You",
  "regenerate": "False",
  "_continue": "False",
  "stop_at_newline": "False",
  "chat_generation_attempts": 1,
  "chat-instruct_command": "Continue the chat dialogue below. Write a single reply for the character '<|character|>'.\n\n<|prompt|>",
  "preset": "None",
  "do_sample": "True",
  "temperature": 0.7,
  "top_p": 0.1,
  "typical_p": 1,
  "epsilon_cutoff": 0,
  "eta_cutoff": 0,
  "tfs": 1,
  "top_a": 0,
  "repetition_penalty": 1.18,
  "repetition_penalty_range": 0,
  "top_k": 40,
  "min_length": 0,
  "no_repeat_ngram_size": 0,
  "num_beams": 1,
  "penalty_alpha": 0,
  "length_penalty": 1,
  "early_stopping": "False",
  "mirostat_mode": 0,
  "mirostat_tau": 5,
  "mirostat_eta": 0.1,
  "seed": -1,
  "add_bos_token": "True",
  "truncation_length": 2048,
  "ban_eos_token": "False",
  "skip_special_tokens": "True",
  "stopping_strings": []
}
This yields:
{
  "results": [
    {
      "text": "\nI'm in a quiet room with no background noise. I want to record myself speaking, but without any background noise interfering with the audio quality. Is there anyway for me to do this on my own computer or would it be better off doing it at a professional recording studio? Also, how can i"
    }
  ]
}
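For reference, a call like the one above can be sketched as a small Python helper. This is only a sketch, not code from the thread: the URL, timeout, and function names are my assumptions, and it deliberately uses unquoted numbers and booleans rather than the quoted ones in the payload above.

```python
import requests

# Assumed local default endpoint; adjust host/port to your setup.
URL = "http://localhost:5000/api/v1/generate"

def extract_text(body: dict) -> str:
    """Pull the generated text out of a /api/v1/generate response body."""
    return body["results"][0]["text"]

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Send a blocking generate request and return the generated text."""
    payload = {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,  # a real integer, not the string "64"
        "do_sample": True,                 # a real boolean, not the string "True"
        "temperature": 0.7,
    }
    resp = requests.post(URL, json=payload, timeout=60)
    resp.raise_for_status()
    return extract_text(resp.json())
```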
When I try the `chat` endpoint (also changing `prompt` to `user_input`), my response comes back instantaneously and is empty.
{
  "results": [
    {
      "history": {
        "internal": [],
        "visible": []
      }
    }
  ]
}
Any ideas why the `chat` endpoint isn't generating anything?
I have a manually installed ooba version on localhost (M2 MacBook Pro) that works perfectly fine. It's my Docker install on my Lambda Labs server that's broken... both were recently updated.
@tjb4578 Don't put quotes around True or False.
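To see why the quotes matter, here is a minimal sketch (field names taken from the payload earlier in the thread): a quoted "False" stays a string through JSON serialization, and non-empty strings are truthy in Python, which is likely what broke the chat endpoint.

```python
import json

# "False" in quotes is a string, so it serializes as a JSON string; on the
# server side, any non-empty Python string is truthy, so "False" behaves
# like True. Real booleans serialize to JSON true/false as intended.
wrong = {"user_input": "Hi", "regenerate": "False", "_continue": "False"}
right = {"user_input": "Hi", "regenerate": False, "_continue": False}

print(json.dumps(wrong))  # ... "regenerate": "False" ...
print(json.dumps(right))  # ... "regenerate": false ...

assert bool("False") is True   # the trap: non-empty strings are truthy
assert bool(False) is False
```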
Hi @tjb4578,
Personally, I use `generate` exclusively, since I handle the prompt and the history myself. Still, be careful with the parameters you are using: for example, the `character` parameter will use the `Example` character, which will impact the generation (since it is added to the prompt in chat mode, I believe).
> @tjb4578 Don't put quotes around True or False.
Thanks, this was my issue!
> I have a manually installed ooba version on localhost (M2 Macbookpro) that works perfectly fine, Its my docker install on my lambdalabs server thats broken... both were recently updated.
I've seen multiple people with the same issue (or just testing with my setup) and all of the broken ones are on Docker in particular, no matter the actual container image... weird.
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Describe the bug
Using the API Chat example and Text Generation examples (with correctly configured host/uri endpoints), there is absolutely no output or generation. Worth noting that I am using Runpod for generation.
HTTPS is not enabled on the server. Navigating to the endpoint returns a Not Found error.
Any help is appreciated.
Is there an existing issue for this?
Reproduction
Screenshot
No response
Logs
System Info