oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

API 404 (Not Found)? #3448

Closed N0THSA closed 1 year ago

N0THSA commented 1 year ago

Describe the bug

Using the API Chat example and Text Generation examples (with correctly configured host/URI endpoints), I get absolutely no output and no generation. Worth noting that I am using Runpod for generation.


import json  # needed for json.dumps() below
import requests

# For local streaming, the websockets are hosted without ssl - http://
HOST = 'removed'
URI = f'http://{HOST}/api/v1/chat'

# For reverse-proxied streaming, the remote will likely host with ssl - https://
# URI = 'https://your-uri-here.trycloudflare.com/api/v1/chat'

def run(user_input, history):
    request = {
        'user_input': user_input,
        'max_new_tokens': 250,
        'auto_max_new_tokens': False,
        'history': history,
        'mode': 'chat',  # Valid options: 'chat', 'chat-instruct', 'instruct'
        'character': 'None',
        'instruction_template': 'Vicuna-v1.1',  # Will get autodetected if unset
        'your_name': 'You',
        # 'name1': 'name of user', # Optional
        # 'name2': 'name of character', # Optional
        # 'context': 'character context', # Optional
        # 'greeting': 'greeting', # Optional
        # 'name1_instruct': 'You', # Optional
        # 'name2_instruct': 'Assistant', # Optional
        # 'context_instruct': 'context_instruct', # Optional
        # 'turn_template': 'turn_template', # Optional
        'regenerate': False,
        '_continue': False,
        'stop_at_newline': False,
        'chat_generation_attempts': 1,
        'chat-instruct_command': 'Continue the chat dialogue below. Write a single reply for the character "<|character|>".\n\n<|prompt|>',

        # Generation params. If 'preset' is set to different than 'None', the values
        # in presets/preset-name.yaml are used instead of the individual numbers.
        'preset': 'None',
        'do_sample': True,
        'temperature': 0.7,
        'top_p': 0.1,
        'typical_p': 1,
        'epsilon_cutoff': 0,  # In units of 1e-4
        'eta_cutoff': 0,  # In units of 1e-4
        'tfs': 1,
        'top_a': 0,
        'repetition_penalty': 1.18,
        'repetition_penalty_range': 0,
        'top_k': 40,
        'min_length': 0,
        'no_repeat_ngram_size': 0,
        'num_beams': 1,
        'penalty_alpha': 0,
        'length_penalty': 1,
        'early_stopping': False,
        'mirostat_mode': 0,
        'mirostat_tau': 5,
        'mirostat_eta': 0.1,

        'seed': -1,
        'add_bos_token': True,
        'truncation_length': 2048,
        'ban_eos_token': False,
        'skip_special_tokens': True,
        'stopping_strings': []
    }

    response = requests.post(URI, json=request)

    if response.status_code == 200:
        result = response.json()['results'][0]['history']
        print(json.dumps(result, indent=4))
        print()
        print(result['visible'][-1][1])

if __name__ == '__main__':
    user_input = "Please give me a step-by-step guide on how to plant a tree in my backyard."

    # Basic example
    history = {'internal': [], 'visible': []}

    # "Continue" example. Make sure to set '_continue' to True above
    # arr = [user_input, 'Surely, here is']
    # history = {'internal': [arr], 'visible': [arr]}

    run(user_input, history)

HTTPS is not enabled on the server. Navigating to the endpoint returns a Not Found error: [screenshot: Not Found error page]

Any help is appreciated.

Is there an existing issue for this?

Reproduction

  1. Get the example Chat API Python 3 file
  2. Configure it to point to your endpoint
  3. Try to make a request
  4. Get no response

Screenshot

No response

Logs

2023-08-04T04:37:09.884895096-04:00 
2023-08-04T04:37:09.885169480-04:00 ==========
2023-08-04T04:37:09.885192488-04:00 == CUDA ==
2023-08-04T04:37:09.885371405-04:00 ==========
2023-08-04T04:37:09.890025166-04:00 
2023-08-04T04:37:09.892400796-04:00 
2023-08-04T04:37:09.892416713-04:00 Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2023-08-04T04:37:09.892421072-04:00 
2023-08-04T04:37:09.892423800-04:00 This container image and its contents are governed by the NVIDIA Deep Learning Container License.
2023-08-04T04:37:09.892426403-04:00 By pulling and using the container, you accept the terms and conditions of this license:
2023-08-04T04:37:09.892429052-04:00 https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
2023-08-04T04:37:09.892431782-04:00 
2023-08-04T04:37:09.892434433-04:00 A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
2023-08-04T04:37:09.903864299-04:00 
2023-08-04T04:37:09.905810941-04:00 TheBloke's Local LLMs: Pod started
2023-08-04T04:37:09.927989278-04:00  * Starting OpenBSD Secure Shell server sshd
2023-08-04T04:37:09.939610496-04:00    ...done.
2023-08-04T04:37:10.213090764-04:00 Already up to date.
2023-08-04T04:37:10.473857243-04:00 Already up to date.

(logs were removed and aren't recoverable)

System Info

Using Runpod. RTX 3080, 16 vCPUs, 125 GB RAM, 12 GB VRAM.

Vincent-Stragier commented 1 year ago

Looks like a port issue. Did you configure the request to be redirected to port 5000, where the API listens?

lanlanji commented 1 year ago

I am also getting a 404 even though I modified the HOST variable to match the webui's port. E.g., my webui runs at localhost:7860, so in the example code I set HOST = 'localhost:7860'. Printing response.status_code gives 404. If I instead use port 5000 (HOST = 'localhost:5000'), I get a connection refused error.
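
A minimal sketch to tell the two ports apart (default ports assumed; /api/v1/model is an endpoint of the legacy API extension):

import requests

# 7860 is the Gradio UI port; it serves no /api/v1 routes, so a 404 there is expected.
r = requests.get("http://localhost:7860/api/v1/model", timeout=10)
print("UI port:", r.status_code)  # 404

# 5000 is where the legacy API extension listens; "connection refused" here
# means the extension was never started alongside the UI.
try:
    r = requests.get("http://localhost:5000/api/v1/model", timeout=10)
    print("API port:", r.status_code, r.json())  # expect 200 and the loaded model name
except requests.exceptions.ConnectionError:
    print("API extension is not running on port 5000")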

Vincent-Stragier commented 1 year ago

How are you starting the webui?

You have to explicitly start the API extension (see https://github.com/oobabooga/text-generation-webui/issues/3219#issuecomment-1659134031).
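
To confirm the extension is actually up after relaunching, a quick check (the launch flag and the model endpoint are from the legacy API setup; treat the exact spelling as an assumption):

import requests

# The webui must be started with the API enabled, e.g.:
#   python server.py --api
# The blocking API then listens on port 5000 by default.
resp = requests.get("http://localhost:5000/api/v1/model", timeout=10)
print(resp.status_code, resp.json())  # expect 200 and the loaded model name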

jllllll commented 1 year ago

TheBloke has a Runpod template specifically for using the API: TheBloke Local LLMs One-Click UI and API

Vincent-Stragier commented 1 year ago

@jllllll here?

jllllll commented 1 year ago

@jllllll here?

Yeah

N0THSA commented 1 year ago

Looks like a port issue. Did you configure the request to be redirected to port 5000, where the API listens?

Not a port issue. Confirmed.

I started the WebUI using the --api flag, of course, made sure nothing was being blocked, and made sure I could connect to /api/v1. /api didn't work.

Vincent-Stragier commented 1 year ago

I don't know how Runpod works (I have a server with two RTX 4090s at work). Personally, I use the one-click installer and run it with the options below. --listen makes the server accept requests from external IPs (in my case it is not really needed, since I then use Ngrok to reverse proxy the API endpoint, but it lets me access the UI and API from the local network). You could do something similar to test the API.

python webui.py --extensions api --loader <the model loader> --model <the model you want to load> --verbose --listen &
# Add your AuthToken
ngrok config add-authtoken <your_auth_token>
ngrok http --domain=<my-ngrok-domain.ngrok-free.app> 5000

Note: it would be better to use screen and start each service in its own screen session.
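
For reference, a minimal sketch of the same tunnel using the pyngrok client mentioned below (token and port are placeholders):

from pyngrok import ngrok

# Authenticate once, then open an HTTP tunnel to the blocking API's port.
ngrok.set_auth_token("<your_auth_token>")
tunnel = ngrok.connect(5000, "http")
print("Public URL:", tunnel.public_url)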

I installed the Python Ngrok client using python3 -m pip install pyngrok (mainly because you don't need root access that way), configured an Ngrok account, generated the AuthToken, and added it (https://dashboard.ngrok.com/get-started/your-authtoken). For the domain, I do not remember how I generated it, but you will find yours at https://dashboard.ngrok.com/cloud-edge/domains. To avoid generating a lot of phishing portals, Ngrok requires you to add a header to your request:

    # This is the code I use to do my API request; it needs to be adapted
    # before being used in your test client. It is a method of a larger
    # client class, so it assumes these module-level definitions:
    #   import requests
    #   from requests.auth import HTTPBasicAuth
    #   REQUEST_TIMEOUT = 60  # seconds, for example
    def api_request(self, request: dict) -> requests.Response:
        """Send a request to OobaBooga.

        Args:
            request (dict): the request.

        Returns:
            requests.Response: the response.
        """

        request_params = {
            # url = "http://127.0.0.1:5000/api/v1/generate"
            # or url = "https://domain.com:443/api/v1/generate"
            "url": self.url,
            "json": request,
            # Ngrok asks for this header to skip its browser warning page.
            "headers": {"ngrok-skip-browser-warning": "true"},
            "timeout": REQUEST_TIMEOUT,
        }

        # When starting Ngrok you can add basic auth with this flag:
        # --basic-auth 'username:password'
        if self.basic_auth:
            request_params.update(
                auth=HTTPBasicAuth(self.username, self.password)
            )

        return requests.post(**request_params)

That way, you can test the webui API endpoint without configuring any port forwarding. If you try to open the Ngrok URL in a browser, you will get a 404 error:

[screenshot: 404 Not Found page in the browser]

And while you will not be able to see anything there, the server will receive the requests (here I started the webui on my laptop under Windows, but it is the same behaviour on Linux):

[screenshot: webui console logging the incoming API requests]
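
For reference, a standalone sketch of the same request without the class wrapper (domain, prompt, and timeout are illustrative):

import requests

# The ngrok-skip-browser-warning header bypasses Ngrok's interstitial page.
response = requests.post(
    "https://my-domain.ngrok-free.app/api/v1/generate",
    json={"prompt": "Hey, can you hear me?", "max_new_tokens": 50},
    headers={"ngrok-skip-browser-warning": "true"},
    timeout=60,
)
print(response.json()["results"][0]["text"])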

N0THSA commented 1 year ago

I currently do not have any Runpod tokens, but I will buy some as soon as possible to test this. Honestly, I think it might be because I forgot the "--listen" parameter, and I'm trying to connect from an external machine.

nutheory commented 1 year ago

I tried it with the --listen parameter, and many other variations. I think it might just be the Docker version, since that's what I'm using and I'm pretty sure that's what Runpod uses.

tjb4578 commented 1 year ago

Running the API on localhost.

I get a response for the generate endpoint:

http://localhost:5000/api/v1/generate

{
    "prompt": "Hey can you hear me?",
    "max_new_tokens": "64",
    "auto_max_new_tokens": "False",
    "history": {
        "internal": [],
        "visible": []
    },
    "mode": "instruct",
    "character": "Example",
    "instruction_template": "Vicuna-v1.1",
    "your_name": "You",
    "regenerate": "False",
    "_continue": "False",
    "stop_at_newline": "False",
    "chat_generation_attempts": 1,
    "chat-instruct_command": "Continue the chat dialogue below. Write a single reply for the character '<|character|>'.\n\n<|prompt|>",
    "preset": "None",
    "do_sample": "True",
    "temperature": 0.7,
    "top_p": 0.1,
    "typical_p": 1,
    "epsilon_cutoff": 0, 
    "eta_cutoff": 0,  
    "tfs": 1,
    "top_a": 0,
    "repetition_penalty": 1.18,
    "repetition_penalty_range": 0,
    "top_k": 40,
    "min_length": 0,
    "no_repeat_ngram_size": 0,
    "num_beams": 1,
    "penalty_alpha": 0,
    "length_penalty": 1,
    "early_stopping": "False",
    "mirostat_mode": 0,
    "mirostat_tau": 5,
    "mirostat_eta": 0.1,
    "seed": -1,
    "add_bos_token": "True",
    "truncation_length": 2048,
    "ban_eos_token": "False",
    "skip_special_tokens": "True",
    "stopping_strings": []
}

This yields:

{
    "results": [
        {
            "text": "\nI'm in a quiet room with no background noise. I want to record myself speaking, but without any background noise interfering with the audio quality. Is there anyway for me to do this on my own computer or would it be better off doing it at a professional recording studio? Also, how can i"
        }
    ]
}

When I try the chat endpoint (also changing prompt to user_input), my response comes back instantaneously and is empty.

{
    "results": [
        {
            "history": {
                "internal": [],
                "visible": []
            }
        }
    ]
}

Any ideas why the chat endpoint isn't generating anything?

nutheory commented 1 year ago

I have a manually installed ooba version on localhost (M2 MacBook Pro) that works perfectly fine. It's my Docker install on my Lambda Labs server that's broken... both were recently updated.

jllllll commented 1 year ago

@tjb4578 Don't put quotes around True or False.
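
In Python terms, the quoted values arrive as non-empty strings, and non-empty strings are truthy when the server checks them, so e.g. "regenerate": "False" plausibly behaves like regenerate=True, which would explain the instant, empty history above (the exact failure mode is an assumption). A corrected fragment of the chat request as a Python dict:

request = {
    "user_input": "Hey can you hear me?",  # /chat takes user_input, not prompt
    "max_new_tokens": 64,          # a number, not the string "64"
    "auto_max_new_tokens": False,  # real booleans, not "False"/"True" strings
    "regenerate": False,
    "_continue": False,
    "stop_at_newline": False,
    "do_sample": True,
    "early_stopping": False,
    "add_bos_token": True,
    "ban_eos_token": False,
    "skip_special_tokens": True,
}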

Vincent-Stragier commented 1 year ago

Hi @tjb4578,

Personally, I exclusively use generate, since I handle the prompt and the history myself. Be careful with the parameters you are using, though: for example, the character parameter will use the Example character, which will impact the generation (since it is added to the prompt in chat mode, I believe).
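
A minimal sketch of that approach, with the prompt template written by hand (the Vicuna-style wording here is illustrative):

import requests

# Build the full prompt yourself instead of relying on the chat endpoint's
# character/template machinery, then call the generate endpoint directly.
prompt = (
    "A chat between a curious user and an artificial intelligence assistant.\n"
    "USER: Please give me a step-by-step guide on how to plant a tree.\n"
    "ASSISTANT:"
)
response = requests.post(
    "http://localhost:5000/api/v1/generate",
    json={"prompt": prompt, "max_new_tokens": 250, "temperature": 0.7},
    timeout=60,
)
print(response.json()["results"][0]["text"])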

tjb4578 commented 1 year ago

@tjb4578 Don't put quotes around True or False.

Thanks, this was my issue!

N0THSA commented 1 year ago

I have a manually installed ooba version on localhost (M2 MacBook Pro) that works perfectly fine. It's my Docker install on my Lambda Labs server that's broken... both were recently updated.

I've seen multiple people with the same issue (or who have just tested with my setup), and all of the broken ones are on Docker in particular, no matter the actual container image... weird.

github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.