Difficulties with chat-ui prompt to text-generation-webui openai api endpoint #1277

Closed. Monviech closed this issue 1 week ago

Monviech commented 2 weeks ago


I'm trying my best to get the huggingface chat-ui working with the API endpoint of text-generation-webui.

I would be really happy if I could get a hint about what I am doing wrong.

Here is a reverse proxied test instance:

I can't get the prompt I enter in chat-ui to reach text-generation-webui. Every prompt is ignored and a random answer is returned.

Here is the command I use to start text-generation-webui:

```./ --listen --listen-port 8000 --api --api-port 8001 --verbose --model NTQAI_Nxcode-CQ-7B-orpo```

Here is my current .env.local for chat-ui and the command I run it with:

```
npm run dev -- --host
```

```
MODELS=`[
  {
    "name": "text-generation-webui",
    "id": "text-generation-webui",
    "parameters": {
      "temperature": 0.9,
      "top_p": 0.95,
      "max_new_tokens": 1024,
      "stop": []
    },
    "endpoints": [{
      "type" : "openai",
      "baseURL": "",
      "extraBody": {
        "repetition_penalty": 1.2,
        "top_k": 50,
        "truncate": 1000
      }
    }]
  }
]`
MONGODB_URL=`mongodb://localhost:27017`
DEBUG=`true`
```

Here are the logs of what happens when I write a prompt:


```
15:58:23-000843 INFO Starting Text generation web UI
15:58:23-003506 WARNING You are potentially exposing the web UI to the entire internet without any access password.
15:58:23-008582 INFO Loading "NTQAI_Nxcode-CQ-7B-orpo"
15:58:30-445446 INFO Loaded "NTQAI_Nxcode-CQ-7B-orpo" in 7.44 seconds.
15:58:30-446274 INFO LOADER: "Transformers"
15:58:30-446723 INFO TRUNCATION LENGTH: 65536
15:58:30-447183 INFO INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"
15:58:30-447709 INFO Loading the extension "openai"
15:58:30-510985 INFO OpenAI-compatible API URL:
15:59:21-782210 INFO GENERATE_PARAMS= { 'max_new_tokens': 15, 'temperature': 0.9, 'top_p': 0.95, 'top_k': 50, 'repetition_penalty': 1.2 }
15:59:21-783364 INFO PROMPT= <|im_start|>system You are a summarization AI. Summarize the user's request into a single short sentence of four words or less. Do not try to answer it, only summarize the user's query. Always start your answer with an emoji relevant to the summary<|im_end|>
<|im_start|>assistant 🇬🇦 President of Gabon<|im_end|>
<|im_start|>assistant 🧑 Julien Chaumond<|im_end|>
<|im_start|>assistant 🔢 Simple math operation<|im_end|>
<|im_start|>assistant 📰 Latest news<|im_end|>
<|im_start|>assistant 🍰 Cheesecake recipe<|im_end|>
<|im_start|>assistant 🎥 Favorite movie<|im_end|>
<|im_start|>assistant 🤖 AI definition<|im_end|>
<|im_start|>assistant 🐱 Cute cat drawing<|im_end|>
<|im_start|>assistant
15:59:22-068257 INFO WARPERS= ['TemperatureLogitsWarperCustom', 'TopKLogitsWarper', 'TopPLogitsWarper']
Output generated in 1.40 seconds (4.27 tokens/s, 6 tokens, context 148, seed 698673818)
15:59:23-695966 INFO GENERATE_PARAMS= { 'max_new_tokens': 1024, 'temperature': 0.9, 'top_p': 0.95, 'top_k': 50, 'repetition_penalty': 1.2 }
15:59:23-696892 INFO PROMPT= <|im_start|>assistant
15:59:23-932086 INFO WARPERS= ['TemperatureLogitsWarperCustom', 'TopKLogitsWarper', 'TopPLogitsWarper']
Output generated in 0.66 seconds (10.56 tokens/s, 7 tokens, context 4, seed 1106388940)
```
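The `context` figure in the `Output generated` lines gives a quick check on how much text actually reached the model. The second request above reports a context of only 4 tokens, which is consistent with a prompt that holds nothing but `<|im_start|>assistant`. Below is a minimal sketch for pulling those figures out of the log, assuming the line format shown above; the name `parse_generation_stats` is hypothetical and not part of text-generation-webui.

```python
import re

# Matches lines such as:
# "Output generated in 0.66 seconds (10.56 tokens/s, 7 tokens, context 4, seed 1106388940)"
# Assumption: the format mirrors the log above; other versions may print it differently.
STATS_RE = re.compile(
    r"Output generated in (?P<seconds>[\d.]+) seconds "
    r"\((?P<rate>[\d.]+) tokens/s, (?P<tokens>\d+) tokens, "
    r"context (?P<context>\d+), seed (?P<seed>\d+)\)"
)


def parse_generation_stats(line):
    """Return generation stats from one log line, or None if it does not match."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    return {
        "seconds": float(match["seconds"]),
        "tokens": int(match["tokens"]),
        "context": int(match["context"]),
    }


if __name__ == "__main__":
    line = (
        "Output generated in 0.66 seconds "
        "(10.56 tokens/s, 7 tokens, context 4, seed 1106388940)"
    )
    stats = parse_generation_stats(line)
    # A very small context suggests the prompt carried no user text at all.
    print(stats["context"])
```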

I entered "test" as the prompt in chat-ui, and the first answer is always "You are a helpful assistant." Each time I enter another prompt, the answer becomes random, as if the AI asks the question and answers it itself. I have logged one such random conversation:

```
16:14:26-896032 INFO PROMPT= <|im_start|>system You are a helpful assistant.<|im_end|>
<|im_start|>assistant You are a helpful assistant.<|im_end|>
<|im_start|>assistant Create a script for my new startup idea.<|im_end|>
<|im_start|>assistant Sure, I'd be happy to help you with that! Could you please provide me more information about your startup idea?<|im_end|>
<|im_start|>assistant
```

The only thing that works is setting a system prompt: it is then used, and an answer to that system prompt is generated. Any user prompt, however, is ignored and a random answer is given.

Here is a log example when the system prompt is set:

```
16:04:27-498767 INFO PROMPT= <|im_start|>system Write some python code.<|im_end|>
<|im_start|>assistant
16:04:27-745105 INFO WARPERS= ['TemperatureLogitsWarperCustom', 'TopKLogitsWarper', 'TopPLogitsWarper']
Output generated in 15.07 seconds (36.50 tokens/s, 550 tokens, context 14, seed 1040727761)
```

The first answer (regardless of the user input) will then be some written python code.

What am I missing? What would make the API endpoint accept my user prompt?
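One way to narrow this down is to call the OpenAI-compatible endpoint directly, bypassing chat-ui, and watch whether the user message shows up in the `PROMPT=` log. The sketch below uses only the Python standard library. The base URL `http://127.0.0.1:8001/v1` is an assumption based on the `--api-port 8001` flag above, the `/chat/completions` route follows the OpenAI convention, and `build_chat_request` and `send` are hypothetical helpers. No `model` field is sent, on the assumption that the server uses whichever model is loaded.

```python
import json
import urllib.request

# Assumed base URL: port 8001 comes from --api-port; host and /v1 prefix are assumptions.
BASE_URL = "http://127.0.0.1:8001/v1"


def build_chat_request(user_prompt, base_url=BASE_URL):
    """Build a POST request for the OpenAI-style chat completions route."""
    payload = {
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": 64,
        "temperature": 0.9,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_chat_request("test"))` while the server runs with `--verbose` should show whether a user turn appears in the logged prompt. If it does, the problem is more likely in how chat-ui builds its request than in the backend.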


Ubuntu 22.04.4 LTS
nodejs v22.3.0
npm 10.8.1
chat-ui@0.9.1 dev
text-generation-webui@abe5ddc8833206381c43b002e95788d4cca0893a

hsayniaj79 commented 2 weeks ago

I host a llama3 model on-prem with tgi and use the following in my .env.local to use chat-ui with it.


Maybe setting the type to tgi instead of openai helps?

Monviech commented 2 weeks ago

Hello @hsayniaj79, thank you for your answer.

I thought I had to use the OpenAI API between chat-ui and tgi.

I can only find documentation for that API:

Is there a different API in the older version of tgi that you use, one that no longer exists?

Trying the tgi configuration just results in this 405 error:

```
Using a model URL is deprecated, please use the `endpointUrl` parameter instead
Using a model URL is deprecated, please use the `endpointUrl` parameter instead
Error: Server response contains error: 405
    at streamingRequest (file:///opt/chat-ui/node_modules/@huggingface/inference/dist/index.js:334:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Proxy.textGenerationStream (file:///opt/chat-ui/node_modules/@huggingface/inference/dist/index.js:705:3)
    at async Module.generate (/opt/chat-ui/src/lib/server/textGeneration/generate.ts:8:20)
    at async textGenerationWithoutTitle (/opt/chat-ui/src/lib/server/textGeneration/index.ts:56:3)
[11:10:10.326] ERROR (2416): Server response contains error: 405
    err: {
      "type": "Error",
      "message": "Server response contains error: 405",
          Error: Server response contains error: 405
              at streamingRequest (file:///opt/chat-ui/node_modules/@huggingface/inference/dist/index.js:334:11)
              at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
              at async Proxy.textGenerationStream (file:///opt/chat-ui/node_modules/@huggingface/inference/dist/index.js:705:3)
              at async Module.generateFromDefaultEndpoint (/opt/chat-ui/src/lib/server/generateFromDefaultEndpoint.ts:12:20)
              at async generateTitle (/opt/chat-ui/src/lib/server/textGeneration/title.ts:54:10)
              at async Module.generateTitleForConversation (/opt/chat-ui/src/lib/server/textGeneration/title.ts:17:19)
```

hsayniaj79 commented 2 weeks ago

Hi @Monviech,

My bad, I confused Hugging Face's text-generation-inference (tgi) with oobabooga's text-generation-webui. For the latter, I think the openai type is correct. Could the issue be the missing "chatPromptTemplate" in the models in .env.local?

Monviech commented 2 weeks ago

@hsayniaj79 Oh, I hadn't noticed that either. I guess I should have written oobabooga to avoid confusion. I have tried a few different chatPromptTemplate values, but none of them seems to change anything.

Also, I am not dead set on using oobabooga; it was just my first choice because I have used the stable diffusion webui extensively and it looked just like it. I wanted a ChatGPT-style chat though, so I came to the huggingface webui.

If the experience is better when using a combination of:

I will stop the troubleshooting here and try to use tgi as backend instead.

hsayniaj79 commented 2 weeks ago

@Monviech I can only share my personal experience. We've been using tgi+chatui on our kubernetes cluster in a research institute for a while. So far, it's been pretty straightforward and painless.

Monviech commented 1 week ago

@hsayniaj79 Thank you. I have deployed tgi+chatui instead and things instantly worked. I'm happy you helped me. :)