huggingface / chat-ui

Open source codebase powering the HuggingChat app
https://huggingface.co/chat
Apache License 2.0

Difficulties with chat-ui prompt to text-generation-webui openai api endpoint #1277

Closed · Monviech closed this issue 1 week ago

Monviech commented 2 weeks ago

Hello,

I'm trying to get the Hugging Face chat-ui working with the OpenAI-compatible API endpoint of text-generation-webui.

I would be really happy to get a hint about what I am doing wrong.

Here is a reverse proxied test instance: https://chat-ui-test.pischem.com/

I can't get the prompt I enter in chat-ui to reach text-generation-webui. Every prompt is ignored and a random answer is returned.

Here is the command I start text-generation-webui with:

```./start_linux.sh --listen --listen-port 8000 --api --api-port 8001 --verbose --model NTQAI_Nxcode-CQ-7B-orpo```

Here is the command I run chat-ui with, followed by my current .env.local:

```
npm run dev -- --host
```

```
MODELS=`[
  {
    "name": "text-generation-webui",
    "id": "text-generation-webui",
    "parameters": {
      "temperature": 0.9,
      "top_p": 0.95,
      "max_new_tokens": 1024,
      "stop": []
    },
    "endpoints": [{
      "type": "openai",
      "baseURL": "http://172.16.0.169:8001/v1",
      "extraBody": {
        "repetition_penalty": 1.2,
        "top_k": 50,
        "truncate": 1000
      }
    }]
  }
]`
MONGODB_URL=`mongodb://localhost:27017`
DEBUG=`true`
```

Here are the logs of what happens when I submit a prompt:

chat-ui:

```
> chat-ui@0.9.1 dev
> vite dev --host

  VITE v4.5.3  ready in 777 ms

  ➜  Local:   http://localhost:5173/
  ➜  Network: http://172.16.0.135:5173/
  ➜  Network: http://172.17.0.1:5173/
  ➜  press h to show help
(node:6250) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
[13:58:52.476] INFO (6250): [MIGRATIONS] Begin check...
[13:58:52.478] INFO (6250): [MIGRATIONS] "Update search assistants" already applied. Skipping...
[13:58:52.478] INFO (6250): [MIGRATIONS] "Update deprecated models in assistants with the default model" should not be applied for this run. Skipping...
[13:58:52.478] INFO (6250): [MIGRATIONS] "Add empty 'tools' record in settings" already applied. Skipping...
[13:58:52.478] INFO (6250): [MIGRATIONS] "Convert message updates to the new schema" already applied. Skipping...
[13:58:52.478] INFO (6250): [MIGRATIONS] "Convert message files to the new schema" already applied. Skipping...
[13:58:52.478] INFO (6250): [MIGRATIONS] "Trim message updates to reduce stored size" already applied. Skipping...
[13:58:52.478] INFO (6250): [MIGRATIONS] All migrations applied. Releasing lock
[13:58:52.498] INFO (6250): Metrics server listening on port 5565
Browserslist: caniuse-lite is outdated. Please run:
  npx update-browserslist-db@latest
  Why you should do it regularly: https://github.com/browserslist/update-db#readme
(node:6250) Warning: To load an ES module, set "type": "module" in the package.json or use the .mjs extension.
(node:6250) Warning: To load an ES module, set "type": "module" in the package.json or use the .mjs extension.
Source path: /opt/chat-ui/src/lib/components/chat/FileDropzone.svelte?svelte&type=style&lang.css
Setting up new context...
Source path: /opt/chat-ui/src/lib/components/chat/ChatInput.svelte?svelte&type=style&lang.css
Source path: /opt/chat-ui/src/lib/components/ToolsMenu.svelte?svelte&type=style&lang.css
Source path: /opt/chat-ui/src/lib/components/chat/ChatMessage.svelte?svelte&type=style&lang.css
JIT TOTAL: 265.317ms
(node:6250) Warning: Label 'JIT TOTAL' already exists for console.time()
(node:6250) Warning: Label 'JIT TOTAL' already exists for console.time()
(node:6250) Warning: Label 'JIT TOTAL' already exists for console.time()
(node:6250) Warning: No such label 'JIT TOTAL' for console.timeEnd()
(node:6250) Warning: No such label 'JIT TOTAL' for console.timeEnd()
(node:6250) Warning: No such label 'JIT TOTAL' for console.timeEnd()
Source path: /opt/chat-ui/src/lib/components/OpenWebSearchResults.svelte?svelte&type=style&lang.css
Source path: /opt/chat-ui/src/lib/components/chat/ToolUpdate.svelte?svelte&type=style&lang.css
JIT TOTAL: 1.355ms
(node:6250) Warning: Label 'JIT TOTAL' already exists for console.time()
(node:6250) Warning: No such label 'JIT TOTAL' for console.timeEnd()
Source path: /opt/chat-ui/src/styles/main.css
Setting up new context...
Finding changed files: 8.775ms
Reading changed files: 158.906ms
Sorting candidates: 7.72ms
Generate rules: 397.398ms
Build stylesheet: 11.899ms
Potential classes: 8755
Active contexts: 2
JIT TOTAL: 767.815ms
Source path: /opt/chat-ui/src/styles/main.css?inline=
Setting up new context...
Finding changed files: 3.466ms
Reading changed files: 119.942ms
Sorting candidates: 7.852ms
Generate rules: 339.343ms
Build stylesheet: 6.497ms
Potential classes: 8755
Active contexts: 3
JIT TOTAL: 635.226ms
Source path: /opt/chat-ui/src/styles/main.css
Finding changed files: 4.567ms
Reading changed files: 0.005ms
Sorting candidates: 0.003ms
Generate rules: 0.036ms
Build stylesheet: 0.003ms
Potential classes: 1
Active contexts: 3
JIT TOTAL: 59.553ms
Source path: /opt/chat-ui/src/lib/components/chat/ChatInput.svelte?svelte&type=style&lang.css
JIT TOTAL: 0.828ms
Source path: /opt/chat-ui/src/lib/components/ToolsMenu.svelte?svelte&type=style&lang.css
Source path: /opt/chat-ui/src/lib/components/chat/FileDropzone.svelte?svelte&type=style&lang.css
Source path: /opt/chat-ui/src/lib/components/chat/ChatMessage.svelte?svelte&type=style&lang.css
JIT TOTAL: 2.513ms
(node:6250) Warning: Label 'JIT TOTAL' already exists for console.time()
(node:6250) Warning: Label 'JIT TOTAL' already exists for console.time()
(node:6250) Warning: No such label 'JIT TOTAL' for console.timeEnd()
(node:6250) Warning: No such label 'JIT TOTAL' for console.timeEnd()
Source path: /opt/chat-ui/src/lib/components/OpenWebSearchResults.svelte?svelte&type=style&lang.css
Source path: /opt/chat-ui/src/lib/components/chat/ToolUpdate.svelte?svelte&type=style&lang.css
JIT TOTAL: 0.674ms
(node:6250) Warning: Label 'JIT TOTAL' already exists for console.time()
(node:6250) Warning: No such label 'JIT TOTAL' for console.timeEnd()
OpenAI:DEBUG:request http://172.16.0.169:8001/v1/chat/completions { method: 'post', path: '/chat/completions', body: { model: 'text-generation-webui', messages: [ [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object] ], stream: true, max_tokens: 15, stop: [], temperature: 0.9, top_p: 0.95, frequency_penalty: undefined, repetition_penalty: 1.2, top_k: 50, truncate: 1000 }, stream: true } { 'content-length': '2560', accept: 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/JS 4.47.1', 'x-stainless-lang': 'js', 'x-stainless-package-version': '4.47.1', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'node', 'x-stainless-runtime-version': 'v22.3.0', authorization: 'Bearer ' }
OpenAI:DEBUG:request http://172.16.0.169:8001/v1/chat/completions { method: 'post', path: '/chat/completions', body: { model: 'text-generation-webui', messages: [ [Object], [Object] ], stream: true, max_tokens: 1024, stop: [], temperature: 0.9, top_p: 0.95, frequency_penalty: undefined, repetition_penalty: 1.2, top_k: 50, truncate: 1000 }, stream: true } { 'content-length': '405', accept: 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/JS 4.47.1', 'x-stainless-lang': 'js', 'x-stainless-package-version': '4.47.1', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'node', 'x-stainless-runtime-version': 'v22.3.0', authorization: 'Bearer ' }
OpenAI:DEBUG:response 200 http://172.16.0.169:8001/v1/chat/completions Headers { [Symbol(map)]: [Object: null prototype] { date: [ 'Wed, 12 Jun 2024 13:59:20 GMT' ], server: [ 'uvicorn' ], 'cache-control': [ 'no-cache' ], connection: [ 'keep-alive' ], 'x-accel-buffering': [ 'no' ], 'content-type': [ 'text/event-stream; charset=utf-8' ], 'transfer-encoding': [ 'chunked' ] } } PassThrough { _events: { close: [Function: bound onceWrapper] { listener: [Function: onclose] }, error: [ [Function: onerror], [Function (anonymous)] ], prefinish: [Function: prefinish], finish: [Function: bound onceWrapper] { listener: [Function: onfinish] }, drain: undefined, data: undefined, end: undefined, readable: undefined, unpipe: [Function: onunpipe] }, _readableState: ReadableState { highWaterMark: 65536, buffer: [], bufferIndex: 0, length: 0, pipes: [], awaitDrainWriters: null, [Symbol(kState)]: 1048844 }, _writableState: WritableState { highWaterMark: 65536, length: 0, corked: 0, onwrite: [Function: bound onwrite], writelen: 0, bufferedIndex: 0, pendingcb: 0, [Symbol(kState)]: 17580812, [Symbol(kBufferedValue)]: null }, allowHalfOpen: true, _maxListeners: undefined, _eventsCount: 5, [Symbol(shapeMode)]: true, [Symbol(kCapture)]: false, [Symbol(kCallback)]: null }
OpenAI:DEBUG:response 200 http://172.16.0.169:8001/v1/chat/completions Headers { [Symbol(map)]: [Object: null prototype] { date: [ 'Wed, 12 Jun 2024 13:59:20 GMT' ], server: [ 'uvicorn' ], 'cache-control': [ 'no-cache' ], connection: [ 'keep-alive' ], 'x-accel-buffering': [ 'no' ], 'content-type': [ 'text/event-stream; charset=utf-8' ], 'transfer-encoding': [ 'chunked' ] } } PassThrough { _events: { close: [Function: bound onceWrapper] { listener: [Function: onclose] }, error: [ [Function: onerror], [Function (anonymous)] ], prefinish: [Function: prefinish], finish: [Function: bound onceWrapper] { listener: [Function: onfinish] }, drain: undefined, data: undefined, end: undefined, readable: undefined, unpipe: [Function: onunpipe] }, _readableState: ReadableState { highWaterMark: 65536, buffer: [], bufferIndex: 0, length: 0, pipes: [], awaitDrainWriters: null, [Symbol(kState)]: 1048844 }, _writableState: WritableState { highWaterMark: 65536, length: 0, corked: 0, onwrite: [Function: bound onwrite], writelen: 0, bufferedIndex: 0, pendingcb: 0, [Symbol(kState)]: 17580812, [Symbol(kBufferedValue)]: null }, allowHalfOpen: true, _maxListeners: undefined, _eventsCount: 5, [Symbol(shapeMode)]: true, [Symbol(kCapture)]: false, [Symbol(kCallback)]: null }
```

text-generation-webui:

```
15:58:23-000843 INFO     Starting Text generation web UI
15:58:23-003506 WARNING  You are potentially exposing the web UI to the entire internet without any access password. You can create one with the "--gradio-auth" flag like this: --gradio-auth username:password Make sure to replace username:password with your own.
15:58:23-008582 INFO     Loading "NTQAI_Nxcode-CQ-7B-orpo"
15:58:23-010179 INFO     TRANSFORMERS_PARAMS= {'low_cpu_mem_usage': True, 'torch_dtype': torch.float16}
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.40s/it]
15:58:30-445446 INFO     Loaded "NTQAI_Nxcode-CQ-7B-orpo" in 7.44 seconds.
15:58:30-446274 INFO     LOADER: "Transformers"
15:58:30-446723 INFO     TRUNCATION LENGTH: 65536
15:58:30-447183 INFO     INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"
15:58:30-447709 INFO     Loading the extension "openai"
15:58:30-510985 INFO     OpenAI-compatible API URL: http://0.0.0.0:8001
Running on local URL: http://0.0.0.0:8000
15:59:21-782210 INFO     GENERATE_PARAMS= { 'max_new_tokens': 15, 'temperature': 0.9, 'temperature_last': False, 'dynamic_temperature': False, 'dynatemp_low': 1, 'dynatemp_high': 1, 'dynatemp_exponent': 1, 'smoothing_factor': 0, 'smoothing_curve': 1, 'top_p': 0.95, 'min_p': 0, 'top_k': 50, 'repetition_penalty': 1.2, 'presence_penalty': 0, 'frequency_penalty': 0, 'repetition_penalty_range': 1024, 'typical_p': 1, 'tfs': 1, 'top_a': 0, 'guidance_scale': 1, 'penalty_alpha': 0, 'mirostat_mode': 0, 'mirostat_tau': 5, 'mirostat_eta': 0.1, 'do_sample': True, 'encoder_repetition_penalty': 1, 'no_repeat_ngram_size': 0, 'use_cache': True, 'eos_token_id': [4], 'stopping_criteria': [ ], 'logits_processor': []}
15:59:21-783364 INFO     PROMPT=
<|im_start|>system
You are a summarization AI. Summarize the user's request into a single short sentence of four words or less. Do not try to answer it, only summarize the user's query. Always start your answer with an emoji relevant to the summary<|im_end|>
<|im_start|>assistant
🇬🇦 President of Gabon<|im_end|>
<|im_start|>assistant
🧑 Julien Chaumond<|im_end|>
<|im_start|>assistant
🔢 Simple math operation<|im_end|>
<|im_start|>assistant
📰 Latest news<|im_end|>
<|im_start|>assistant
🍰 Cheesecake recipe<|im_end|>
<|im_start|>assistant
🎥 Favorite movie<|im_end|>
<|im_start|>assistant
🤖 AI definition<|im_end|>
<|im_start|>assistant
🐱 Cute cat drawing<|im_end|>
<|im_start|>assistant

15:59:22-068257 INFO     WARPERS= ['TemperatureLogitsWarperCustom', 'TopKLogitsWarper', 'TopPLogitsWarper']
Output generated in 1.40 seconds (4.27 tokens/s, 6 tokens, context 148, seed 698673818)
15:59:23-695966 INFO     GENERATE_PARAMS= { 'max_new_tokens': 1024, 'temperature': 0.9, 'temperature_last': False, 'dynamic_temperature': False, 'dynatemp_low': 1, 'dynatemp_high': 1, 'dynatemp_exponent': 1, 'smoothing_factor': 0, 'smoothing_curve': 1, 'top_p': 0.95, 'min_p': 0, 'top_k': 50, 'repetition_penalty': 1.2, 'presence_penalty': 0, 'frequency_penalty': 0, 'repetition_penalty_range': 1024, 'typical_p': 1, 'tfs': 1, 'top_a': 0, 'guidance_scale': 1, 'penalty_alpha': 0, 'mirostat_mode': 0, 'mirostat_tau': 5, 'mirostat_eta': 0.1, 'do_sample': True, 'encoder_repetition_penalty': 1, 'no_repeat_ngram_size': 0, 'use_cache': True, 'eos_token_id': [4], 'stopping_criteria': [ ], 'logits_processor': []}
15:59:23-696892 INFO     PROMPT=
<|im_start|>assistant

15:59:23-932086 INFO     WARPERS= ['TemperatureLogitsWarperCustom', 'TopKLogitsWarper', 'TopPLogitsWarper']
Output generated in 0.66 seconds (10.56 tokens/s, 7 tokens, context 4, seed 1106388940)
```

I entered "test" as the prompt in chat-ui, and the first answer is always "You are a helpful assistant." Each time I enter another prompt, the answer becomes random, as if the AI asks a question itself and then answers it. I have logged one of these random conversations:

```
16:14:26-896032 INFO     PROMPT=
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>assistant
You are a helpful assistant.<|im_end|>
<|im_start|>assistant
Create a script for my new startup idea.<|im_end|>
<|im_start|>assistant
Sure, I'd be happy to help you with that! Could you please provide me more information about your startup idea? What is it supposed to do or accomplish, and who would benefit from using it? This will greatly influence the direction of the script we create.<|im_end|>
<|im_start|>assistant
As an AI language model, I don't have personal startup ideas as such; however, here's a general structure of how you could start creating one: 1. Executive Summary: In this section, briefly describe what the company does and its goals/services. Also mention why it's relevant in the market today and who might need it (Target Market). Finally give a brief overview on strategy, marketing plan etc., 2. Business Model: Describe how the business makes money i.e., revenue streams, partnerships, a[...364 characters hidden...]omers - advertising, content creation, events, and digital marketing platforms. Include any specific tools and technologies being used. 5. Revenue Forecast: Create a simple table summarizing expected monthly and annual revenues based on assumptions made during modeling. Be prepared to include assumptions within the text itself. Use time-series analysis to show growth prospects over the next few years. 6. Financial Plan: Based on the defined revenue stream(s) explain the cash flow through various periods, accounting methods, balance sheets (preferred investor equity, cash at hand, debt outstanding etc.) 7. Appendices : Mention additional information not needed for final presentation such as detailed financial forecasts, industry research reports, competitive analyses etc.. Remember each section should aim to capture a unique aspect of your organization so ensure it fits well together before drafting your actual proposal! Good luck with your venture!<|im_end|>
<|im_start|>assistant
```

The only thing that works is setting a system prompt: it is passed through, and an answer to the system prompt is generated. But any user prompt is ignored and a random answer is given.

Here is an example log when a system prompt is set:

```
16:04:27-498767 INFO     PROMPT=
<|im_start|>system
Write some python code.<|im_end|>
<|im_start|>assistant

16:04:27-745105 INFO     WARPERS= ['TemperatureLogitsWarperCustom', 'TopKLogitsWarper', 'TopPLogitsWarper']
Output generated in 15.07 seconds (36.50 tokens/s, 550 tokens, context 14, seed 1040727761)
```

The first answer (regardless of the user input) is then some written Python code.

I want to know what I am missing: what makes the API endpoint accept my user prompt?
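As a sanity check, something like the following curl should show whether text-generation-webui itself respects the user message when called directly, bypassing chat-ui entirely. This is just the standard OpenAI chat-completions request shape; the IP, port, and model name mirror my config above:

```
curl http://172.16.0.169:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-generation-webui",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "test"}
    ],
    "max_tokens": 64,
    "stream": false
  }'
```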


Environment:

- Ubuntu 22.04.4 LTS
- nodejs v22.3.0
- npm 10.8.1
- chat-ui@0.9.1 dev
- text-generation-webui@abe5ddc8833206381c43b002e95788d4cca0893a

hsayniaj79 commented 2 weeks ago

I host a Llama 3 model on-prem with tgi and use the following in my .env.local to use chat-ui with it:

        "endpoints":[
            {
                "type":"tgi",
                "url":"http://llama3-70b-instruct-api"
            }
        ]

Maybe setting the type to tgi instead of openai helps?
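For context, that endpoints array sits inside a model entry in the MODELS variable, roughly like this (the name and URL here are placeholders from my setup; adapt them to yours):

```
MODELS=`[
  {
    "name": "llama3-70b-instruct",
    "id": "llama3-70b-instruct",
    "endpoints": [
      {
        "type": "tgi",
        "url": "http://llama3-70b-instruct-api"
      }
    ]
  }
]`
```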

Monviech commented 2 weeks ago

Hello @hsayniaj79, thank you for your answer.

I thought I had to use the OpenAI API between chat-ui and tgi.

I can only find documentation for that API: https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#examples

Was there a different API in an older version of tgi that you use, one that no longer exists?

Trying the tgi configuration just results in this 405 error:

```
Using a model URL is deprecated, please use the `endpointUrl` parameter instead
Using a model URL is deprecated, please use the `endpointUrl` parameter instead
Error: Server response contains error: 405
    at streamingRequest (file:///opt/chat-ui/node_modules/@huggingface/inference/dist/index.js:334:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Proxy.textGenerationStream (file:///opt/chat-ui/node_modules/@huggingface/inference/dist/index.js:705:3)
    at async Module.generate (/opt/chat-ui/src/lib/server/textGeneration/generate.ts:8:20)
    at async textGenerationWithoutTitle (/opt/chat-ui/src/lib/server/textGeneration/index.ts:56:3)
[11:10:10.326] ERROR (2416): Server response contains error: 405
    err: {
      "type": "Error",
      "message": "Server response contains error: 405",
      "stack":
          Error: Server response contains error: 405
              at streamingRequest (file:///opt/chat-ui/node_modules/@huggingface/inference/dist/index.js:334:11)
              at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
              at async Proxy.textGenerationStream (file:///opt/chat-ui/node_modules/@huggingface/inference/dist/index.js:705:3)
              at async Module.generateFromDefaultEndpoint (/opt/chat-ui/src/lib/server/generateFromDefaultEndpoint.ts:12:20)
              at async generateTitle (/opt/chat-ui/src/lib/server/textGeneration/title.ts:54:10)
              at async Module.generateTitleForConversation (/opt/chat-ui/src/lib/server/textGeneration/title.ts:17:19)
    }
```
hsayniaj79 commented 2 weeks ago

Hi @Monviech,

My bad, I confused Hugging Face's text-generation-inference (TGI) with oobabooga's text-generation-webui. For the latter, I think the openai type is correct. Could the issue be the missing "chatPromptTemplate" in the model entries in .env.local?
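Since your model seems to use ChatML tags, a template along these lines might be worth a try. This is only a sketch based on the Handlebars-like syntax the chat-ui README shows for chatPromptTemplate; I haven't tested it against your model:

```
"chatPromptTemplate": "<|im_start|>system\n{{preprompt}}<|im_end|>\n{{#each messages}}{{#ifUser}}<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n{{/ifUser}}{{#ifAssistant}}{{content}}<|im_end|>\n{{/ifAssistant}}{{/each}}"
```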

Monviech commented 2 weeks ago

@hsayniaj79 Oh, I hadn't noticed that either; I should have written oobabooga to avoid the confusion. I have tried specifying the chatPromptTemplate with a few different templates, but it doesn't seem to change anything.

Also, I am not dead set on using oobabooga; it was just my first choice because I have used the stable diffusion webui extensively and it looks just like it. I wanted a ChatGPT-style chat though, which is how I came to the Hugging Face chat-ui.

If the experience is better with the combination of https://github.com/huggingface/text-generation-inference and https://github.com/huggingface/chat-ui, I will stop the troubleshooting here and use TGI as the backend instead.

hsayniaj79 commented 2 weeks ago

@Monviech I can only share my personal experience. We've been using tgi+chat-ui on our Kubernetes cluster at a research institute for a while. So far, it's been pretty straightforward and painless.
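If you want to try it, the basic recipe is just TGI's official container plus a tgi endpoint in chat-ui. A minimal sketch, where the model id, port, and volume are examples rather than our exact deployment (gated models additionally need a Hugging Face access token):

```
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Meta-Llama-3-70B-Instruct
```

Then point chat-ui at it with "endpoints": [{ "type": "tgi", "url": "http://localhost:8080" }] in the model entry.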

Monviech commented 1 week ago

@hsayniaj79 Thank you. I have deployed tgi+chat-ui instead and everything instantly worked. I'm glad you helped me. :)