oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Chat mode for API #656

Closed: mouchourider closed this 1 year ago

mouchourider commented 1 year ago

Hi, guys!

First, I wanted to thank you for the work you're doing. I wanted to know if you're planning to implement a chat mode for the API (select character, etc.)? If not, I would like to know how you use the context/persona data along with the prompt in chat mode. What is your input to the model, basically? E.g.:

"Instruction: Context: ... Assistant:... User:... etc..."

Thanks!

tensiondriven commented 1 year ago

You can see a description of how the API works here

The character's identity is given by the prompt, and the prompt is basically a concatenation of the context and the chat (at least as far as I understand it).

I think your client (the thing consuming the API) would have to do the work of storing the context and concatenating it, at least at present. It shouldn't be too hard to make a version of the API that accepts a character, either by loading a character card or by loading a character card plus the previous chat. In that case the API would probably need to expect only the new message, and you'd probably want endpoints for the regenerate button and the like as well.

I may be missing something about how chat mode works, so this description might not be entirely accurate.

mouchourider commented 1 year ago

> You can see a description of how the API works here
>
> The character's identity is given by the prompt, and the prompt is basically a concatenation of the context and the chat (at least as far as I understand it).
>
> I think your client (the thing consuming the API) would have to do the work of storing the context and concatenating it, at least at present. It shouldn't be too hard to make a version of the API that accepts a character, either by loading a character card or by loading a character card plus the previous chat. In that case the API would probably need to expect only the new message, and you'd probably want endpoints for the regenerate button and the like as well.
>
> I may be missing something about how chat mode works, so this description might not be entirely accurate.

Thanks for the answer. I already checked how the API works, and of course I'm guessing there's code in the chat script somewhere that breaks the whole thing down so it can be digested by the model, but I can't find the format it uses. I tried to do it myself, but it doesn't work as I would expect (the dialogue gets completed poorly).

bmoconno commented 1 year ago

I believe the code you're looking for is the load_character function in chat.py; what you're specifically interested in is the context variable (which later gets saved to shared.gradio['context'] in server.py when load_character is called).

The context seems to be built from the character's .json file by concatenating char_persona, world_scenario, and example_dialogue.

This context is sent with every prompt; you can observe this yourself by running server.py with the --verbose flag.
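To make that concrete, here is a minimal Python sketch of how such a context string could be assembled from a character .json. This is an illustration based on the description above, not the actual code in chat.py; the key names (char_name, char_persona, world_scenario, example_dialogue) follow the character-card fields mentioned here, and the exact formatting the webui uses may differ:

```python
import json

def build_context(path):
    # Load a character card; the keys mirror the fields named above.
    with open(path) as f:
        char = json.load(f)

    name = char.get("char_name", "Assistant")
    parts = []
    if char.get("char_persona"):
        parts.append(f"{name}'s Persona: {char['char_persona']}")
    if char.get("world_scenario"):
        parts.append(f"Scenario: {char['world_scenario']}")
    if char.get("example_dialogue"):
        parts.append(char["example_dialogue"])

    # This string is what gets prepended to every prompt sent to the model.
    return "\n".join(parts) + "\n"
```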

mouchourider commented 1 year ago

You're a saint. Thanks

fenix01 commented 1 year ago

Hello! I was working on a similar project, but I'm not able to use the API correctly. I wanted to use the API in chat mode, but every time I get weird answers that look like data the model was trained on. @mouchourider Do you have a working example of using chat mode with the API?

I use llama-13b-hf-int4, and it works correctly with the GUI.

DeSinc commented 1 year ago

> Hello! I was working on a similar project, but I'm not able to use the API correctly. I wanted to use the API in chat mode, but every time I get weird answers that look like data the model was trained on. @mouchourider Do you have a working example of using chat mode with the API?
>
> I use llama-13b-hf-int4, and it works correctly with the GUI.

Yeah, same here. I'm getting awful outputs because it's not in --chat mode, and you can't use that flag with the API for some reason. (Did you ever solve this?)

DeSinc commented 1 year ago

Edit: solved, basically by re-creating what the chat UI was doing: formatting the input prompt with a brief history of at least 5+ messages separated by newlines, then sending the whole chat history, with rolling updates, on every single prompt. Tedious, but it works just like the chat interface (mostly).

mouchourider commented 1 year ago

> Edit: solved, basically by re-creating what the chat UI was doing: formatting the input prompt with a brief history of at least 5+ messages separated by newlines, then sending the whole chat history, with rolling updates, on every single prompt. Tedious, but it works just like the chat interface (mostly).

Could you share it, please? :) A working code example would help.

DeSinc commented 1 year ago

> Edit: solved, basically by re-creating what the chat UI was doing: formatting the input prompt with a brief history of at least 5+ messages separated by newlines, then sending the whole chat history, with rolling updates, on every single prompt. Tedious, but it works just like the chat interface (mostly).
>
> Could you share it, please? :) A working code example would help.

Basically, all you need to give it is a small history of well-formatted messages, about 5+, so the model can make a good prediction of what the next message should look like. You can do this by manually writing a fake chat log that looks the way you want the bot to reply, and then putting [Bot]: with a space at the end, implying that you want the bot to finish the next line (or fill in whatever name you want it to have). Note: the name is important, as the LLM reads it and acts the way it thinks that name should act.
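To make that concrete, here is a small Python sketch of the kind of prompt this produces (the names and messages are made up for illustration):

```python
# A fake chat log in the "[Name]: message" format described above.
history = [
    ("Alice", "hey, how was the stream today?"),
    ("Bob", "pretty good, we hit a new sub record"),
    ("Alice", "nice! what game are we playing next week?"),
]

# One "[Name]: message" line per entry, separated by newlines...
prompt = "".join(f"[{name}]: {text}\n" for name, text in history)

# ...then a dangling "[Bot]: " so the model completes that line as the bot.
prompt += "[Bot]: "
```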

Mine is a Discord bot, so I download the last 30 messages and loop over them, appending each downloadedMsg.Author.Username and downloadedMsg.Content followed by a \n newline character, to build my own dynamic starter prompt. I save this to a string in my program (in reverse order, because Discord loads the newest message first) and voila, I have the whole last 30 messages in a string with newline breaks. I then prepend this to my prompt every single time and send the whole string, in its entirety, to oobabooga via the API.

Code snippet:

```csharp
if (chatHistoryDownloaded == false)
{
    chatHistoryDownloaded = true; // only do this once per program run, to load messages into memory
    var downloadedMsges = await Msg.Channel.GetMessagesAsync(30).FlattenAsync();
    foreach (var downloadedMsg in downloadedMsges)
    {
        if (downloadedMsg.Content != Msg.Content) // don't double up the last msg the user just sent
        {
            // Prepend, because Discord returns the newest messages first.
            oobaboogaChatHistory = $"[{downloadedMsg.Author.Username}]: {downloadedMsg.Content}\n" + oobaboogaChatHistory;
        }
    }
}
```

The source code to do this is here if you're interested in seeing how I got it all working: https://github.com/DeSinc/SallyBot

DeSinc commented 1 year ago

The problem remaining with the above is that the API seems to IGNORE stopping_strings, so it does NOT stop generating; it just hallucinates a bunch of simulated conversation using the usernames from your chat history. (STILL NOT FIXED)
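Until that is fixed server-side, one possible workaround is to truncate the completion client-side. A minimal Python sketch, assuming the "[Name]:" chat format from above (the stop strings here are illustrative):

```python
def truncate_at_stops(completion, stop_strings):
    # Cut the completion at the earliest occurrence of any stop string,
    # discarding the hallucinated follow-up turns.
    cut = len(completion)
    for stop in stop_strings:
        idx = completion.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]

raw_reply = "sounds good!\n[Alice]: wait, what?\n[Bob]: lol"
print(truncate_at_stops(raw_reply, ["\n[Alice]:", "\n[Bob]:"]))  # -> "sounds good!"
```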

mouchourider commented 1 year ago

> The problem remaining with the above is that the API seems to IGNORE stopping_strings, so it does NOT stop generating; it just hallucinates a bunch of simulated conversation using the usernames from your chat history. (STILL NOT FIXED)

That's exactly my current problem. I need to study the current chat mode more to make it work.

bmoconno commented 1 year ago

I took a look at this today, here's what I've got so far.

Until my PR for picking a character via arguments (#976) is merged, you'll need to pick the character via the UI before using the API to chat with them (unless you want to chat with the default "Assistant" that loads auto-magically). Also, the reply from the bot is in data[1] instead of data[0] like it is for the default API; data[0] holds the user's prompt in this chat API.
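For reference, a hedged Python sketch of what reading that response might look like. The /run/textgen endpoint and the {"data": [...]} payload shape follow the Gradio API examples of that era; treat both as assumptions and check the current api-example script for the real signature:

```python
import requests

# Hypothetical minimal call against the Gradio chat API described above;
# the endpoint name and payload layout are assumptions.
response = requests.post(
    "http://localhost:7860/run/textgen",
    json={"data": ["Hello there!"]},  # the user's prompt
).json()

# Per the comment above: data[0] echoes the user's prompt,
# data[1] holds the bot's reply.
print(response["data"][1])
```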

There are still a few things I'm working on before I put a PR to try to get this into the main branch:

botchi09 commented 1 year ago

In my experience, the API doesn't even work from JavaScript; it throws JSON parse errors on the POST request.

DeSinc commented 1 year ago

The default API is terrible and almost totally dysfunctional. Even if you get it working, there is something wrong at a core level that causes hashtag psychosis, where the bot just spams ever-increasing amounts of hashtags, or emoji psychosis, which you can guess what that is.

I have successfully switched to the extensions API, exposed at http://localhost:5000/api/v1/generate by passing this arg: --extensions api

The code to do this can be found here if you want to replicate it: https://github.com/DeSinc/SallyBot (see Program.cs, in the parameters section; the 'var content' and 'var result' lines are the ones that build and send the string). It's C# code, though; I'm sure there are plenty of JS examples out there.
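For anyone who would rather not read C#, here is a hedged Python sketch of the same call. The endpoint comes from the comment above; the parameter names other than prompt, and the response shape, are assumptions based on the KoboldAI API this extension mirrors, so verify them against your version:

```python
import requests

payload = {
    "prompt": "[Alice]: hi there\n[Bot]: ",
    # Assumed KoboldAI-style sampling fields; names may differ by version.
    "max_length": 200,
    "temperature": 0.7,
}

result = requests.post("http://localhost:5000/api/v1/generate", json=payload)

# KoboldAI's API returns {"results": [{"text": ...}]}; assuming the
# extension mirrors that shape here.
print(result.json()["results"][0]["text"])
```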

powercore2000 commented 1 year ago

> The default API is terrible and almost totally dysfunctional. Even if you get it working, there is something wrong at a core level that causes hashtag psychosis, where the bot just spams ever-increasing amounts of hashtags, or emoji psychosis, which you can guess what that is.
>
> I have successfully switched to the extensions API, exposed at http://localhost:5000/api/v1/generate by passing this arg: --extensions api
>
> The code to do this can be found here if you want to replicate it: https://github.com/DeSinc/SallyBot (see Program.cs, in the parameters section; the 'var content' and 'var result' lines are the ones that build and send the string). It's C# code, though; I'm sure there are plenty of JS examples out there.

Didn't know about the extensions API until now; it seems to have more features, so I'll probably just switch to that. Is it called the KoboldAI-compatible API because it mirrors the endpoints and port number of the KoboldAI API?

DeSinc commented 1 year ago

Yep, that's why. It uses the KoboldAI format and is compatible with TavernAI; that's what I've seen about a dozen people say about it after doing pages and pages of research trying to figure out how to format the damn thing.

MarlinMr commented 1 year ago

@DeSinc @mouchourider, did you ever figure out stopping strings?

mouchourider commented 1 year ago

> @DeSinc @mouchourider, did you ever figure out stopping strings?

Nope. Still stuck on this.

DeSinc commented 1 year ago

> @DeSinc @mouchourider, did you ever figure out stopping strings?

Yeah, nope, still not working, even using --extensions api. It never works; it's just not functioning. I have half a mind to just have GPT-4 walk me through modifying the ooba source to fix it myself (that worked for dalai).

bmoconno commented 1 year ago

I've opened a PR (#1250) with my changes to the Gradio API to make it work with chat (this is different from the --extensions api KoboldAI API).

This uses the same functions as normal chat, so context and stopping strings are handled as they normally are.

Please check it out and let me know if you have any issues.

MarlinMr commented 1 year ago

I already got it working with the repo as is. I now use custom stopping strings and it just works. https://github.com/MarlinMr/text-generation-chatui/blob/main/params.json

bmoconno commented 1 year ago

> I already got it working with the repo as is. I now use custom stopping strings and it just works. https://github.com/MarlinMr/text-generation-chatui/blob/main/params.json

I'm guessing by "got it working" you're not talking about using a loaded character and having chat context and stuff work with the existing repo? I don't see how that could be possible with the current Gradio API.

DeSinc commented 1 year ago

> I already got it working with the repo as is. I now use custom stopping strings and it just works. https://github.com/MarlinMr/text-generation-chatui/blob/main/params.json

wow... that worked... custom_stopping_strings works totally fine on the extensions API... why was this not documented?
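For anyone landing here later: the fix amounts to adding that field to the request parameters, alongside the payload shown earlier. A hedged sketch with illustrative values (the exact key name and value format follow MarlinMr's params.json approach and may vary by version):

```python
params = {
    "prompt": "[Alice]: hi there\n[Bot]: ",
    # Stop generating as soon as the model starts writing the next
    # speaker's turn; the strings below are illustrative.
    "custom_stopping_strings": ["\n[Alice]:", "\n[Bob]:"],
}
```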

MarlinMr commented 1 year ago

> I'm guessing by "got it working" you're not talking about using a loaded character and having chat context and stuff work with the existing repo? I don't see how that could be possible with the current Gradio API.

I am. It works.

I just send the context in the API call every time. The repo I linked has everything "documented" in the code; pretty self-explanatory. I will document more later, and make the script more general, but I just needed to get it working first.

I don't really see how it works yet, but it does, and I shall dig into the code and figure it out later today.

MarlinMr commented 1 year ago

> wow... that worked... custom_stopping_strings works totally fine on the extensions API... why was this not documented?

Probably because it's not actually part of the API. It's more complex than that, and has parts made specifically for this repo.

I will develop further later today.

bmoconno commented 1 year ago

Ahh, I didn't look at the whole repo, just the .json you linked. Looking now, it still doesn't look like this uses the character data (char_persona, world_scenario, and example_dialogue) or previous chat messages as part of the context for the chat, though?

MarlinMr commented 1 year ago

> Ahh, I didn't look at the whole repo, just the .json you linked. Looking now, it still doesn't look like this uses the character data (char_persona, world_scenario, and example_dialogue) or previous chat messages as part of the context for the chat, though?

Ah, I meant to link the whole repo.

But yes, it's technically not using it right now, though there is commented-out code that does. It's also trivial to set up.

MarlinMr commented 1 year ago

I now have functioning history in my bot.

lpurdy01 commented 1 year ago

Would it be possible to have an endpoint that hosts a version of the API compatible with OpenAI clients?

Integrating something like: https://github.com/lhenault/simpleAI

or https://github.com/hyperonym/basaran

would open up any models to a lot of other environments.

oobabooga commented 1 year ago

The chat API has been implemented here:

https://github.com/oobabooga/text-generation-webui/pull/2233