Open haroldwinstob opened 1 year ago
Yeah, it would be nice if certain key messages, like the initial system message, could be protected from being truncated when the token limit pushes older messages out.
Here, you can copy the standard config into a new variable and set the new config's max_tokens to the original max_tokens minus the token count of the messages.
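A minimal sketch of that idea in Python. The config shape and the `count_tokens` helper are assumptions for illustration, not names from BetterChatGPT (a real UI would count tokens with something like tiktoken):

```python
def count_tokens(messages):
    # Placeholder tokenizer: counts whitespace-separated words.
    # A real implementation would use tiktoken or similar.
    return sum(len(m["content"].split()) for m in messages)

def config_for_request(base_config, messages):
    # Copy the standard config so the original stays untouched...
    config = dict(base_config)
    # ...and budget the completion to whatever the context has left.
    config["max_tokens"] = base_config["max_tokens"] - count_tokens(messages)
    return config

base = {"model": "gpt-3.5-turbo", "max_tokens": 4096}
msgs = [{"role": "user", "content": "hello there"}]
print(config_for_request(base, msgs)["max_tokens"])  # 4096 - 2 = 4094
```

The original `base` dict is never mutated, so the same standard config can be reused for every request.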
In some of my attempts to create a UI what I did was monitor the token usage and when it was 512 tokens from the limit, send a separate API request asking the model to summarize the conversation. Then I cleared the context and began a new conversation that only contained the summary as if the assistant had said it, beginning with "Conversation summary:". This was done before appending the latest user message and its response, so that these two were always kept separate from the summary.
I kept two message histories. The one I showed to the user was the complete one. Internally, there was the one that fit into the model's token limit, which was truncated using the method I explained above as needed.
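The two-history flow above can be sketched roughly like this. Everything here is hypothetical (`count_tokens`, `summarize_stub`, the 512-token margin as a parameter); in the real flow the summary would come from a separate API request to the model:

```python
def count_tokens(messages):
    # Placeholder tokenizer: counts whitespace-separated words.
    return sum(len(m["content"].split()) for m in messages)

def summarize_stub(messages):
    # Stand-in for the separate API request that asks the model
    # to summarize the conversation so far.
    return "Conversation summary: (stub)"

def append_turn(full_history, model_history, user_msg, assistant_msg,
                limit=4096, margin=512):
    # The full history shown to the user always grows.
    full_history = full_history + [user_msg, assistant_msg]
    # Compact the internal history BEFORE appending the new turn,
    # so the latest exchange is always kept separate from the summary.
    if count_tokens(model_history) >= limit - margin:
        model_history = [{"role": "assistant",
                          "content": summarize_stub(model_history)}]
    model_history = model_history + [user_msg, assistant_msg]
    return full_history, model_history
```

Because compaction happens before the append, the summary message replaces the old context while the latest user message and response ride along verbatim.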
I know this is overly complicated, but maybe there are good ideas in it. Maybe we could do something similar but summarize only the first half of the context: it would need to run more often, but less contextual information would be lost each time.
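The first-half variant could look something like this sketch (the split point and the `summarize` callback are illustrative assumptions):

```python
def compact_first_half(history, summarize):
    # Summarize only the older half of the context and keep the
    # recent half verbatim, so less detail is lost per compaction.
    half = len(history) // 2
    older, recent = history[:half], history[half:]
    summary_msg = {"role": "assistant",
                   "content": "Conversation summary: " + summarize(older)}
    return [summary_msg] + recent
```

Each call roughly halves the context instead of collapsing it entirely, which is why it fires more often but discards less each time.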
I ended up ditching my UI and I keep using BetterChatGPT because it's full of useful features. The only thing I miss is having long conversations that don't suddenly stop when the context is full.
Hi, firstly I would like to thank you for creating and sharing this app for free. It really is a great, simple web-UI app that avoids common problems of the official ChatGPT app, like slowness, too many requests, and lost connections when idle.
Let me offer an idea that might help with the token limit issue. It's not a perfect solution, but you will get the idea.
If there are technical aspects that make implementing the above features less simple than expected, please let me know.