a3957273 opened 1 year ago
We have an internal truncateRequest function to resolve this on our instance. It tries to fit as many tokens as possible, up to about half the token limit. It's a super simple implementation that we threw together in under an hour.
// getSubstring returns the message unchanged if it already fits within
// the character budget; otherwise it keeps the first and last halves of
// the budget and drops the middle of the message.
func getSubstring(message string, characters int) string {
	if len(message) < characters {
		return message
	}
	return message[:characters/2] + message[len(message)-characters/2:]
}
// truncateRequest drops or shortens the oldest messages so the request
// stays within roughly half of the model's token limit.
func (s *OpenAI) truncateRequest(request openaiClient.ChatCompletionRequest) openaiClient.ChatCompletionRequest {
	var messages []openaiClient.ChatCompletionMessage
	tokenCount := 0
	limit := s.TokenLimit() / 2
	// Walk the messages from newest to oldest, keeping as many as fit.
	for i := len(request.Messages) - 1; i >= 0; i-- {
		message := request.Messages[i]
		// Add a few tokens for the role, separators, etc.
		tokens := s.CountTokens(message.Content) + 10
		// Can we fit the entire message in the window?
		if tokenCount+tokens < limit {
			tokenCount += tokens
			messages = append([]openaiClient.ChatCompletionMessage{message}, messages...)
			continue
		}
		// We can't fit the whole message, so include only its start and end,
		// then stop; anything older is dropped entirely.
		remaining := limit - tokenCount
		characters := remaining * 4 // Estimate 4 characters per token.
		message.Content = getSubstring(message.Content, characters)
		messages = append([]openaiClient.ChatCompletionMessage{message}, messages...)
		break
	}
	request.Messages = messages
	return request
}
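For completeness, here's roughly how it gets wired in before a request is sent. This is only a sketch, not our actual code: the client field and the createCompletion wrapper are illustrative, and it assumes openaiClient is the github.com/sashabaranov/go-openai package.

// Illustrative wrapper (not our real code): prune the conversation before
// it ever reaches the API so we never hit the generic context-length error.
func (s *OpenAI) createCompletion(ctx context.Context, request openaiClient.ChatCompletionRequest) (openaiClient.ChatCompletionResponse, error) {
	// s.client is assumed to be a *openaiClient.Client.
	request = s.truncateRequest(request)
	return s.client.CreateChatCompletion(ctx, request)
}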
This makes a lot of sense, as it's a pretty bad experience when you run out of context at the moment. The annoying part is that (at least the last time I checked) there is no precise token-counting library in Go.
We've had a lot of success with fairly basic heuristics. We've generated millions of values, and the existing built-in token counter has worked every time. I realise we're possibly losing a small amount of the context window, but the trade-off is worthwhile for us.
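To give an idea of what "basic heuristic" means here, the counter can be as simple as the sketch below. This is not our actual implementation; it just reuses the same four-characters-per-token estimate that truncateRequest relies on.

// Rough heuristic token counter: assume roughly four characters per token,
// matching the estimate used when truncating message content above.
func (s *OpenAI) CountTokens(content string) int {
	if content == "" {
		return 0
	}
	return len(content)/4 + 1
}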
Summary
At the moment, the plugin seems to fail with a generic error message when the conversation is too long. Ideally, some context pruning would take place to keep the conversation within a context limit defined by the plugin.