a3957273 opened 1 year ago
We have an internal truncateRequest function to resolve this on our instance. It tries to fit as many tokens as possible, up to about half the token limit. It's a super simple implementation that we threw together in under an hour.
// getSubstring returns the message unchanged if it already fits within
// the character budget; otherwise it keeps the first and last halves of
// the budget and drops the middle of the message.
func getSubstring(message string, characters int) string {
	if len(message) < characters {
		return message
	}
	return message[:characters/2] + message[len(message)-characters/2:]
}
// truncateRequest drops or shortens the oldest messages so the request
// stays within roughly half of the model's token limit.
func (s *OpenAI) truncateRequest(request openaiClient.ChatCompletionRequest) openaiClient.ChatCompletionRequest {
	var messages []openaiClient.ChatCompletionMessage
	tokenCount := 0
	limit := s.TokenLimit() / 2
	// Walk the messages from newest to oldest, keeping as many as fit.
	for i := len(request.Messages) - 1; i >= 0; i-- {
		message := request.Messages[i]
		// Add a few tokens for the role, separators, etc.
		tokens := s.CountTokens(message.Content) + 10
		// Can we fit the entire message in the window?
		if tokenCount+tokens < limit {
			tokenCount += tokens
			messages = append([]openaiClient.ChatCompletionMessage{message}, messages...)
			continue
		}
		// We can't fit the whole message, so include only its start and end,
		// then stop; anything older is dropped entirely.
		remaining := limit - tokenCount
		characters := remaining * 4 // Estimate 4 characters per token.
		message.Content = getSubstring(message.Content, characters)
		messages = append([]openaiClient.ChatCompletionMessage{message}, messages...)
		break
	}
	request.Messages = messages
	return request
}
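For completeness, here's roughly how it gets wired in before a request is sent. This is only a sketch, not our actual code: the client field and the createCompletion wrapper are illustrative, and it assumes openaiClient is the github.com/sashabaranov/go-openai package.

// Illustrative wrapper (not our real code): prune the conversation before
// it ever reaches the API so we never hit the generic context-length error.
func (s *OpenAI) createCompletion(ctx context.Context, request openaiClient.ChatCompletionRequest) (openaiClient.ChatCompletionResponse, error) {
	// s.client is assumed to be a *openaiClient.Client.
	request = s.truncateRequest(request)
	return s.client.CreateChatCompletion(ctx, request)
}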
This makes a lot of sense, as it's a pretty bad experience when you run out of context at the moment. The annoying part is that (at least the last time I checked) there is no precise token-counting library in Go.
We've had a lot of success with fairly basic heuristics. We've generated millions of values, and the existing built-in token counter has worked every time. I realise we're possibly losing a small amount of the context window, but the trade-off is worthwhile for us.
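To give an idea of what "basic heuristic" means here, the counter can be as simple as the sketch below. This is not our actual implementation; it just reuses the same four-characters-per-token estimate that truncateRequest relies on.

// Rough heuristic token counter: assume roughly four characters per token,
// matching the estimate used when truncating message content above.
func (s *OpenAI) CountTokens(content string) int {
	if content == "" {
		return 0
	}
	return len(content)/4 + 1
}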
Summary
At the moment, the plugin seems to fail with a generic error message when the conversation is too long. Ideally, some context pruning would take place to keep the conversation within a context limit defined by the plugin.