iansinnott opened this issue 1 year ago
Truncation may be the simpler approach. Specify a truncation window and just use that.
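To sketch what that might look like client-side (the ChatMessage shape and the chars/4 token estimate below are illustrative assumptions, not anything from the project):

// Rough token estimate; ~4 characters per token is a common heuristic for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Keep only the most recent messages that still fit within the token budget.
function truncateHistory(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

Anything older than the window just gets dropped, which is the trade-off versus summarizing.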
Run the summaries on gpt-3.5-turbo-instruct and you'll get better results.
What about removing the stop words? If properly implemented, the meaning could stay the same and it should reduce the context size:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Download the tokenizer models and the stopword list
nltk.download('punkt')
nltk.download('stopwords')

def remove_stopwords(text):
    """
    Remove stopwords from the text.

    Parameters:
        text (str): The input text.

    Returns:
        str: The text without stopwords.
    """
    # Tokenize the text
    tokens = word_tokenize(text)

    # Load stopwords from NLTK
    stop_words = set(stopwords.words('english'))

    # Remove stopwords from the text
    filtered_text = ' '.join([word for word in tokens if word.lower() not in stop_words])

    return filtered_text

# Example text
example_text = "The cat sat on the mat. The cat is fluffy. Fluffy cats are cute. Cats like to sit on mats."

# Compress the context by removing stopwords
compressed_text_advanced = remove_stopwords(example_text)

print("ORG: " + example_text)
print("COM: " + compressed_text_advanced)
Output:
ORG: The cat sat on the mat. The cat is fluffy. Fluffy cats are cute. Cats like to sit on mats.
COM: cat sat mat . cat fluffy . Fluffy cats cute . Cats like sit mats .
I wonder if removing the stop words will affect the quality of the output. If you have a long prompt where much of the context is written in what seems like broken English, I would worry that the output is going to follow whatever style was prevalent in the prompt. Have you noticed an impact?
What about removing the stop words?
Hm, I wonder if that can be done in the browser. Despite having a desktop build, this is entirely a frontend project: everything runs in a browser window. The database is SQLite via WASM. There may be other libs to tackle this, but I think nltk would be a non-starter since it's meant for a Python environment.
Could create a lambda for this, but ideally it all runs locally for low-latency interactions.
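If it came to it, a small hard-coded stopword list plus a whitespace split would probably get most of the way there without nltk. A rough sketch (the stopword list below is abbreviated and purely illustrative):

// Abbreviated English stopword list; a real one would have a few hundred entries.
const STOP_WORDS = new Set([
  "the", "a", "an", "is", "are", "was", "were", "on", "in", "at",
  "to", "of", "and", "or", "it", "this", "that", "be", "do",
]);

// Drop stopwords while keeping word order and punctuation attached to kept words.
function removeStopwords(text: string): string {
  return text
    .split(/\s+/)
    .filter((word) => !STOP_WORDS.has(word.toLowerCase().replace(/[^a-z']/g, "")))
    .join(" ");
}

console.log(removeStopwords("The cat sat on the mat. The cat is fluffy."));
// => "cat sat mat. cat fluffy."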
I've been experimenting with adding a vector DB in the hopes that having access to similarity search would allow some creative context compression via selecting only relevant messages to include in context. However, it doesn't run in Safari, so that effort has stalled.
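For reference, the selection I had in mind was roughly this, assuming per-message embeddings are available from somewhere (the EmbeddedMessage shape and the embedding source are assumptions, not anything in the codebase yet):

interface EmbeddedMessage {
  content: string;
  embedding: number[]; // precomputed, e.g. via an embeddings API
}

// Plain cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pick the k messages most similar to the embedding of the new user message.
function selectRelevantMessages(
  history: EmbeddedMessage[],
  queryEmbedding: number[],
  k: number
): EmbeddedMessage[] {
  return [...history]
    .sort(
      (a, b) =>
        cosineSimilarity(b.embedding, queryEmbedding) -
        cosineSimilarity(a.embedding, queryEmbedding)
    )
    .slice(0, k);
}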
The next move here is likely to add a rolling context window, probably customizable by number of tokens.
Open to any suggestions though.
Having explored in-browser vector storage and come up short with Victor [1], I think the initial move will probably be a sliding window of chat history plus a summary of whatever else is there. This is what langchain does with their ConversationSummaryBufferMemory, which seems like it will be good enough for infinite chat threads that don't require more and more tokens.
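A rough sketch of that shape; the estimateTokens heuristic and the summarize() callback (one LLM call over the older messages) are placeholders, not settled API:

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Very rough token estimate (~4 characters per token).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Split history into a recent window that fits the budget and the older remainder.
function splitHistory(messages: ChatMessage[], maxTokens: number) {
  let used = 0;
  let cutoff = messages.length;
  for (let i = messages.length - 1; i >= 0; i--) {
    used += estimateTokens(messages[i].content);
    if (used > maxTokens) break;
    cutoff = i;
  }
  return { older: messages.slice(0, cutoff), recent: messages.slice(cutoff) };
}

// Build the context: a summary of older messages plus the recent window verbatim.
async function buildContext(
  messages: ChatMessage[],
  maxTokens: number,
  summarize: (msgs: ChatMessage[]) => Promise<string>
): Promise<ChatMessage[]> {
  const { older, recent } = splitHistory(messages, maxTokens);
  if (older.length === 0) return recent;
  const summary = await summarize(older);
  return [{ role: "system", content: `Summary of earlier conversation: ${summary}` }, ...recent];
}

In practice the summary would probably be cached and only regenerated as messages fall out of the recent window, rather than recomputed on every send.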
Some considerations:
It would be nice (and cost-effective) to do something other than send the entire chat history with each message as context. I've never run into a context limit in my own usage (8k, gpt-4), but it's also not cheap.
Essentially we want the same infinite chat thread experience as the official ChatGPT UI.
Current thinking: