iansinnott opened this issue 1 year ago
Truncation may be the simpler approach. Specify a truncation window and just use that.
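To sketch what that might look like client-side (the ChatMessage shape and the chars/4 token estimate below are illustrative assumptions, not anything from the project):

// Rough token estimate; ~4 characters per token is a common heuristic for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Keep only the most recent messages that still fit within the token budget.
function truncateHistory(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

Anything older than the window just gets dropped, which is the trade-off versus summarizing.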
Run the summaries on gpt-3.5-turbo-instruct and you'll get better results.
What about removing the stop words? If properly implemented, the meaning could stay the same and it should reduce the context size:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Download the tokenizer models and the stopword list
nltk.download('punkt')
nltk.download('stopwords')

def remove_stopwords(text):
    """
    Remove stopwords from the text.

    Parameters:
        text (str): The input text.

    Returns:
        str: The text without stopwords.
    """
    # Tokenize the text
    tokens = word_tokenize(text)

    # Load stopwords from NLTK
    stop_words = set(stopwords.words('english'))

    # Remove stopwords from the text
    filtered_text = ' '.join([word for word in tokens if word.lower() not in stop_words])

    return filtered_text

# Example text
example_text = "The cat sat on the mat. The cat is fluffy. Fluffy cats are cute. Cats like to sit on mats."

# Compress the context by removing stopwords
compressed_text_advanced = remove_stopwords(example_text)

print("ORG: " + example_text)
print("COM: " + compressed_text_advanced)
Output:
ORG: The cat sat on the mat. The cat is fluffy. Fluffy cats are cute. Cats like to sit on mats.
COM: cat sat mat . cat fluffy . Fluffy cats cute . Cats like sit mats .
I wonder if removing the stop words will affect the quality of the output. If you have a long prompt where much of the context is written in what seems like broken English, I would worry that the output is going to follow whatever style was prevalent in the prompt. Have you noticed an impact?
What about removing the stop words?
Hm, I wonder if that can be done in the browser. Despite having a desktop build, this is entirely a frontend project: everything runs in a browser window. The database is SQLite via WASM. There may be other libs to tackle this, but I think nltk would be a non-starter since it's meant for a Python environment.
Could create a lambda for this, but ideally it all runs locally for low-latency interactions.
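If it came to it, a small hard-coded stopword list plus a whitespace split would probably get most of the way there without nltk. A rough sketch (the stopword list below is abbreviated and purely illustrative):

// Abbreviated English stopword list; a real one would have a few hundred entries.
const STOP_WORDS = new Set([
  "the", "a", "an", "is", "are", "was", "were", "on", "in", "at",
  "to", "of", "and", "or", "it", "this", "that", "be", "do",
]);

// Drop stopwords while keeping word order and punctuation attached to kept words.
function removeStopwords(text: string): string {
  return text
    .split(/\s+/)
    .filter((word) => !STOP_WORDS.has(word.toLowerCase().replace(/[^a-z']/g, "")))
    .join(" ");
}

console.log(removeStopwords("The cat sat on the mat. The cat is fluffy."));
// => "cat sat mat. cat fluffy."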
I've been experimenting with adding a vector DB in the hopes that having access to similarity search would allow some creative context compression via selecting only relevant messages to include in context. However, it doesn't run in Safari, so that effort has stalled.
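For reference, the selection I had in mind was roughly this, assuming per-message embeddings are available from somewhere (the EmbeddedMessage shape and the embedding source are assumptions, not anything in the codebase yet):

interface EmbeddedMessage {
  content: string;
  embedding: number[]; // precomputed, e.g. via an embeddings API
}

// Plain cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pick the k messages most similar to the embedding of the new user message.
function selectRelevantMessages(
  history: EmbeddedMessage[],
  queryEmbedding: number[],
  k: number
): EmbeddedMessage[] {
  return [...history]
    .sort(
      (a, b) =>
        cosineSimilarity(b.embedding, queryEmbedding) -
        cosineSimilarity(a.embedding, queryEmbedding)
    )
    .slice(0, k);
}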
The next move here is likely to add a rolling context window, probably customizable by number of tokens.
Open to any suggestions though.
Having explored in-browser vector storage and come up short with Victor [1], I think the initial move will probably be a sliding window of chat history plus a summary of whatever else is there. This is what langchain does with their ConversationSummaryBufferMemory, which seems like it will be good enough for infinite chat threads that don't require more and more tokens.
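A rough sketch of that shape; the estimateTokens heuristic and the summarize() callback (one LLM call over the older messages) are placeholders, not settled API:

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Very rough token estimate (~4 characters per token).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Split history into a recent window that fits the budget and the older remainder.
function splitHistory(messages: ChatMessage[], maxTokens: number) {
  let used = 0;
  let cutoff = messages.length;
  for (let i = messages.length - 1; i >= 0; i--) {
    used += estimateTokens(messages[i].content);
    if (used > maxTokens) break;
    cutoff = i;
  }
  return { older: messages.slice(0, cutoff), recent: messages.slice(cutoff) };
}

// Build the context: a summary of older messages plus the recent window verbatim.
async function buildContext(
  messages: ChatMessage[],
  maxTokens: number,
  summarize: (msgs: ChatMessage[]) => Promise<string>
): Promise<ChatMessage[]> {
  const { older, recent } = splitHistory(messages, maxTokens);
  if (older.length === 0) return recent;
  const summary = await summarize(older);
  return [{ role: "system", content: `Summary of earlier conversation: ${summary}` }, ...recent];
}

In practice the summary would probably be cached and only regenerated as messages fall out of the recent window, rather than recomputed on every send.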
Some considerations:
It would be nice (and cost-effective) to do something other than send the entire chat history with each message as context. I've never run into a context limit in my own usage (8k, gpt-4), but it's also not cheap.
Essentially we want the same infinite chat thread experience as the official ChatGPT UI.
Current thinking: