samkeen opened 1 year ago
See the LangChain docs excerpt below.

Use token splitter

Token splitting

We can also split on token count explicitly, if we want. This can be useful because LLMs often have context windows designated in tokens. Tokens are often ~4 characters.
from langchain.text_splitter import TokenTextSplitter

# chunk_size is measured in tokens; chunk_size=1 yields one token per chunk
text_splitter = TokenTextSplitter(chunk_size=1, chunk_overlap=0)
text1 = "foo bar bazzyfoo"
text_splitter.split_text(text1)

# `pages` is assumed to be a list of Documents loaded earlier (e.g. by a PDF loader)
text_splitter = TokenTextSplitter(chunk_size=10, chunk_overlap=0)
docs = text_splitter.split_documents(pages)
docs[0]
pages[0].metadata
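Since token-based splitting exists to match the model's own token accounting, it can help to count tokens directly. A minimal sketch using tiktoken, which TokenTextSplitter uses under the hood ("gpt2" is its default encoding, though that is an assumption worth verifying for your model):

import tiktoken

enc = tiktoken.get_encoding("gpt2")   # assumed default encoding of TokenTextSplitter
tokens = enc.encode("foo bar bazzyfoo")
len(tokens)                           # the count chunk_size is measured against
[enc.decode([t]) for t in tokens]     # per-token text, mirroring the chunk_size=1 split above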
Summary
We currently build the prompt naively, concatenating the full text of every doc returned by the vector DB's similarity search.
If this results in a prompt larger than the LLM's context window, we get an error such as this:
InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 8889 tokens (8633 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
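For reference, the naive construction described above looks roughly like this; vectordb, question, and the prompt template are illustrative, not the actual code:

docs = vectordb.similarity_search(question)            # returned docs can be any size
context = "\n\n".join(d.page_content for d in docs)    # all text, unbounded
prompt = f"Answer from the context below.\n\n{context}\n\nQuestion: {question}"
# If the prompt's token count plus the completion budget exceeds the model's
# context window (4097 tokens above), the API raises InvalidRequestError.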
Solution
Option 1
On ingest, chunk the docs to something like 1k tokens (with 100 overlap). Then we know each returned doc is 1k tokens or less. Then, when doing the similarity search,
ensure k * [chunk size] is comfortably below the LLM's context window.
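A minimal sketch of Option 1; raw_docs, vectordb, and question are illustrative, and the numbers assume the 4097-token window from the error above:

from langchain.text_splitter import TokenTextSplitter

# Ingest: chunk to ~1k tokens with 100-token overlap
splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(raw_docs)
vectordb.add_documents(chunks)

# Query: pick k so k * 1000 stays well under the context window;
# with a 4097-token window and 256 completion tokens, k=3 leaves headroom
docs = vectordb.similarity_search(question, k=3)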
Option 2
Use the LLM to summarize the text of the docs returned by the similarity search so that the final prompt fits within the context window.
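A minimal sketch of Option 2 using LangChain's summarize chain; map_reduce summarizes each doc independently and then combines the summaries, so no single LLM call needs the full text at once. vectordb and question are illustrative:

from langchain.chains.summarize import load_summarize_chain
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
chain = load_summarize_chain(llm, chain_type="map_reduce")
docs = vectordb.similarity_search(question, k=4)
condensed = chain.run(docs)   # summaries sized to fit the context window
# build the final prompt from `condensed` instead of the raw doc text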