run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: How to effectively summarize large documents using SummaryIndex or similar? #11779

Closed sayanmndl21 closed 3 months ago

sayanmndl21 commented 8 months ago

Question

I am following this tutorial to create a small RAG engine with a local LLM (Mixtral 8x7B): Multi Document Agents v1. But this always leads to `ValueError: Requested tokens (xxx) exceed context window of yyy` at

await summary_query_engine.aquery(
    "Extract a concise 1-2 line summary of this document"
)

So far I have tried reducing the chunk size, increasing max_tokens to the limit, and adding CompactAndRefine() as the response synthesizer, but none of them works. Is there an argument I might be missing?
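
For reference, what I tried looks roughly like this (a simplified sketch with llama-index 0.10-style imports; the exact wiring depends on your version and model setup):

from llama_index.core import Settings, SimpleDirectoryReader, SummaryIndex
from llama_index.core.response_synthesizers import CompactAndRefine

# Smaller chunks so each node stays well under the context window
Settings.chunk_size = 512
Settings.chunk_overlap = 20

documents = SimpleDirectoryReader("data").load_data()
summary_index = SummaryIndex.from_documents(documents)

# CompactAndRefine packs as much text as fits per LLM call, then refines
summary_query_engine = summary_index.as_query_engine(
    response_synthesizer=CompactAndRefine(),
    use_async=True,
)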

dosubot[bot] commented 8 months ago

Hey @sayanmndl21! Great to see you back here asking the challenging questions. How's it going?🚀

Based on the information you've provided and the similar issues I found in the LlamaIndex repository, it seems like you're encountering a common problem when dealing with large documents: the total number of tokens in the document exceeds the model's maximum token limit.

To avoid this error, you can implement a strategy to break down the document into smaller chunks that fit within the model's token limit before summarizing each part. After summarizing the individual parts, you can then combine these summaries to form a final summary for the entire document. This approach ensures that each chunk of the document is within the model's context window limit, thus avoiding the error.

Here's a high-level approach to implement this strategy:

  1. Chunking the Document: Before summarizing, divide the document into smaller parts, each containing fewer tokens than the model's maximum limit.

  2. Summarizing Each Chunk: Use the DocumentSummaryIndex class to summarize each chunk individually. This involves modifying or extending the _add_nodes_to_index method to handle chunked input and generate summaries for each chunk.

  3. Combining Summaries: After obtaining summaries for all chunks, you can optionally combine these summaries into a comprehensive summary that represents the entire document. This step might involve another round of summarization to ensure coherence and conciseness.

This approach is not directly outlined in the provided code, but you can implement it by extending the DocumentSummaryIndex class or by preprocessing the documents before they are passed to the summarization methods. The key is to ensure that each piece of text fed into the summarization process does not exceed the model's token limit.

For example, you can use the TokenTextSplitter class from the llama_index.text_splitter module to split your text into smaller chunks that are within the token limit. Here is an example of how to use the TokenTextSplitter:

# Note: in llama-index 0.10+ this lives in llama_index.core.node_parser
from llama_index.text_splitter import TokenTextSplitter

# Your input text
text = "Your very long text..."

# Create a text splitter
text_splitter = TokenTextSplitter(chunk_size=100, chunk_overlap=0)

# Split the text into chunks
chunks = text_splitter.split_text(text)

# Now you can process each chunk separately
for chunk in chunks:
    # Your processing code here
    pass

Adjust the chunk_size and chunk_overlap parameters as needed for your specific use case. If your text is tokenized in a specific way, you can pass the tokenizer's encoding function to the TokenTextSplitter constructor.
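
Building on that, here is a minimal map-reduce style sketch of the full chunk-summarize-combine loop. This is illustrative only: it assumes `llm` is any LlamaIndex LLM exposing a `.complete()` method (e.g. `Settings.llm`), and it does not touch the DocumentSummaryIndex internals mentioned above.

from llama_index.text_splitter import TokenTextSplitter  # llama_index.core.node_parser in 0.10+

def summarize_document(llm, text: str, chunk_size: int = 1024) -> str:
    splitter = TokenTextSplitter(chunk_size=chunk_size, chunk_overlap=0)
    chunks = splitter.split_text(text)

    # Map step: summarize each chunk independently, within the context window
    partial = [
        str(llm.complete(f"Summarize the following text in 1-2 lines:\n\n{c}"))
        for c in chunks
    ]
    combined = "\n".join(partial)

    # Reduce step: if the concatenated summaries are still too long, recurse
    if len(splitter.split_text(combined)) > 1:
        return summarize_document(llm, combined, chunk_size)
    return str(llm.complete(
        f"Combine these partial summaries into one concise summary:\n\n{combined}"
    ))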


I hope this helps! If you have any other questions or need further clarification, feel free to ask.

taimans-git commented 7 months ago

I have a similar problem. @sayanmndl21, have you found a solution? Thanks in advance :)

sayanmndl21 commented 7 months ago

Hey @taimans-git, unfortunately no. I have since changed direction in favor of simpler methods like the Retriever Query Engine, which is performing better than expected.

I'd suggest custom retrievers (https://docs.llamaindex.ai/en/stable/examples/query_engine/CustomRetrievers/) and a ChromaDB vector store if you are working with a large document base.
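
A rough sketch of that kind of setup (assuming llama-index 0.10+ with the llama-index-vector-stores-chroma package; the paths and collection name below are illustrative):

import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Persistent Chroma collection as the vector store
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Retriever-backed query engine: only the top-k retrieved chunks go to
# the LLM, so the context window is never exceeded
query_engine = index.as_query_engine(similarity_top_k=5)
print(query_engine.query("Extract a concise 1-2 line summary of this document"))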

taimans-git commented 7 months ago

Hi @sayanmndl21, thanks for your suggestions. I tried different things, and with other API models it seems to work; for example, it works for me with models from VertexAI or OpenAI. For MistralAI, I additionally had to set the context window to make it work, e.g.

Settings.context_window = 4096
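
In context, that looks roughly like this (the model name is just an example; MistralAI support comes from the llama-index-llms-mistralai package):

from llama_index.core import Settings
from llama_index.llms.mistralai import MistralAI

Settings.llm = MistralAI(model="mistral-small", max_tokens=512)
# Set the window explicitly so LlamaIndex packs prompts
# within the model's actual limit
Settings.context_window = 4096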