Closed: qiisziilbash closed this issue 6 months ago
Hey there @qiisziilbash! Fancy seeing you here again. 🚀 Looks like you've stumbled upon another quirky challenge in the LlamaIndex universe. Hold tight, I'm diving into the abyss of code to fetch some answers for you. Will circle back shortly!
@qiisziilbash this module is called simple summarize because it's very simple :)
If you need to summarize across a variable number of nodes, I recommend using tree_summarize.
After reviewing the details of the bug you've reported and considering the feedback from the maintainer, it seems the issue you're encountering with the truncate method is indeed a limitation of the simple_summarize response mode when handling a large number of nodes. The behavior you're experiencing, where information is dramatically dropped or the call even crashes due to a negative chunk size, is a consequence of the simplistic approach taken by this summarization method.
The maintainer, Logan Markewich, suggests using the tree_summarize response mode as an alternative. This mode is designed to handle summarization across a variable number of nodes more effectively. It offers a more sophisticated approach to managing large numbers of retrieved nodes, potentially avoiding the issues you're facing with simple_summarize.
Given this, it might be beneficial to explore tree_summarize for your use case. This involves adjusting your query engine configuration to use tree_summarize instead of simple_summarize. Here's a rough idea of how you might adjust your code:
# get_response_synthesizer lives under llama_index.core in 0.10+ releases
from llama_index.response_synthesizers import get_response_synthesizer

query_engine = index.as_query_engine(
    similarity_top_k=similarity_top_k,
    response_synthesizer=get_response_synthesizer(
        response_mode="tree_summarize",
    ),
)
This change should help in managing large numbers of nodes more gracefully, potentially avoiding the truncation issue you've encountered. If you have further questions or need assistance with implementing this solution, feel free to reach out. Dosu is here to help!
Totally understand; wouldn't it be simpler to just not truncate the chunks list though :D ? And not produce absurd results :)
@qiisziilbash probably these two steps should just be swapped in the code
truncated_chunks = self._service_context.prompt_helper.truncate(
    prompt=text_qa_template,
    text_chunks=text_chunks,
)
node_text = "\n".join(truncated_chunks)
Merge first, then truncate
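A rough sketch of that swap, written against the snippet above (an illustration of the suggestion, not the actual patch):
# Join the retrieved chunks into one combined text first, then truncate that
# single chunk, instead of truncating every chunk individually.
node_text = "\n".join(text_chunks)
truncated_chunks = self._service_context.prompt_helper.truncate(
    prompt=text_qa_template,
    text_chunks=[node_text],
)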
agreed
I agree that it makes more sense to merge first and then truncate, but the merged solution introduced another problem: truncated_chunks now contains a list with a single string in it instead of just the string, so when you pass that on to the LLM class, the context_str gets formatted like ['{truncated_str}'].
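A minimal sketch of the unwrapping being asked for here (assuming the synthesizer ultimately hands context_str to the LLM via a predict call; the exact call site may differ):
# prompt_helper.truncate() returns a list, so pull the single string back out
# before formatting the prompt; otherwise context_str renders as "['...']".
node_text = "\n".join(truncated_chunks)   # or truncated_chunks[0]
response = self._service_context.llm.predict(
    text_qa_template,
    context_str=node_text,
)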
Bug Description
When using a query engine whose response_synthesizer uses the simple_summarize response mode with too many nodes, the truncate method truncates each node to so few tokens that essentially no information is preserved (and if the number of nodes is big enough, it crashes due to a negative chunk size). A better behavior would be to reduce the number of nodes so that the remaining chunks fit into the context window, or a combination of this and the existing behavior.
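A rough illustration of the arithmetic (hypothetical numbers, not the library's exact accounting): the available context is divided across every retrieved chunk, so the per-chunk budget collapses as the node count grows.
# Hypothetical figures to show how the per-chunk budget shrinks toward zero.
context_window = 4096
prompt_and_answer_budget = 1500          # template, query, and reserved output tokens
available = context_window - prompt_and_answer_budget

for num_nodes in (5, 50, 500):
    per_chunk = available // num_nodes   # each node gets an equal slice
    print(num_nodes, per_chunk)          # 5 -> 519, 50 -> 51, 500 -> 5

# With per-chunk padding subtracted as well, the slice can become negative,
# which is the crash described above.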
Version
latest
Steps to Reproduce
Make a query engine over an index with sufficiently large chunks (using the simple_summarize response mode) and ask a question; a rough reproduction sketch follows.
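A minimal reproduction sketch, assuming 0.9.x-style imports (the import paths, the ./data directory, and the chunk/top-k values are placeholders and may need adjusting for your setup):
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

# Large chunks plus a high top-k force many big nodes into a single
# simple_summarize call, which then truncates each node aggressively.
service_context = ServiceContext.from_defaults(chunk_size=2048)
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

query_engine = index.as_query_engine(
    similarity_top_k=20,
    response_mode="simple_summarize",
)
print(query_engine.query("What is this corpus about?"))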
Relevant Logs/Tracebacks
No response