run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: DocumentSummaryIndexEmbeddingRetriever and DocumentSummaryIndexLLMRetriever issue with response.metadata and response.source_nodes for queries on multiple document queries #12960

Closed AI-ANK closed 6 months ago

AI-ANK commented 7 months ago


Question

I am following this code as is: https://docs.llamaindex.ai/en/stable/examples/index_structs/doc_summary/DocSummary/

When I ask a question whose answer spans multiple documents, such as "Tell me about the history of Chicago and Mumbai", the response itself seems fine. But when I inspect the nodes used to produce the response via response.source_nodes or response.metadata, all the chunks/documents belong to only one city. Ideally, we should get back chunks from both documents. Am I missing something?

What I am trying to do is build a document repository search tool. Basically, I have a large collection of documents, and I want to create a summary of each one and save it in a vector index. A user will then query for some required information, and the tool's job is to search through all the documents and return a list of the top 'n' matches along with a justification for each match. I thought DocumentSummaryIndexLLMRetriever and DocumentSummaryIndexEmbeddingRetriever would be the best approach, but this source_nodes/metadata issue is creating a problem.

dosubot[bot] commented 7 months ago

The issue you're encountering with DocumentSummaryIndexLLMRetriever and DocumentSummaryIndexEmbeddingRetriever focusing on only one city in the response.metadata and response.source_nodes, despite a query involving multiple topics, is likely due to the default settings of the choice_top_k and similarity_top_k parameters. These parameters are set to 1 by default, which means the retrievers will only select the top 1 most relevant summary node from the index based on the LLM's prediction or the embedding's similarity score, respectively.

To address this, you can adjust the choice_top_k for DocumentSummaryIndexLLMRetriever and similarity_top_k for DocumentSummaryIndexEmbeddingRetriever to a higher value. This adjustment will allow the retrieval of document chunks or summaries related to both topics in your query, such as both cities in the example you provided.
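For illustration, here is a minimal sketch of raising those values when constructing the retrievers. The variable `doc_summary_index` and the value 3 are assumptions, not something from your setup:

```python
from llama_index.core.indices.document_summary import (
    DocumentSummaryIndexEmbeddingRetriever,
    DocumentSummaryIndexLLMRetriever,
)

# doc_summary_index is assumed to be an already-built DocumentSummaryIndex
# covering both the Chicago and Mumbai documents.

# LLM-based retriever: let the LLM pick up to 3 relevant document summaries.
llm_retriever = DocumentSummaryIndexLLMRetriever(
    doc_summary_index,
    choice_top_k=3,
)

# Embedding-based retriever: return the 3 most similar document summaries.
embedding_retriever = DocumentSummaryIndexEmbeddingRetriever(
    doc_summary_index,
    similarity_top_k=3,
)
```

With top-k values greater than 1, response.source_nodes should be able to include nodes from more than one document.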

For more detailed control, especially if your queries vary widely in scope or complexity, consider using the VectorIndexAutoRetriever class. This class allows for dynamic setting of vector store query parameters, including similarity_top_k, based on the output of an LLM, potentially offering a more nuanced approach to retrieving relevant document summaries or chunks for complex queries.
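As a rough sketch of that idea (the "city" metadata field and the descriptions below are made up for illustration; a real setup would describe your own index):

```python
from llama_index.core.retrievers import VectorIndexAutoRetriever
from llama_index.core.vector_stores.types import MetadataInfo, VectorStoreInfo

# Describe the index so the LLM can infer query parameters such as
# similarity_top_k and metadata filters at query time.
vector_store_info = VectorStoreInfo(
    content_info="Summaries of documents about different cities",
    metadata_info=[
        MetadataInfo(
            name="city",  # hypothetical metadata field
            type="str",
            description="Name of the city the document covers",
        ),
    ],
)

auto_retriever = VectorIndexAutoRetriever(
    vector_index,  # an existing VectorStoreIndex (assumption)
    vector_store_info=vector_store_info,
)
```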

You can find more details and adjust these parameters in the source code of the retrievers, which should help in getting responses that cover all queried topics more effectively.

AI-ANK commented 6 months ago

Thanks @dosubot. Can you tell me the difference between DocumentSummaryIndexLLMRetriever and DocumentSummaryIndexEmbeddingRetriever?

dosubot[bot] commented 6 months ago

The primary difference between DocumentSummaryIndexLLMRetriever and DocumentSummaryIndexEmbeddingRetriever lies in their retrieval mechanisms and the technologies they leverage:

- DocumentSummaryIndexLLMRetriever prompts an LLM to judge which document summaries are relevant to the query and selects up to choice_top_k of them. This tends to give more nuanced relevance judgments, but it costs an extra LLM call per retrieval and is slower.
- DocumentSummaryIndexEmbeddingRetriever embeds the query and returns the similarity_top_k document summaries whose embeddings are closest to it. This is faster and cheaper, but relies entirely on embedding similarity.

The choice between these two depends on your specific needs: prefer the LLM retriever when retrieval quality matters more than latency and cost, and the embedding retriever when you need fast, inexpensive retrieval at scale.

Each retriever's effectiveness in a comprehensive document search tool will depend on factors like query complexity, resource availability, and scalability requirements. Adjusting parameters such as choice_top_k for the LLM retriever and similarity_top_k for the embedding retriever can help in retrieving more relevant summaries across multiple topics, as in your query about the two cities.
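For a concrete comparison, here is a hedged sketch of wrapping either retriever in a query engine and inspecting which documents the answer drew from; `doc_summary_index` is an assumed, already-built DocumentSummaryIndex and the query string is only an example:

```python
from llama_index.core.indices.document_summary import DocumentSummaryIndexLLMRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

# Swap in DocumentSummaryIndexEmbeddingRetriever(doc_summary_index, similarity_top_k=3)
# to compare the embedding-based variant on the same question.
retriever = DocumentSummaryIndexLLMRetriever(doc_summary_index, choice_top_k=3)
query_engine = RetrieverQueryEngine.from_args(retriever)

response = query_engine.query("Tell me about the history of Chicago and Mumbai")
for node_with_score in response.source_nodes:
    # Each source node carries the metadata of the document it came from.
    print(node_with_score.node.metadata, node_with_score.score)
```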

#### Details

**Sources**

- [llama-index-core/llama_index/core/indices/document_summary/retrievers.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/document_summary/retrievers.py)
- [llama-index-legacy/llama_index/legacy/indices/document_summary/retrievers.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/indices/document_summary/retrievers.py)

AI-ANK commented 6 months ago

@dosu @dosubot thanks. For my use case, I am trying to build a file search RAG system. Basically, I have a collection of documents over which I apply a RAG system. Users will ask questions like "I need a document that talks about Llama 2", and the RAG system should fetch the top k documents (not chunks) that are relevant to the query, along with each document's summary, followed by an LLM-generated justification of why that document was shortlisted for the given user query. I believe DocumentSummaryIndexLLMRetriever would be the right fit here. @dosu @dosubot can you help with how I can implement this?

dosubot[bot] commented 6 months ago

To build a file search system using a Retrieval-Augmented Generation (RAG) approach with DocumentSummaryIndexLLMRetriever, consider leveraging the FusedSummaryRetriever for enhanced retrieval. This approach combines document summaries with relevant fragments, improving context understanding and retrieval performance. Here's a streamlined guide:

  1. Initialize Document Summary Index: Populate it with summaries of your documents to serve as the retrieval foundation.

  2. Use FusedSummaryRetriever: This retriever enhances context by including both document summaries and relevant fragments. Initialize it with your vector store and document stores.

```python
from llama_index.packs.docugami_kg_rag.helpers.fused_summary_retriever import FusedSummaryRetriever

# my_vector_store, my_full_doc_summary_store and my_parent_doc_store are
# placeholders for your own vector store and document stores.
fused_summary_retriever = FusedSummaryRetriever(
    vectorstore=my_vector_store,
    full_doc_summary_store=my_full_doc_summary_store,
    parent_doc_store=my_parent_doc_store,
)
```
  3. Integrate SelfRAGQueryEngine: This engine processes queries, retrieves relevant documents, and generates justifications. It uses the FusedSummaryRetriever for dynamic retrieval based on query context.
```python
from llama_index.packs.self_rag.base import SelfRAGQueryEngine

# model_path points to a local Self-RAG model checkpoint (placeholder path).
self_rag_query_engine = SelfRAGQueryEngine(
    model_path="path_to_your_model",
    retriever=fused_summary_retriever,
)
```
  4. Perform Searches: Process user queries to retrieve relevant documents and generate justifications with the custom_query method of the SelfRAGQueryEngine (a short usage sketch follows this list).

  5. Optimize for Relevance and Efficiency: Experiment with retrieval settings and consider fine-tuning the embedding model to improve retrieval quality. Adjust parameters like temperature, top_p, and max_tokens in the SelfRAGQueryEngine for optimal performance.
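A brief usage sketch under the same assumptions as the snippets above; the query string is only an example:

```python
# The engine retrieves candidate documents via the FusedSummaryRetriever and
# generates an answer that can include a justification for each selection.
response = self_rag_query_engine.query("I need a document that talks about Llama 2")
print(response)
```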

By following these steps and utilizing the FusedSummaryRetriever within the SelfRAGQueryEngine, you can create a sophisticated file search system that efficiently retrieves relevant documents and provides justifications for their selection. For detailed implementation, refer to the LlamaIndex documentation.

#### Details

**Sources**

- [[Question]: Pipeline for document Summary + recursive retrieval](https://github.com/run-llama/llama_index/issues/12197)
- [[Question]: Approaches for better RAG pipeline](https://github.com/run-llama/llama_index/issues/12073)
- [[Question]: How to do Multi-Document RAG with Document Summary Index?](https://github.com/run-llama/llama_index/issues/10389)
- [feat(context length): QnA Summarization as a relevant information extractor](https://github.com/run-llama/llama_index/issues/6889)
- [llama-index-legacy/llama_index/legacy/indices/document_summary/retrievers.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/indices/document_summary/retrievers.py)
- [docs/docs/optimizing/production_rag.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/optimizing/production_rag.md)
- [llama-index-core/llama_index/core/indices/document_summary/retrievers.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/document_summary/retrievers.py)
- [llama-index-packs/llama-index-packs-self-rag/llama_index/packs/self_rag/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-self-rag/llama_index/packs/self_rag/base.py)
- [llama-index-packs/llama-index-packs-docugami-kg-rag/llama_index/packs/docugami_kg_rag/helpers/fused_summary_retriever.py](https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-docugami-kg-rag/llama_index/packs/docugami_kg_rag/helpers/fused_summary_retriever.py)
- [docs/docs/examples/retrievers/auto_vs_recursive_retriever.ipynb](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/retrievers/auto_vs_recursive_retriever.ipynb)
- [docs/docs/module_guides/querying/retriever/retrievers.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/module_guides/querying/retriever/retrievers.md)
- [docs/docs/optimizing/advanced_retrieval/advanced_retrieval.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/optimizing/advanced_retrieval/advanced_retrieval.md)
- [docs/docs/examples/finetuning/embeddings/finetune_embedding.ipynb](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/finetuning/embeddings/finetune_embedding.ipynb)
- [docs/docs/use_cases/fine_tuning.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/use_cases/fine_tuning.md)
- [docs/docs/optimizing/fine-tuning/fine-tuning.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/optimizing/fine-tuning/fine-tuning.md)
- [docs/docs/examples/finetuning/embeddings/finetune_embedding_adapter.ipynb](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/finetuning/embeddings/finetune_embedding_adapter.ipynb)
- [llama-index-finetuning/llama_index/finetuning/embeddings/sentence_transformer.py](https://github.com/run-llama/llama_index/blob/main/llama-index-finetuning/llama_index/finetuning/embeddings/sentence_transformer.py)