run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: DocumentSummaryIndexEmbeddingRetriever and DocumentSummaryIndexLLMRetriever issue with response.metadata and response.source_nodes for queries on multiple document queries #12960

Closed AI-ANK closed 6 months ago

AI-ANK commented 7 months ago


Question

I am following this code as is: https://docs.llamaindex.ai/en/stable/examples/index_structs/doc_summary/DocSummary/

When I ask a question whose answer spans multiple documents, such as "Tell me about the history of Chicago and Mumbai", the response itself seems fine. But when I inspect the nodes used to produce the response via response.source_nodes or response.metadata, all the chunks/documents belong to only one city. Ideally, we should get back chunks from both documents. Am I missing something?

What I am trying to do is build a document repository search tool. Basically, I have a large collection of documents, and I want to create a summary of each one and save it in a vector index. A user will then query for some required information, and the tool's job is to search through all the documents and return a list of the top 'n' matches along with a justification for each match. I thought DocumentSummaryIndexLLMRetriever and DocumentSummaryIndexEmbeddingRetriever would be the best approach, but this source_nodes/metadata issue is creating a problem.

dosubot[bot] commented 7 months ago

The issue you're encountering with DocumentSummaryIndexLLMRetriever and DocumentSummaryIndexEmbeddingRetriever focusing on only one city in the response.metadata and response.source_nodes, despite a query involving multiple topics, is likely due to the default settings of the choice_top_k and similarity_top_k parameters. These parameters are set to 1 by default, which means the retrievers will only select the top 1 most relevant summary node from the index based on the LLM's prediction or the embedding's similarity score, respectively.

To address this, you can adjust the choice_top_k for DocumentSummaryIndexLLMRetriever and similarity_top_k for DocumentSummaryIndexEmbeddingRetriever to a higher value. This adjustment will allow the retrieval of document chunks or summaries related to both topics in your query, such as both cities in the example you provided.
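For illustration, here is a minimal sketch of raising those values when constructing the retrievers. The variable `doc_summary_index` and the value 3 are assumptions, not something from your setup:

```python
from llama_index.core.indices.document_summary import (
    DocumentSummaryIndexEmbeddingRetriever,
    DocumentSummaryIndexLLMRetriever,
)

# doc_summary_index is assumed to be an already-built DocumentSummaryIndex
# covering both the Chicago and Mumbai documents.

# LLM-based retriever: let the LLM pick up to 3 relevant document summaries.
llm_retriever = DocumentSummaryIndexLLMRetriever(
    doc_summary_index,
    choice_top_k=3,
)

# Embedding-based retriever: return the 3 most similar document summaries.
embedding_retriever = DocumentSummaryIndexEmbeddingRetriever(
    doc_summary_index,
    similarity_top_k=3,
)
```

With top-k values greater than 1, response.source_nodes should be able to include nodes from more than one document.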

For more detailed control, especially if your queries vary widely in scope or complexity, consider using the VectorIndexAutoRetriever class. This class allows for dynamic setting of vector store query parameters, including similarity_top_k, based on the output of an LLM, potentially offering a more nuanced approach to retrieving relevant document summaries or chunks for complex queries.
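As a rough sketch of that idea (the "city" metadata field and the descriptions below are made up for illustration; a real setup would describe your own index):

```python
from llama_index.core.retrievers import VectorIndexAutoRetriever
from llama_index.core.vector_stores.types import MetadataInfo, VectorStoreInfo

# Describe the index so the LLM can infer query parameters such as
# similarity_top_k and metadata filters at query time.
vector_store_info = VectorStoreInfo(
    content_info="Summaries of documents about different cities",
    metadata_info=[
        MetadataInfo(
            name="city",  # hypothetical metadata field
            type="str",
            description="Name of the city the document covers",
        ),
    ],
)

auto_retriever = VectorIndexAutoRetriever(
    vector_index,  # an existing VectorStoreIndex (assumption)
    vector_store_info=vector_store_info,
)
```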

You can find more details and adjust these parameters in the source code of the retrievers, which should help in getting responses that cover all queried topics more effectively.

AI-ANK commented 6 months ago

Thanks @dosubot. Can you tell me the difference between DocumentSummaryIndexLLMRetriever and DocumentSummaryIndexEmbeddingRetriever?

dosubot[bot] commented 6 months ago

The primary difference between DocumentSummaryIndexLLMRetriever and DocumentSummaryIndexEmbeddingRetriever lies in their retrieval mechanisms and the technologies they leverage:

- DocumentSummaryIndexLLMRetriever prompts an LLM to judge which document summaries are relevant to the query and selects up to choice_top_k of them. This tends to give more nuanced relevance judgments, but it costs an extra LLM call per retrieval and is slower.
- DocumentSummaryIndexEmbeddingRetriever embeds the query and returns the similarity_top_k document summaries whose embeddings are closest to it. This is faster and cheaper, but relies entirely on embedding similarity.

The choice between these two depends on your specific needs: prefer the LLM retriever when retrieval quality matters more than latency and cost, and the embedding retriever when you need fast, inexpensive retrieval at scale.

Each retriever's effectiveness in a comprehensive document search tool will depend on factors like query complexity, resource availability, and scalability requirements. Adjusting parameters such as choice_top_k for the LLM retriever and similarity_top_k for the embedding retriever can help in retrieving more relevant summaries across multiple topics, as in your query about the two cities.
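For a concrete comparison, here is a hedged sketch of wrapping either retriever in a query engine and inspecting which documents the answer drew from; `doc_summary_index` is an assumed, already-built DocumentSummaryIndex and the query string is only an example:

```python
from llama_index.core.indices.document_summary import DocumentSummaryIndexLLMRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

# Swap in DocumentSummaryIndexEmbeddingRetriever(doc_summary_index, similarity_top_k=3)
# to compare the embedding-based variant on the same question.
retriever = DocumentSummaryIndexLLMRetriever(doc_summary_index, choice_top_k=3)
query_engine = RetrieverQueryEngine.from_args(retriever)

response = query_engine.query("Tell me about the history of Chicago and Mumbai")
for node_with_score in response.source_nodes:
    # Each source node carries the metadata of the document it came from.
    print(node_with_score.node.metadata, node_with_score.score)
```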

#### Details

**Sources**

- [llama-index-core/llama_index/core/indices/document_summary/retrievers.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/document_summary/retrievers.py)
- [llama-index-legacy/llama_index/legacy/indices/document_summary/retrievers.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/indices/document_summary/retrievers.py)

AI-ANK commented 6 months ago

@dosu @dosubot thanks. For my use case, I am trying to build a file search RAG system. Basically, I have a collection of documents over which I apply a RAG system. Users will ask questions like "I need a document that talks about Llama 2", and the RAG system should fetch the top k documents (not chunks) that are relevant to the query, along with each document's summary, followed by an LLM-generated justification of why that document was shortlisted for the given user query. I believe DocumentSummaryIndexLLMRetriever would be the right fit here. @dosu @dosubot can you help with how I can implement this?

dosubot[bot] commented 6 months ago

To build a file search system using a Retrieval-Augmented Generation (RAG) approach with DocumentSummaryIndexLLMRetriever, consider leveraging the FusedSummaryRetriever for enhanced retrieval. This approach combines document summaries with relevant fragments, improving context understanding and retrieval performance. Here's a streamlined guide:

  1. Initialize Document Summary Index: Populate it with summaries of your documents to serve as the retrieval foundation.

  2. Use FusedSummaryRetriever: This retriever enhances context by including both document summaries and relevant fragments. Initialize it with your vector store and document stores.

```python
from llama_index.packs.docugami_kg_rag.helpers.fused_summary_retriever import FusedSummaryRetriever

# my_vector_store, my_full_doc_summary_store and my_parent_doc_store are
# placeholders for your own vector store and document stores.
fused_summary_retriever = FusedSummaryRetriever(
    vectorstore=my_vector_store,
    full_doc_summary_store=my_full_doc_summary_store,
    parent_doc_store=my_parent_doc_store,
)
```
  3. Integrate SelfRAGQueryEngine: This engine processes queries, retrieves relevant documents, and generates justifications. It uses the FusedSummaryRetriever for dynamic retrieval based on query context.
```python
from llama_index.packs.self_rag.base import SelfRAGQueryEngine

# model_path points to a local Self-RAG model checkpoint (placeholder path).
self_rag_query_engine = SelfRAGQueryEngine(
    model_path="path_to_your_model",
    retriever=fused_summary_retriever,
)
```
  4. Perform Searches: Process user queries to retrieve relevant documents and generate justifications with the custom_query method of the SelfRAGQueryEngine (a short usage sketch follows this list).

  5. Optimize for Relevance and Efficiency: Experiment with retrieval settings and consider fine-tuning the embedding model to improve retrieval quality. Adjust parameters like temperature, top_p, and max_tokens in the SelfRAGQueryEngine for optimal performance.
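A brief usage sketch under the same assumptions as the snippets above; the query string is only an example:

```python
# The engine retrieves candidate documents via the FusedSummaryRetriever and
# generates an answer that can include a justification for each selection.
response = self_rag_query_engine.query("I need a document that talks about Llama 2")
print(response)
```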

By following these steps and utilizing the FusedSummaryRetriever within the SelfRAGQueryEngine, you can create a sophisticated file search system that efficiently retrieves relevant documents and provides justifications for their selection. For detailed implementation, refer to the LlamaIndex documentation.

#### Details

**Sources**

- [[Question]: Pipeline for document Summary + recursive retrieval](https://github.com/run-llama/llama_index/issues/12197)
- [[Question]: Approaches for better RAG pipeline](https://github.com/run-llama/llama_index/issues/12073)
- [[Question]: How to do Multi-Document RAG with Document Summary Index?](https://github.com/run-llama/llama_index/issues/10389)
- [feat(context length): QnA Summarization as a relevant information extractor](https://github.com/run-llama/llama_index/issues/6889)
- [llama-index-legacy/llama_index/legacy/indices/document_summary/retrievers.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/indices/document_summary/retrievers.py)
- [docs/docs/optimizing/production_rag.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/optimizing/production_rag.md)
- [llama-index-core/llama_index/core/indices/document_summary/retrievers.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/document_summary/retrievers.py)
- [llama-index-packs/llama-index-packs-self-rag/llama_index/packs/self_rag/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-self-rag/llama_index/packs/self_rag/base.py)
- [llama-index-packs/llama-index-packs-docugami-kg-rag/llama_index/packs/docugami_kg_rag/helpers/fused_summary_retriever.py](https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-docugami-kg-rag/llama_index/packs/docugami_kg_rag/helpers/fused_summary_retriever.py)
- [docs/docs/examples/retrievers/auto_vs_recursive_retriever.ipynb](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/retrievers/auto_vs_recursive_retriever.ipynb)
- [docs/docs/module_guides/querying/retriever/retrievers.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/module_guides/querying/retriever/retrievers.md)
- [docs/docs/optimizing/advanced_retrieval/advanced_retrieval.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/optimizing/advanced_retrieval/advanced_retrieval.md)
- [docs/docs/examples/finetuning/embeddings/finetune_embedding.ipynb](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/finetuning/embeddings/finetune_embedding.ipynb)
- [docs/docs/use_cases/fine_tuning.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/use_cases/fine_tuning.md)
- [docs/docs/optimizing/fine-tuning/fine-tuning.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/optimizing/fine-tuning/fine-tuning.md)
- [docs/docs/examples/finetuning/embeddings/finetune_embedding_adapter.ipynb](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/finetuning/embeddings/finetune_embedding_adapter.ipynb)
- [llama-index-finetuning/llama_index/finetuning/embeddings/sentence_transformer.py](https://github.com/run-llama/llama_index/blob/main/llama-index-finetuning/llama_index/finetuning/embeddings/sentence_transformer.py)