To improve the overall performance of your RAG application, especially when the retriever fails to fetch the document with the exact company name, you can consider the following techniques:
Decoupling Chunks Used for Retrieval vs. Chunks Used for Synthesis:
from llama_index.core import DocumentSummaryIndex
# Build an index that embeds a short summary per document for retrieval,
# while keeping the full document text available for synthesis
doc_summary_index = DocumentSummaryIndex.from_documents(documents)
# Retrieve relevant documents by matching the query against the summaries
relevant_docs = doc_summary_index.as_retriever().retrieve(query)
Structured Retrieval for Larger Document Sets:
from llama_index.core.vector_stores import MetadataFilters, MetadataFilter
# Tag documents with metadata (e.g. company_name) at ingestion time, then
# constrain retrieval to exact metadata matches
filters = MetadataFilters(filters=[MetadataFilter(key="company_name", value="XXX")])
# `index` is the VectorStoreIndex built over the tagged documents
filtered_docs = index.as_retriever(filters=filters).retrieve(query)
Dynamically Retrieve Chunks Depending on your Task:
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
# Route each query to a task-specific engine; the two tools are built in the
# sketch that follows
router_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[summary_tool, vector_tool],
)
# Perform task-specific retrieval (the selector picks the right engine)
result = router_engine.query(query)
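For context, one way the two query engine tools referenced above could be built, as a minimal sketch (the tool names, descriptions, and the SummaryIndex/VectorStoreIndex pairing are assumptions, not part of the original answer):
from llama_index.core import SummaryIndex, VectorStoreIndex
from llama_index.core.tools import QueryEngineTool
# A summarization engine and a fact-lookup engine the router can choose between
summary_tool = QueryEngineTool.from_defaults(
    query_engine=SummaryIndex.from_documents(documents).as_query_engine(),
    description="Useful for summarization questions over the company documents",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=VectorStoreIndex.from_documents(documents).as_query_engine(),
    description="Useful for looking up specific facts, e.g. a particular company",
)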
Optimize Context Embeddings:
# The original snippet referenced an `EmbeddingFineTuner` class; in llama_index
# the corresponding functionality lives in the finetuning module as
# SentenceTransformersFinetuneEngine
from llama_index.finetuning import SentenceTransformersFinetuneEngine
# Fine-tune a local embedding model on query/chunk pairs drawn from your corpus
# (train_dataset is built in the sketch below)
finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="BAAI/bge-small-en-v1.5",
    model_output_path="finetuned_embedding_model",
)
finetune_engine.finetune()
fine_tuned_model = finetune_engine.get_finetuned_model()
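A minimal sketch of how train_dataset might be produced, assuming you synthesize question/chunk pairs from your own corpus with llama_index's finetuning helpers (the splitter and LLM choices here are assumptions):
from llama_index.core.node_parser import SentenceSplitter
from llama_index.finetuning import generate_qa_embedding_pairs
from llama_index.llms.openai import OpenAI
# Split the corpus into nodes and synthesize (question, chunk) training pairs
nodes = SentenceSplitter().get_nodes_from_documents(documents)
train_dataset = generate_qa_embedding_pairs(nodes=nodes, llm=OpenAI(model="gpt-3.5-turbo"))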
Implement Corrective RAG Techniques:
from typing import Any, List
from llama_index.core import Document, SummaryIndex, VectorStoreIndex
from llama_index.core.llama_pack.base import BaseLlamaPack
from llama_index.core.prompts import PromptTemplate
from llama_index.core.query_pipeline.query import QueryPipeline
from llama_index.core.schema import NodeWithScore
from llama_index.llms.openai import OpenAI
from llama_index.tools.tavily_research.base import TavilyToolSpec
DEFAULT_RELEVANCY_PROMPT_TEMPLATE = PromptTemplate(
template="""As a grader, your task is to evaluate the relevance of a document retrieved in response to a user's question.
Retrieved Document:
-------------------
{context_str}
User Question:
--------------
{query_str}
Evaluation Criteria:
- Consider whether the document contains keywords or topics related to the user's question.
- The evaluation should not be overly stringent; the primary objective is to identify and filter out clearly irrelevant retrievals.
Decision:
- Assign a binary score to indicate the document's relevance.
- Use 'yes' if the document is relevant to the question, or 'no' if it is not.
Please provide your binary score ('yes' or 'no') below to indicate the document's relevance to the user question."""
)
DEFAULT_TRANSFORM_QUERY_TEMPLATE = PromptTemplate(
template="""Your task is to refine a query to ensure it is highly effective for retrieving relevant search results. \n
Analyze the given input to grasp the core semantic intent or meaning. \n
Original Query:
\n ------- \n
{query_str}
\n ------- \n
Your goal is to rephrase or enhance this query to improve its search performance. Ensure the revised query is concise and directly aligned with the intended search objective. \n
Respond with the optimized query only:"""
)
class CorrectiveRAGPack(BaseLlamaPack):
    def __init__(self, documents: List[Document], tavily_ai_apikey: str) -> None:
        llm = OpenAI(model="gpt-4")
        self.relevancy_pipeline = QueryPipeline(
            chain=[DEFAULT_RELEVANCY_PROMPT_TEMPLATE, llm]
        )
        self.transform_query_pipeline = QueryPipeline(
            chain=[DEFAULT_TRANSFORM_QUERY_TEMPLATE, llm]
        )
        self.llm = llm
        self.index = VectorStoreIndex.from_documents(documents)
        self.tavily_tool = TavilyToolSpec(api_key=tavily_ai_apikey)

    def retrieve_nodes(self, query_str: str, **kwargs: Any) -> List[NodeWithScore]:
        """Retrieve the relevant nodes for the query."""
        retriever = self.index.as_retriever(**kwargs)
        return retriever.retrieve(query_str)

    def evaluate_relevancy(self, retrieved_nodes: List[NodeWithScore], query_str: str) -> List[str]:
        """Grade each retrieved node as 'yes' or 'no' for relevance to the query."""
        relevancy_results = []
        for node in retrieved_nodes:
            relevancy = self.relevancy_pipeline.run(
                context_str=node.text, query_str=query_str
            )
            relevancy_results.append(relevancy.message.content.lower().strip())
        return relevancy_results

    def extract_relevant_texts(self, retrieved_nodes: List[NodeWithScore], relevancy_results: List[str]) -> str:
        """Keep only the texts of nodes judged relevant."""
        relevant_texts = [
            retrieved_nodes[i].text
            for i, result in enumerate(relevancy_results)
            if result == "yes"
        ]
        return "\n".join(relevant_texts)

    def search_with_transformed_query(self, query_str: str) -> str:
        """Search the web (Tavily) with the rewritten query."""
        search_results = self.tavily_tool.search(query_str, max_results=5)
        return "\n".join([result.text for result in search_results])

    def get_result(self, relevant_text: str, search_text: str, query_str: str) -> Any:
        """Synthesize the final answer from the relevant and searched texts."""
        documents = [Document(text=relevant_text + "\n" + search_text)]
        index = SummaryIndex.from_documents(documents)
        query_engine = index.as_query_engine()
        return query_engine.query(query_str)

    def run(self, query_str: str, **kwargs: Any) -> Any:
        """Run the corrective RAG pipeline end to end."""
        retrieved_nodes = self.retrieve_nodes(query_str, **kwargs)
        relevancy_results = self.evaluate_relevancy(retrieved_nodes, query_str)
        relevant_text = self.extract_relevant_texts(retrieved_nodes, relevancy_results)
        search_text = ""
        if "no" in relevancy_results:
            # At least one retrieval was judged irrelevant: rewrite the query
            # and fall back to web search for extra context
            transformed_query_str = self.transform_query_pipeline.run(
                query_str=query_str
            ).message.content
            search_text = self.search_with_transformed_query(transformed_query_str)
        if search_text:
            return self.get_result(relevant_text, search_text, query_str)
        else:
            return self.get_result(relevant_text, "", query_str)
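A minimal usage sketch for the pack above, assuming documents and a Tavily API key are available (the query string and the similarity_top_k value are just examples):
pack = CorrectiveRAGPack(documents, tavily_ai_apikey="tvly-...")
response = pack.run("Do you know the company XXX?", similarity_top_k=5)
print(response)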
Utilize RAGatouille Retriever Pack:
from llama_index.packs.ragatouille_retriever.base import RAGatouilleRetrieverPack
# Initialize the RAGatouille Retriever Pack
retriever_pack = RAGatouilleRetrieverPack(documents)
# Add documents to the retriever
retriever_pack.add_documents(new_documents)
# Run the retriever with a query
result = retriever_pack.run(query)
By implementing these techniques, you can enhance the retrieval accuracy and overall performance of your RAG application [1][2][3].
Could you explain the first suggestion like I'm 5?
Sure! Let's break it down into very simple terms:
Decoupling Chunks for Retrieval and Synthesis: think of each document as a big storybook. Instead of reading every page of every book to find an answer, you first look at the short note on each cover (the summary) to pick the right book, and only then open that book to read the details.
Embedding a Document Summary: the short cover note is what gets embedded and compared against your question; the full book is kept aside for actually writing the answer.
In your code, you can create a summary index to help with this:
from llama_index.core import DocumentSummaryIndex
# Create a document summary index (one summary embedded per document)
doc_summary_index = DocumentSummaryIndex.from_documents(documents)
# Retrieve relevant documents by matching the query against the summaries
relevant_docs = doc_summary_index.as_retriever().retrieve(query)
This way, you first find the right documents using the summaries, and then dive into the details only when needed. This makes finding the right information faster and easier!
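As a minimal sketch of that two-stage flow (the docstore dictionary keyed by document id is an assumption about how the full documents are kept around, not something from the original answer):
# Stage 1: find candidate documents via their summaries
summary_hits = doc_summary_index.as_retriever().retrieve("Do you know the company XXX?")
# Stage 2: pull the full source documents for synthesis
full_docs = [docstore[hit.node.ref_doc_id] for hit in summary_hits]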
According to your message, can I understand it as follows: we embed the summaries of the documents instead of the documents themselves, and while generating a response we still fetch the context based on the summaries but generate the answer based on the full documents? If that is the case, how can I rebuild the DocumentSummaryIndex from the Chroma DB?
Yes, you are correct. The idea is to embed summaries of documents for efficient retrieval and then use the full documents for generating detailed responses. To rebuild the index from Chroma DB, store the summaries and their metadata in Chroma DB, retrieve them back, and reconstruct the index using the retrieved summaries and metadata. Here is an example of how you can achieve this.
First, ensure you store the summaries and their metadata in Chroma DB:
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import chromadb
# Assuming 'summaries' is a list of summary documents
summaries = [...]  # Your summary documents here
# Initialize ChromaDB client and collection
chroma_client = chromadb.HttpClient(host="your_chromadb_host", port="your_chromadb_port", ssl=False)
chroma_collection = chroma_client.get_or_create_collection("summary_collection")
# Define your embedding model
embed_model = HuggingFaceEmbedding(model_name="your_model_name_here")
# Initialize ChromaVectorStore with the collection
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
# Set up StorageContext with the initialized vector store
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Embed and store the summaries in ChromaDB (ChromaVectorStore has no
# add_document method; building an index over the storage context both embeds
# the summaries and writes them to the collection)
summary_index = VectorStoreIndex.from_documents(
    summaries, storage_context=storage_context, embed_model=embed_model
)
Next, retrieve the stored summaries from Chroma DB and rebuild an index over them. (Note: what can be rebuilt directly from the vector store is a VectorStoreIndex over the summary embeddings; a full DocumentSummaryIndex additionally needs its docstore and index store persisted and reloaded, as sketched further below.)
from llama_index.core import VectorStoreIndex
# Rebuild an index on top of the embeddings already persisted in the
# collection; nothing is re-embedded
summary_index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)
# Now you can use the rebuilt index for retrieval over the summaries
query = "Do you know the company XXX?"
relevant_docs = summary_index.as_retriever().retrieve(query)
This approach ensures that you can efficiently store, retrieve, and use document summaries for retrieval while maintaining the ability to generate detailed responses based on the full documents. Make sure to replace placeholders like "your_chromadb_host", "your_chromadb_port", and "your_model_name_here" with your actual configurations.
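If you specifically need a true DocumentSummaryIndex (with its summary-to-chunk mapping) rather than a plain vector index over the summaries, a minimal sketch is to persist and reload its storage context instead of reconstructing it by hand; the persist_dir path here is an assumption:
from llama_index.core import DocumentSummaryIndex, StorageContext, load_index_from_storage
# Build once, writing embeddings to Chroma and the docstore/index store to disk
storage_context = StorageContext.from_defaults(vector_store=vector_store)
doc_summary_index = DocumentSummaryIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)
doc_summary_index.storage_context.persist(persist_dir="./summary_index_storage")
# Later, reload the same index instead of rebuilding it
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir="./summary_index_storage"
)
doc_summary_index = load_index_from_storage(storage_context, embed_model=embed_model)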
Great! I see the whole process now. However, I don't see how decoupling retrieval and synthesis helps me retrieve the right documents from the database. If the retriever has already failed to retrieve the right document given the full documents, will it really be easier to retrieve the right documents based only on the summaries? Could you elaborate on the logic behind it?
Decoupling retrieval and synthesis by using document summaries can indeed help improve the accuracy of retrieving the right documents. Here's the logic behind it:
Focused Retrieval: a summary condenses each document down to its most identifying details (company name, location, key facts), so the embedding of a summary is dominated by exactly the terms a query like "Do you know the company XXX?" contains.
Efficiency: summaries are short, so there is far less text per embedding to dilute the match, and far less content to compare against.
Improved Relevance: the query is compared against text that explicitly names the entities you care about, rather than long descriptions in which the company name is only a tiny fraction of the content.
Imagine you have a database of company profiles, each with detailed information. If you query "Do you know the company XXX?", the retriever might struggle to find the exact match due to the volume of information in each profile. However, if each profile has a summary that highlights the company name, location, and key details, the retriever can more easily match the query to the correct summary.
Create Summaries: generate a short summary for each company profile that foregrounds the company name, country, and city.
Store Summaries in Chroma DB: embed the summaries and persist them in the collection.
Retrieve Using Summaries: match the user's query against the summary embeddings to pick the right companies.
Detailed Synthesis: pass the corresponding full profiles to the LLM to generate the answer (see the sketch after this list).
By following this approach, you can improve the accuracy and efficiency of your retrieval process, ensuring that the right documents are retrieved and used for generating responses.
For more details on how to implement this, you can refer to the previous examples provided on storing and retrieving summaries using Chroma DB and rebuilding the DocumentSummaryIndex [1][2][3][4].
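For instance, a minimal sketch of the "Create Summaries" step for this particular table, where each summary simply foregrounds the identifying columns (the rows variable and the 200-character truncation are assumptions, not part of the original answer):
from llama_index.core import Document
# One summary Document per row, leading with the fields a user is likely to ask about
summaries = [
    Document(
        text=f"Company: {row['company_name']}. Country: {row['country']}. City: {row['city']}. "
             f"{row['description'][:200]}",
        metadata={"company_name": row["company_name"]},
    )
    for row in rows  # `rows` is the table loaded from the source database
]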
Okay. It seems that the first suggestion does not work for me. Could you elaborate on your second suggestion in detail?
Sure! Let's dive into the second suggestion: Structured Retrieval for Larger Document Sets. This involves using metadata filters and auto retrieval, as well as storing document hierarchies (summaries -> raw chunks) and performing recursive retrieval. Here's a detailed breakdown:
Metadata Filters: attach structured tags (e.g. company_name, country, city) to every document so retrieval can be constrained to exact matches instead of relying purely on semantic similarity.
Auto Retrieval: let the LLM infer both the query string and the metadata filters from the user's question, so "Do you know the company XXX?" automatically becomes a filtered lookup.
The setup involves three steps:
Tag Documents with Metadata: add an index_id (and any other useful tags) to each document at ingestion time.
Initialize Vector Stores for Metadata and Document Chunks: keep one index of per-document metadata nodes and one index of the raw chunks.
Setup Auto Retriever: route from the metadata level down to per-document retrievers with a recursive retriever:
import weaviate
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import RecursiveRetriever, VectorIndexAutoRetriever
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters
from llama_index.core.vector_stores.types import MetadataInfo, VectorStoreInfo
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.weaviate import WeaviateVectorStore
# Initialize Weaviate client
client = weaviate.Client("http://localhost:8080")
# Define documents and metadata
documents = [...]  # List of Document objects, each tagged with metadata["index_id"]
metadata_nodes = [...]  # List of IndexNode objects, one per document, whose index_id matches that tag
# Initialize vector store for metadata
metadata_store = WeaviateVectorStore(weaviate_client=client, index_name="metadata_index")
metadata_sc = StorageContext.from_defaults(vector_store=metadata_store)
metadata_index = VectorStoreIndex(metadata_nodes, storage_context=metadata_sc)
# Initialize vector store for document chunks
chunks_store = WeaviateVectorStore(weaviate_client=client, index_name="chunks_index")
chunks_sc = StorageContext.from_defaults(vector_store=chunks_store)
doc_index = VectorStoreIndex.from_documents(documents, storage_context=chunks_sc)
# Setup auto retriever over the metadata index
vector_store_info = VectorStoreInfo(
    content_info="summaries and metadata of company documents",
    metadata_info=[
        MetadataInfo(name="index_id", type="str", description="ID of the source document"),
    ],
)
auto_retriever = VectorIndexAutoRetriever(
    index=metadata_index,
    vector_store_info=vector_store_info,
    similarity_top_k=10,
)
# Define per-document retrievers restricted by a metadata filter
retriever_dict = {}
for doc in documents:
    index_id = doc.metadata["index_id"]
    filters = MetadataFilters(filters=[MetadataFilter(key="index_id", value=index_id)])
    retriever_dict[index_id] = doc_index.as_retriever(filters=filters)
# Setup recursive retriever: metadata hits route down to the matching document retriever
recursive_retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": auto_retriever, **retriever_dict},
    verbose=True,
)
# Query engine
query_engine = RetrieverQueryEngine.from_args(recursive_retriever, llm=OpenAI(model="gpt-3.5-turbo"))
# Run query
query = "Do you know the company XXX?"
result = query_engine.query(query)
Document Hierarchies: store a summary node for each document that links down to its raw chunks, so retrieval can move from coarse (summaries) to fine (chunks).
Recursive Retrieval: retrieve at the summary level first, then follow the links into the underlying chunks for synthesis. A sketch of the supporting pieces:
from typing import List
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle

# Define local index state mapping document ids to summaries and chunks
class LocalIndexState:
    def __init__(self):
        self.full_doc_summaries_by_id = {}
        self.chunks_by_id = {}
        self.retrieval_tool_function_name = "search_documents"
        self.retrieval_tool_description = "Searches for and returns document chunks."

# Create document summaries and chunks
summaries = [...]  # List of summary documents
chunks = [...]  # List of document chunks
# Store summaries and chunks in the state
state = LocalIndexState()
for summary in summaries:
    state.full_doc_summaries_by_id[summary.id_] = summary
for chunk in chunks:
    state.chunks_by_id[chunk.id_] = chunk

# Setup fused summary retriever: match the query against summaries, then
# expand each hit into its parent document's chunks
class FusedSummaryRetriever(BaseRetriever):
    def __init__(self, vectorstore, full_doc_summary_store, parent_doc_store):
        self.vectorstore = vectorstore
        self.full_doc_summary_store = full_doc_summary_store
        self.parent_doc_store = parent_doc_store
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Implement retrieval logic: summary-level search followed by chunk expansion
        raise NotImplementedError

# Initialize retriever
retriever = FusedSummaryRetriever(
    vectorstore=chunks_store,
    full_doc_summary_store=state.full_doc_summaries_by_id,
    parent_doc_store=state.chunks_by_id,
)
# Setup query engine
query_engine = RetrieverQueryEngine(retriever=retriever)
# Run query
query = "Do you know the company XXX?"
result = query_engine.query(query)
By following these steps, you can implement structured retrieval using metadata filters and auto retrieval, store document hierarchies, and perform recursive retrieval to enhance the performance of your RAG application [1][2][3].
Hi, I just realized that the issue might be the chunking. If the loaded documents are already the perfect size to be fed into the embedding model, how can I skip the transformation procedure? I don't need to split the text or merge it with other documents; I just need to feed every single document into the embedding model and store them in the Chroma database.
To skip the chunking process and directly embed and store each document as-is into the Chroma database, you can modify the add method in the ChromaVectorStore class to process the list of nodes directly without chunking. Here is the modified add method:
def add(self, nodes: List[BaseNode], **add_kwargs: Any) -> List[str]:
    """Add nodes to index.

    Args:
        nodes: List[BaseNode]: list of nodes with embeddings

    """
    if not self._collection:
        raise ValueError("Collection not initialized")
    embeddings = []
    metadatas = []
    ids = []
    documents = []
    for node in nodes:
        embeddings.append(node.get_embedding())
        metadata_dict = node_to_metadata_dict(
            node, remove_text=True, flat_metadata=self.flat_metadata
        )
        for key in metadata_dict:
            if metadata_dict[key] is None:
                metadata_dict[key] = ""
        metadatas.append(metadata_dict)
        ids.append(node.node_id)
        documents.append(node.get_content(metadata_mode=MetadataMode.NONE))
    self._collection.add(
        embeddings=embeddings,
        ids=ids,
        metadatas=metadatas,
        documents=documents,
    )
    return ids
This modification removes the chunking logic and processes the list of nodes directly, embedding and storing each document as-is into the Chroma database.
Additionally, you can refer to the test_add_to_chromadb_and_query function in the llama-index-legacy/tests/vector_stores/test_chromadb.py file for an example of how to add nodes to the Chroma database and query them. This test function demonstrates both synchronous and asynchronous methods for adding nodes and querying the database, which might be useful for your implementation [1][2].
Can't I just set the transformations to an empty list?
Yes, you can set the transformations to an empty list to ensure that the documents are directly embedded and stored in Chroma DB without any chunking. Separately, the add method in the ChromaVectorStore class performs its own chunking internally using the chunk_list function, which is based on MAX_CHUNK_SIZE; note that this only batches the list of nodes for insertion into Chroma, while the text splitting itself is controlled by the transformations you pass when building the index. If you want to avoid that batching, you can modify MAX_CHUNK_SIZE to a very large number or ensure that the list of nodes you pass is smaller than MAX_CHUNK_SIZE.
Here is the relevant part of the code that handles chunking:
MAX_CHUNK_SIZE = 41665  # One less than the max chunk size for ChromaDB

def chunk_list(
    lst: List[BaseNode], max_chunk_size: int
) -> Generator[List[BaseNode], None, None]:
    """Yield successive max_chunk_size-sized chunks from lst.

    Args:
        lst (List[BaseNode]): list of nodes with embeddings
        max_chunk_size (int): max chunk size

    Yields:
        Generator[List[BaseNode], None, None]: list of nodes with embeddings

    """
    for i in range(0, len(lst), max_chunk_size):
        yield lst[i : i + max_chunk_size]
To avoid chunking, you can set MAX_CHUNK_SIZE to a very large number:
MAX_CHUNK_SIZE = 10**9  # effectively disables batching (range() requires an int, so avoid float('inf'))
Or ensure that the list of nodes you pass to the add method is smaller than MAX_CHUNK_SIZE [1].
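For the text-splitting side of the question, a minimal sketch of passing an empty transformations list when building the index, so each loaded Document becomes exactly one node and one embedding (this reuses the documents, vector_store, and embed_model names from the earlier Chroma examples as assumptions):
from llama_index.core import StorageContext, VectorStoreIndex
# transformations=[] skips the default sentence splitter, so nothing is split or merged
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embed_model,
    transformations=[],  # one Document -> one node -> one embedding
)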
So, I eventually fixed this issue. It occurred due to the automatic chunking applied by default: my documents should be divided based on rows rather than on the number of tokens. The default chunk size is 1024 tokens, so the transformation merged several documents into one, which introduced a massive amount of noise into the index. Disabling the chunking (effectively setting the chunk size to infinity) solves the issue if your documents come from a SQL database or another relational database.
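As an illustration of that row-based loading, a minimal sketch assuming the table is fetched into a list of dict rows (the rows variable and the SQL shown in the comment are assumptions, not part of the original post):
from llama_index.core import Document
# One Document per relational row, so no splitter can merge unrelated companies
rows = [...]  # e.g. the result of: SELECT company_name, country, city, description FROM companies
documents = [
    Document(
        text=row["description"],
        metadata={
            "company_name": row["company_name"],
            "country": row["country"],
            "city": row["city"],
        },
    )
    for row in rows
]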
Question
Hi, all. I loaded and stored the table into ChromaDB. The table contains four columns ["company_name", "country", "city", "description"]. After that, I queried the LLM with the query "Do you know the company XXX?". However, the retriever failed to fetch the document with the exact same company name. I did use a reranking service to improve the quality, but it did not help, given that the retriever had already failed to include the relevant documents in the returned set. I want to know how I can improve the overall performance of the RAG application.
Here is the code I used to do the whole thing. Some of it is abstract. Please let me know if you need more detail to solve this issue.