To improve the overall performance of your RAG application, especially when the retriever fails to fetch the document with the exact company name, you can consider the following techniques:
Decoupling Chunks Used for Retrieval vs. Chunks Used for Synthesis:
from llama_index.core import DocumentSummaryIndex
# Build an index that embeds a short summary per document for retrieval,
# while keeping the full document text available for synthesis
doc_summary_index = DocumentSummaryIndex.from_documents(documents)
# Retrieve relevant documents by matching the query against the summaries
relevant_docs = doc_summary_index.as_retriever().retrieve(query)
Structured Retrieval for Larger Document Sets:
from llama_index.core.vector_stores import MetadataFilters, MetadataFilter
# Tag documents with metadata (e.g. company_name) at ingestion time, then
# constrain retrieval to exact metadata matches
filters = MetadataFilters(filters=[MetadataFilter(key="company_name", value="XXX")])
# `index` is the VectorStoreIndex built over the tagged documents
filtered_docs = index.as_retriever(filters=filters).retrieve(query)
Dynamically Retrieve Chunks Depending on your Task:
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
# Route each query to a task-specific engine; the two tools are built in the
# sketch that follows
router_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[summary_tool, vector_tool],
)
# Perform task-specific retrieval (the selector picks the right engine)
result = router_engine.query(query)
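For context, one way the two query engine tools referenced above could be built, as a minimal sketch (the tool names, descriptions, and the SummaryIndex/VectorStoreIndex pairing are assumptions, not part of the original answer):
from llama_index.core import SummaryIndex, VectorStoreIndex
from llama_index.core.tools import QueryEngineTool
# A summarization engine and a fact-lookup engine the router can choose between
summary_tool = QueryEngineTool.from_defaults(
    query_engine=SummaryIndex.from_documents(documents).as_query_engine(),
    description="Useful for summarization questions over the company documents",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=VectorStoreIndex.from_documents(documents).as_query_engine(),
    description="Useful for looking up specific facts, e.g. a particular company",
)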
Optimize Context Embeddings:
# The original snippet referenced an `EmbeddingFineTuner` class; in llama_index
# the corresponding functionality lives in the finetuning module as
# SentenceTransformersFinetuneEngine
from llama_index.finetuning import SentenceTransformersFinetuneEngine
# Fine-tune a local embedding model on query/chunk pairs drawn from your corpus
# (train_dataset is built in the sketch below)
finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="BAAI/bge-small-en-v1.5",
    model_output_path="finetuned_embedding_model",
)
finetune_engine.finetune()
fine_tuned_model = finetune_engine.get_finetuned_model()
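A minimal sketch of how train_dataset might be produced, assuming you synthesize question/chunk pairs from your own corpus with llama_index's finetuning helpers (the splitter and LLM choices here are assumptions):
from llama_index.core.node_parser import SentenceSplitter
from llama_index.finetuning import generate_qa_embedding_pairs
from llama_index.llms.openai import OpenAI
# Split the corpus into nodes and synthesize (question, chunk) training pairs
nodes = SentenceSplitter().get_nodes_from_documents(documents)
train_dataset = generate_qa_embedding_pairs(nodes=nodes, llm=OpenAI(model="gpt-3.5-turbo"))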
Implement Corrective RAG Techniques:
from typing import Any, List
from llama_index.core import Document, SummaryIndex, VectorStoreIndex
from llama_index.core.llama_pack.base import BaseLlamaPack
from llama_index.core.prompts import PromptTemplate
from llama_index.core.query_pipeline.query import QueryPipeline
from llama_index.core.schema import NodeWithScore
from llama_index.llms.openai import OpenAI
from llama_index.tools.tavily_research.base import TavilyToolSpec
DEFAULT_RELEVANCY_PROMPT_TEMPLATE = PromptTemplate(
template="""As a grader, your task is to evaluate the relevance of a document retrieved in response to a user's question.
Retrieved Document:
-------------------
{context_str}
User Question:
--------------
{query_str}
Evaluation Criteria:
- Consider whether the document contains keywords or topics related to the user's question.
- The evaluation should not be overly stringent; the primary objective is to identify and filter out clearly irrelevant retrievals.
Decision:
- Assign a binary score to indicate the document's relevance.
- Use 'yes' if the document is relevant to the question, or 'no' if it is not.
Please provide your binary score ('yes' or 'no') below to indicate the document's relevance to the user question."""
)
DEFAULT_TRANSFORM_QUERY_TEMPLATE = PromptTemplate(
template="""Your task is to refine a query to ensure it is highly effective for retrieving relevant search results. \n
Analyze the given input to grasp the core semantic intent or meaning. \n
Original Query:
\n ------- \n
{query_str}
\n ------- \n
Your goal is to rephrase or enhance this query to improve its search performance. Ensure the revised query is concise and directly aligned with the intended search objective. \n
Respond with the optimized query only:"""
)
class CorrectiveRAGPack(BaseLlamaPack):
    def __init__(self, documents: List[Document], tavily_ai_apikey: str) -> None:
        llm = OpenAI(model="gpt-4")
        self.relevancy_pipeline = QueryPipeline(
            chain=[DEFAULT_RELEVANCY_PROMPT_TEMPLATE, llm]
        )
        self.transform_query_pipeline = QueryPipeline(
            chain=[DEFAULT_TRANSFORM_QUERY_TEMPLATE, llm]
        )
        self.llm = llm
        self.index = VectorStoreIndex.from_documents(documents)
        self.tavily_tool = TavilyToolSpec(api_key=tavily_ai_apikey)

    def retrieve_nodes(self, query_str: str, **kwargs: Any) -> List[NodeWithScore]:
        """Retrieve the relevant nodes for the query."""
        retriever = self.index.as_retriever(**kwargs)
        return retriever.retrieve(query_str)

    def evaluate_relevancy(self, retrieved_nodes: List[NodeWithScore], query_str: str) -> List[str]:
        """Grade each retrieved node as 'yes' or 'no' for relevance to the query."""
        relevancy_results = []
        for node in retrieved_nodes:
            relevancy = self.relevancy_pipeline.run(
                context_str=node.text, query_str=query_str
            )
            relevancy_results.append(relevancy.message.content.lower().strip())
        return relevancy_results

    def extract_relevant_texts(self, retrieved_nodes: List[NodeWithScore], relevancy_results: List[str]) -> str:
        """Keep only the texts of nodes judged relevant."""
        relevant_texts = [
            retrieved_nodes[i].text
            for i, result in enumerate(relevancy_results)
            if result == "yes"
        ]
        return "\n".join(relevant_texts)

    def search_with_transformed_query(self, query_str: str) -> str:
        """Search the web (Tavily) with the rewritten query."""
        search_results = self.tavily_tool.search(query_str, max_results=5)
        return "\n".join([result.text for result in search_results])

    def get_result(self, relevant_text: str, search_text: str, query_str: str) -> Any:
        """Synthesize the final answer from the relevant and searched texts."""
        documents = [Document(text=relevant_text + "\n" + search_text)]
        index = SummaryIndex.from_documents(documents)
        query_engine = index.as_query_engine()
        return query_engine.query(query_str)

    def run(self, query_str: str, **kwargs: Any) -> Any:
        """Run the corrective RAG pipeline end to end."""
        retrieved_nodes = self.retrieve_nodes(query_str, **kwargs)
        relevancy_results = self.evaluate_relevancy(retrieved_nodes, query_str)
        relevant_text = self.extract_relevant_texts(retrieved_nodes, relevancy_results)
        search_text = ""
        if "no" in relevancy_results:
            # At least one retrieval was judged irrelevant: rewrite the query
            # and fall back to web search for extra context
            transformed_query_str = self.transform_query_pipeline.run(
                query_str=query_str
            ).message.content
            search_text = self.search_with_transformed_query(transformed_query_str)
        if search_text:
            return self.get_result(relevant_text, search_text, query_str)
        else:
            return self.get_result(relevant_text, "", query_str)
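A minimal usage sketch for the pack above, assuming documents and a Tavily API key are available (the query string and the similarity_top_k value are just examples):
pack = CorrectiveRAGPack(documents, tavily_ai_apikey="tvly-...")
response = pack.run("Do you know the company XXX?", similarity_top_k=5)
print(response)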
Utilize RAGatouille Retriever Pack:
from llama_index.packs.ragatouille_retriever.base import RAGatouilleRetrieverPack
# Initialize the RAGatouille Retriever Pack
retriever_pack = RAGatouilleRetrieverPack(documents)
# Add documents to the retriever
retriever_pack.add_documents(new_documents)
# Run the retriever with a query
result = retriever_pack.run(query)
By implementing these techniques, you can enhance the retrieval accuracy and overall performance of your RAG application [1][2][3].
Could you explain the first suggestion like I'm 5?
Sure! Let's break it down into very simple terms:
Decoupling Chunks for Retrieval and Synthesis: think of each document as a big storybook. Instead of reading every page of every book to find an answer, you first look at the short note on each cover (the summary) to pick the right book, and only then open that book to read the details.
Embedding a Document Summary: the short cover note is what gets embedded and compared against your question; the full book is kept aside for actually writing the answer.
In your code, you can create a summary index to help with this:
from llama_index.core import DocumentSummaryIndex
# Create a document summary index (one summary embedded per document)
doc_summary_index = DocumentSummaryIndex.from_documents(documents)
# Retrieve relevant documents by matching the query against the summaries
relevant_docs = doc_summary_index.as_retriever().retrieve(query)
This way, you first find the right documents using the summaries, and then dive into the details only when needed. This makes finding the right information faster and easier!
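As a minimal sketch of that two-stage flow (the docstore dictionary keyed by document id is an assumption about how the full documents are kept around, not something from the original answer):
# Stage 1: find candidate documents via their summaries
summary_hits = doc_summary_index.as_retriever().retrieve("Do you know the company XXX?")
# Stage 2: pull the full source documents for synthesis
full_docs = [docstore[hit.node.ref_doc_id] for hit in summary_hits]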
According to your message, can I understand it as follows: we embed the summaries of the documents instead of the documents themselves, and while generating a response we still fetch the context based on the summaries but generate the answer based on the full documents? If that is the case, how can I rebuild the DocumentSummaryIndex from the Chroma DB?
Yes, you are correct. The idea is to embed summaries of documents for efficient retrieval and then use the full documents for generating detailed responses. To rebuild the index from Chroma DB, store the summaries and their metadata in Chroma DB, retrieve them back, and reconstruct the index using the retrieved summaries and metadata. Here is an example of how you can achieve this.
First, ensure you store the summaries and their metadata in Chroma DB:
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import chromadb
# Assuming 'summaries' is a list of summary documents
summaries = [...]  # Your summary documents here
# Initialize ChromaDB client and collection
chroma_client = chromadb.HttpClient(host="your_chromadb_host", port="your_chromadb_port", ssl=False)
chroma_collection = chroma_client.get_or_create_collection("summary_collection")
# Define your embedding model
embed_model = HuggingFaceEmbedding(model_name="your_model_name_here")
# Initialize ChromaVectorStore with the collection
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
# Set up StorageContext with the initialized vector store
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Embed and store the summaries in ChromaDB (ChromaVectorStore has no
# add_document method; building an index over the storage context both embeds
# the summaries and writes them to the collection)
summary_index = VectorStoreIndex.from_documents(
    summaries, storage_context=storage_context, embed_model=embed_model
)
Next, retrieve the stored summaries from Chroma DB and rebuild an index over them. (Note: what can be rebuilt directly from the vector store is a VectorStoreIndex over the summary embeddings; a full DocumentSummaryIndex additionally needs its docstore and index store persisted and reloaded, as sketched further below.)
from llama_index.core import VectorStoreIndex
# Rebuild an index on top of the embeddings already persisted in the
# collection; nothing is re-embedded
summary_index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)
# Now you can use the rebuilt index for retrieval over the summaries
query = "Do you know the company XXX?"
relevant_docs = summary_index.as_retriever().retrieve(query)
This approach ensures that you can efficiently store, retrieve, and use document summaries for retrieval while maintaining the ability to generate detailed responses based on the full documents. Make sure to replace placeholders like "your_chromadb_host", "your_chromadb_port", and "your_model_name_here" with your actual configurations.
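If you specifically need a true DocumentSummaryIndex (with its summary-to-chunk mapping) rather than a plain vector index over the summaries, a minimal sketch is to persist and reload its storage context instead of reconstructing it by hand; the persist_dir path here is an assumption:
from llama_index.core import DocumentSummaryIndex, StorageContext, load_index_from_storage
# Build once, writing embeddings to Chroma and the docstore/index store to disk
storage_context = StorageContext.from_defaults(vector_store=vector_store)
doc_summary_index = DocumentSummaryIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)
doc_summary_index.storage_context.persist(persist_dir="./summary_index_storage")
# Later, reload the same index instead of rebuilding it
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir="./summary_index_storage"
)
doc_summary_index = load_index_from_storage(storage_context, embed_model=embed_model)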
Great! I see the whole process now. However, I don't see how decoupling retrieval and synthesis helps me retrieve the right documents from the database. If the retriever has already failed to retrieve the right document given the full documents, will it really be easier to retrieve the right documents based only on the summaries? Could you elaborate on the logic behind it?
Decoupling retrieval and synthesis by using document summaries can indeed help improve the accuracy of retrieving the right documents. Here's the logic behind it:
Focused Retrieval: a summary condenses each document down to its most identifying details (company name, location, key facts), so the embedding of a summary is dominated by exactly the terms a query like "Do you know the company XXX?" contains.
Efficiency: summaries are short, so there is far less text per embedding to dilute the match, and far less content to compare against.
Improved Relevance: the query is compared against text that explicitly names the entities you care about, rather than long descriptions in which the company name is only a tiny fraction of the content.
Imagine you have a database of company profiles, each with detailed information. If you query "Do you know the company XXX?", the retriever might struggle to find the exact match due to the volume of information in each profile. However, if each profile has a summary that highlights the company name, location, and key details, the retriever can more easily match the query to the correct summary.
Create Summaries: generate a short summary for each company profile that foregrounds the company name, country, and city.
Store Summaries in Chroma DB: embed the summaries and persist them in the collection.
Retrieve Using Summaries: match the user's query against the summary embeddings to pick the right companies.
Detailed Synthesis: pass the corresponding full profiles to the LLM to generate the answer (see the sketch after this list).
By following this approach, you can improve the accuracy and efficiency of your retrieval process, ensuring that the right documents are retrieved and used for generating responses.
For more details on how to implement this, you can refer to the previous examples provided on storing and retrieving summaries using Chroma DB and rebuilding the DocumentSummaryIndex [1][2][3][4].
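For instance, a minimal sketch of the "Create Summaries" step for this particular table, where each summary simply foregrounds the identifying columns (the rows variable and the 200-character truncation are assumptions, not part of the original answer):
from llama_index.core import Document
# One summary Document per row, leading with the fields a user is likely to ask about
summaries = [
    Document(
        text=f"Company: {row['company_name']}. Country: {row['country']}. City: {row['city']}. "
             f"{row['description'][:200]}",
        metadata={"company_name": row["company_name"]},
    )
    for row in rows  # `rows` is the table loaded from the source database
]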
Okay. It seems that the first suggestion does not work for me. Could you elaborate on your second suggestion in detail?
Sure! Let's dive into the second suggestion: Structured Retrieval for Larger Document Sets. This involves using metadata filters and auto retrieval, as well as storing document hierarchies (summaries -> raw chunks) and performing recursive retrieval. Here's a detailed breakdown:
Metadata Filters: attach structured tags (e.g. company_name, country, city) to every document so retrieval can be constrained to exact matches instead of relying purely on semantic similarity.
Auto Retrieval: let the LLM infer both the query string and the metadata filters from the user's question, so "Do you know the company XXX?" automatically becomes a filtered lookup.
The setup involves three steps:
Tag Documents with Metadata: add an index_id (and any other useful tags) to each document at ingestion time.
Initialize Vector Stores for Metadata and Document Chunks: keep one index of per-document metadata nodes and one index of the raw chunks.
Setup Auto Retriever: route from the metadata level down to per-document retrievers with a recursive retriever:
import weaviate
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import RecursiveRetriever, VectorIndexAutoRetriever
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters
from llama_index.core.vector_stores.types import MetadataInfo, VectorStoreInfo
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.weaviate import WeaviateVectorStore
# Initialize Weaviate client
client = weaviate.Client("http://localhost:8080")
# Define documents and metadata
documents = [...]  # List of Document objects, each tagged with metadata["index_id"]
metadata_nodes = [...]  # List of IndexNode objects, one per document, whose index_id matches that tag
# Initialize vector store for metadata
metadata_store = WeaviateVectorStore(weaviate_client=client, index_name="metadata_index")
metadata_sc = StorageContext.from_defaults(vector_store=metadata_store)
metadata_index = VectorStoreIndex(metadata_nodes, storage_context=metadata_sc)
# Initialize vector store for document chunks
chunks_store = WeaviateVectorStore(weaviate_client=client, index_name="chunks_index")
chunks_sc = StorageContext.from_defaults(vector_store=chunks_store)
doc_index = VectorStoreIndex.from_documents(documents, storage_context=chunks_sc)
# Setup auto retriever over the metadata index
vector_store_info = VectorStoreInfo(
    content_info="summaries and metadata of company documents",
    metadata_info=[
        MetadataInfo(name="index_id", type="str", description="ID of the source document"),
    ],
)
auto_retriever = VectorIndexAutoRetriever(
    index=metadata_index,
    vector_store_info=vector_store_info,
    similarity_top_k=10,
)
# Define per-document retrievers restricted by a metadata filter
retriever_dict = {}
for doc in documents:
    index_id = doc.metadata["index_id"]
    filters = MetadataFilters(filters=[MetadataFilter(key="index_id", value=index_id)])
    retriever_dict[index_id] = doc_index.as_retriever(filters=filters)
# Setup recursive retriever: metadata hits route down to the matching document retriever
recursive_retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": auto_retriever, **retriever_dict},
    verbose=True,
)
# Query engine
query_engine = RetrieverQueryEngine.from_args(recursive_retriever, llm=OpenAI(model="gpt-3.5-turbo"))
# Run query
query = "Do you know the company XXX?"
result = query_engine.query(query)
Document Hierarchies: store a summary node for each document that links down to its raw chunks, so retrieval can move from coarse (summaries) to fine (chunks).
Recursive Retrieval: retrieve at the summary level first, then follow the links into the underlying chunks for synthesis. A sketch of the supporting pieces:
from typing import List
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle

# Define local index state mapping document ids to summaries and chunks
class LocalIndexState:
    def __init__(self):
        self.full_doc_summaries_by_id = {}
        self.chunks_by_id = {}
        self.retrieval_tool_function_name = "search_documents"
        self.retrieval_tool_description = "Searches for and returns document chunks."

# Create document summaries and chunks
summaries = [...]  # List of summary documents
chunks = [...]  # List of document chunks
# Store summaries and chunks in the state
state = LocalIndexState()
for summary in summaries:
    state.full_doc_summaries_by_id[summary.id_] = summary
for chunk in chunks:
    state.chunks_by_id[chunk.id_] = chunk

# Setup fused summary retriever: match the query against summaries, then
# expand each hit into its parent document's chunks
class FusedSummaryRetriever(BaseRetriever):
    def __init__(self, vectorstore, full_doc_summary_store, parent_doc_store):
        self.vectorstore = vectorstore
        self.full_doc_summary_store = full_doc_summary_store
        self.parent_doc_store = parent_doc_store
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Implement retrieval logic: summary-level search followed by chunk expansion
        raise NotImplementedError

# Initialize retriever
retriever = FusedSummaryRetriever(
    vectorstore=chunks_store,
    full_doc_summary_store=state.full_doc_summaries_by_id,
    parent_doc_store=state.chunks_by_id,
)
# Setup query engine
query_engine = RetrieverQueryEngine(retriever=retriever)
# Run query
query = "Do you know the company XXX?"
result = query_engine.query(query)
By following these steps, you can implement structured retrieval using metadata filters and auto retrieval, store document hierarchies, and perform recursive retrieval to enhance the performance of your RAG application [1][2][3].
Hi, I just realized that the issue might be the chunking. If the loaded documents are already the perfect size to be fed into the embedding model, how can I skip the transformation procedure? I don't need to split the text or merge it with other documents; I just need to feed every single document into the embedding model and store them in the Chroma database.
To skip the chunking process and directly embed and store each document as-is into the Chroma database, you can modify the add method in the ChromaVectorStore class to process the list of nodes directly without chunking. Here is the modified add method:
def add(self, nodes: List[BaseNode], **add_kwargs: Any) -> List[str]:
    """Add nodes to index.

    Args:
        nodes: List[BaseNode]: list of nodes with embeddings

    """
    if not self._collection:
        raise ValueError("Collection not initialized")
    embeddings = []
    metadatas = []
    ids = []
    documents = []
    for node in nodes:
        embeddings.append(node.get_embedding())
        metadata_dict = node_to_metadata_dict(
            node, remove_text=True, flat_metadata=self.flat_metadata
        )
        for key in metadata_dict:
            if metadata_dict[key] is None:
                metadata_dict[key] = ""
        metadatas.append(metadata_dict)
        ids.append(node.node_id)
        documents.append(node.get_content(metadata_mode=MetadataMode.NONE))
    self._collection.add(
        embeddings=embeddings,
        ids=ids,
        metadatas=metadatas,
        documents=documents,
    )
    return ids
This modification removes the chunking logic and processes the list of nodes directly, embedding and storing each document as-is into the Chroma database.
Additionally, you can refer to the test_add_to_chromadb_and_query function in the llama-index-legacy/tests/vector_stores/test_chromadb.py file for an example of how to add nodes to the Chroma database and query them. This test function demonstrates both synchronous and asynchronous methods for adding nodes and querying the database, which might be useful for your implementation [1][2].
Can't I just set the transformations to an empty list?
Yes, you can set the transformations to an empty list to ensure that the documents are directly embedded and stored in Chroma DB without any chunking. Separately, the add method in the ChromaVectorStore class performs its own chunking internally using the chunk_list function, which is based on MAX_CHUNK_SIZE; note that this only batches the list of nodes for insertion into Chroma, while the text splitting itself is controlled by the transformations you pass when building the index. If you want to avoid that batching, you can modify MAX_CHUNK_SIZE to a very large number or ensure that the list of nodes you pass is smaller than MAX_CHUNK_SIZE.
Here is the relevant part of the code that handles chunking:
MAX_CHUNK_SIZE = 41665  # One less than the max chunk size for ChromaDB

def chunk_list(
    lst: List[BaseNode], max_chunk_size: int
) -> Generator[List[BaseNode], None, None]:
    """Yield successive max_chunk_size-sized chunks from lst.

    Args:
        lst (List[BaseNode]): list of nodes with embeddings
        max_chunk_size (int): max chunk size

    Yields:
        Generator[List[BaseNode], None, None]: list of nodes with embeddings

    """
    for i in range(0, len(lst), max_chunk_size):
        yield lst[i : i + max_chunk_size]
To avoid chunking, you can set MAX_CHUNK_SIZE to a very large number:
MAX_CHUNK_SIZE = 10**9  # effectively disables batching (range() requires an int, so avoid float('inf'))
Or ensure that the list of nodes you pass to the add method is smaller than MAX_CHUNK_SIZE [1].
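For the text-splitting side of the question, a minimal sketch of passing an empty transformations list when building the index, so each loaded Document becomes exactly one node and one embedding (this reuses the documents, vector_store, and embed_model names from the earlier Chroma examples as assumptions):
from llama_index.core import StorageContext, VectorStoreIndex
# transformations=[] skips the default sentence splitter, so nothing is split or merged
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embed_model,
    transformations=[],  # one Document -> one node -> one embedding
)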
So, I eventually fixed this issue. It occurred due to the automatic chunking applied by default: my documents should be divided based on rows rather than on the number of tokens. The default chunk size is 1024 tokens, so the transformation merged several documents into one, which introduced a massive amount of noise into the index. Disabling the chunking (effectively setting the chunk size to infinity) solves the issue if your documents come from a SQL database or another relational database.
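As an illustration of that row-based loading, a minimal sketch assuming the table is fetched into a list of dict rows (the rows variable and the SQL shown in the comment are assumptions, not part of the original post):
from llama_index.core import Document
# One Document per relational row, so no splitter can merge unrelated companies
rows = [...]  # e.g. the result of: SELECT company_name, country, city, description FROM companies
documents = [
    Document(
        text=row["description"],
        metadata={
            "company_name": row["company_name"],
            "country": row["country"],
            "city": row["city"],
        },
    )
    for row in rows
]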
Question
Hi, all. I loaded and stored the table into ChromaDB. The table contains four columns ["company_name", "country", "city", "description"]. After that, I queried the LLM with the query "Do you know the company XXX?". However, the retriever failed to fetch the document with the exact same company name. I did use a reranking service to improve the quality, but it did not help, given that the retriever had already failed to include the relevant documents in the returned set. I want to know how I can improve the overall performance of the RAG application.
Here is the code I used to do the whole thing. Some of it is abstract. Please let me know if you need more detail to solve this issue.