Open Neethan54 opened 2 months ago
To filter by a particular document name and perform Q&A only on that document in LlamaIndex, you can filter the nodes by their `file_name` metadata before loading them into the vector store, then build the retriever over that store. Here is the modified code:
```python
# Assuming you have already loaded your documents and created nodes
# as shown in the provided content.

# Filter nodes by document name
document_name = "llama2.pdf"  # Replace with your document name
filtered_nodes = [node for node in nodes if node.metadata.get("file_name") == document_name]

# Generate embeddings for the filtered nodes
for node in filtered_nodes:
    node_embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding

# Load the filtered nodes into the vector store
vector_store.add(filtered_nodes)

# Define the retriever over the filtered vector store
retriever = VectorDBRetriever(
    vector_store, embed_model, query_mode="default", similarity_top_k=2
)

# Plug this into a RetrieverQueryEngine to synthesize a response
query_engine = RetrieverQueryEngine.from_args(retriever, llm=llm)

# Perform the query
query_str = "How does Llama 2 perform compared to other open-source models?"
response = query_engine.query(query_str)
print(str(response))
print(response.source_nodes[0].get_content())
```
In this code:

- `document_name` is the name of the document you want to filter by.
- `filtered_nodes` contains only the nodes from the specified document.

Additionally, you can use the concept of "document agents" for more advanced decision-making over heterogeneous documents. This involves creating a separate document agent for each document, which can dynamically choose to perform semantic search or summarization within a given document. Here is an example of how to set up document agents:
```python
from llama_index.agent.openai import OpenAIAgent
from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.llms.openai import OpenAI

# Build agents dictionary
agents = {}

# Assuming city_docs is a dictionary mapping document names to lists of Documents
for doc_name, doc_content in city_docs.items():
    # build vector index
    vector_index = VectorStoreIndex.from_documents(doc_content)
    # build summary index
    summary_index = SummaryIndex.from_documents(doc_content)

    # define query engines
    vector_query_engine = vector_index.as_query_engine()
    list_query_engine = summary_index.as_query_engine()

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name="vector_tool",
                description=f"Useful for retrieving specific context from {doc_name}",
            ),
        ),
        QueryEngineTool(
            query_engine=list_query_engine,
            metadata=ToolMetadata(
                name="summary_tool",
                description=f"Useful for summarization questions related to {doc_name}",
            ),
        ),
    ]

    # build agent
    function_llm = OpenAI(model="gpt-3.5-turbo-0613")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
    )
    agents[doc_name] = agent
```
This setup allows you to dynamically choose the appropriate tool (semantic search or summarization) for each document during the query process.
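To make the routing idea concrete, here is a minimal plain-Python sketch of per-document tool choice. Everything in it is illustrative: a keyword rule stands in for the LLM's tool selection, and none of these names are LlamaIndex API.

```python
# Plain-Python sketch of per-document tool routing (illustrative only;
# in the real setup, the OpenAIAgent's LLM picks a tool from its description).
def build_agent(chunks):
    def vector_tool(query):
        # stand-in for semantic search: keep chunks sharing a word with the query
        return [c for c in chunks if any(w in c.lower() for w in query.lower().split())]

    def summary_tool(query):
        # stand-in for summarization: concatenate every chunk
        return " ".join(chunks)

    def agent(query):
        # crude routing rule in place of the agent's LLM-driven tool choice
        tool = summary_tool if "summar" in query.lower() else vector_tool
        return tool(query)

    return agent

agents = {
    "llama2.pdf": build_agent(
        ["Llama 2 outperforms open models.", "Training used 2T tokens."]
    ),
}
print(agents["llama2.pdf"]("tokens used"))     # → ['Training used 2T tokens.']
print(agents["llama2.pdf"]("summarize this"))  # → both chunks joined into one string
```

The design point is that each document gets its own pair of tools, so the choice between retrieval and summarization is made per document rather than globally.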
For a more structured approach, you can use the `MultiDocAutoRetrieverPack` from LlamaIndex, which allows hierarchical retrieval across multiple documents. Here is an example of how to set it up:
```python
from llama_index.core.llama_pack import download_llama_pack

# Download the MultiDocAutoRetrieverPack
MultiDocAutoRetrieverPack = download_llama_pack(
    "MultiDocAutoRetrieverPack", "./multidoc_autoretriever_pack"
)

# Set up the pack with your documents and metadata
pack = MultiDocAutoRetrieverPack(
    client,
    doc_metadata_index_name,
    doc_chunks_index_name,
    new_docs,
    docs,
    vector_store_info,
    auto_retriever_kwargs={
        "verbose": True,
        "similarity_top_k": 2,
        "empty_query_top_k": 10,
    },
    verbose=True,
)

# Run the pack on a query
response = pack.run("Tell me about some issues on 12/11")
print(str(response))

# Retrieve nodes using the retriever module
retriever = pack.get_modules()["recursive_retriever"]
nodes = retriever.retrieve("Tell me about some open issues related to agents")
print(f"Number of source nodes: {len(nodes)}")
print(nodes[0].node.metadata)
```
This approach leverages the `MultiDocAutoRetrieverPack` for more efficient and structured retrieval across multiple documents [1][2][3][4].
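Finally, a note relevant to the original question: recent LlamaIndex releases also support passing metadata filters directly to the retriever, along the lines of `index.as_retriever(filters=MetadataFilters(filters=[ExactMatchFilter(key="file_name", value="llama2.pdf")]))` (check your installed version for the exact import path, as it has moved between releases). The exact-match semantics can be sketched in plain Python; the `Node` class below is an illustrative stand-in, not the LlamaIndex class.

```python
from dataclasses import dataclass, field

# Illustrative stand-in for a node carrying metadata (not the LlamaIndex class).
@dataclass
class Node:
    text: str
    metadata: dict = field(default_factory=dict)

def exact_match_filter(nodes, key, value):
    # Keep only nodes whose metadata[key] equals value -- the same
    # semantics as an exact-match metadata filter on file_name.
    return [n for n in nodes if n.metadata.get(key) == value]

nodes = [
    Node("Llama 2 results", {"file_name": "llama2.pdf"}),
    Node("Unrelated chunk", {"file_name": "other.pdf"}),
]
filtered = exact_match_filter(nodes, "file_name", "llama2.pdf")
print([n.text for n in filtered])  # → ['Llama 2 results']
```

Filtering at retrieval time this way avoids rebuilding a separate vector store per document, since the filter is applied against node metadata inside the existing index.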
To continue talking to Dosu, mention @dosu.
Question Validation
Question
Hello all,
How can we filter by a particular document name and do Q&A only on that document in LlamaIndex? I'm using the code below as the retriever:

```python
retriever = index.as_retriever()
```

Please let me know whether there is any option in LlamaIndex for such filtering.