Closed kilimchoi closed 10 months ago
🤖
Hello @kilimchoi! :wave:
I'm Dosu, a friendly bot here to help you with your questions and issues related to LlamaIndex while we wait for a human maintainer. I'm here to guide you, help solve bugs, and assist you in becoming an even better contributor! Feel free to ask me anything you need. Let's make your LlamaIndex experience as smooth as possible together. :smiley:
In the LlamaIndex framework, it is generally recommended to save both the document text and its unique summary for later use in initializing a graph. This is based on the functionality of the DocumentSummaryIndex class, which generates a unique summary for each document and stores it in summary_node_dict.
The unique summary for each document is generated by the synthesize method of the _response_synthesizer object. This method takes a query and a list of nodes (representing documents) as input and generates a response based on the content of the nodes. The generated response is then used as the summary for the document.
Here is the relevant code snippet from the DocumentSummaryIndex class:
for doc_id, nodes in iterable_with_progress:
    nodes_with_scores = [NodeWithScore(node=n) for n in nodes]
    # get the summary for each doc_id
    summary_response = self._response_synthesizer.synthesize(
        query=self._summary_query,
        nodes=nodes_with_scores,
    )
    summary_response = cast(Response, summary_response)
    summary_node_dict[doc_id] = TextNode(
        text=summary_response.response,
        relationships={
            NodeRelationship.SOURCE: RelatedNodeInfo(node_id=doc_id)
        },
    )
    self.docstore.add_documents([summary_node_dict[doc_id]])
    logger.info(
        f"> Generated summary for doc {doc_id}: " f"{summary_response.response}"
    )
(Source: llama_index/indices/document_summary/base.py)
This shows that each document gets its own summary, generated from that document's content; a single shared summary is not reused across documents. It is therefore useful to save both the document text and its summary for later use in initializing a graph.
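To make the "save text plus its own summary" idea concrete, here is a minimal sketch in plain Python. It does not use any LlamaIndex API: StoredDoc, summarize, build_records, dumps_records and loads_records are hypothetical names made up for illustration, and summarize is only a placeholder for the LLM call that the snippet above performs through the response synthesizer. JSON stands in for whatever metadata field your vector db offers.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass
class StoredDoc:
    """One record per document: the raw text plus that document's own summary."""
    doc_id: str
    text: str
    summary: str


def summarize(text: str, max_words: int = 8) -> str:
    # Placeholder summarizer (assumption): keeps the first few words.
    # In practice this would be an LLM-generated summary.
    return " ".join(text.split()[:max_words])


def build_records(docs: Dict[str, str]) -> List[StoredDoc]:
    # Generate a separate summary for every document, keyed by doc_id.
    return [StoredDoc(doc_id, text, summarize(text)) for doc_id, text in docs.items()]


def dumps_records(records: List[StoredDoc]) -> str:
    # Serialize text and summary together so both can be fetched later.
    return json.dumps([asdict(r) for r in records])


def loads_records(payload: str) -> List[StoredDoc]:
    # Rebuild the records, e.g. before initializing a graph over the summaries.
    return [StoredDoc(**item) for item in json.loads(payload)]


docs = {
    "toronto": "Toronto is the capital of Ontario and the largest city in Canada",
    "seattle": "Seattle is a seaport city",
}
records = build_records(docs)
restored = loads_records(dumps_records(records))
summaries_by_doc = {r.doc_id: r.summary for r in restored}
```

After the round trip, summaries_by_doc maps each doc_id to its own summary, which is the shape you would need when wiring per-document summaries back into a higher-level index.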
I hope this answers your question. If you have any further queries, feel free to ask.
Also, what's the recommended approach for running complex queries over multiple documents? I noticed that it's pretty slow if I use the composable graph.
The composable graph has not been updated in a hot minute (and will likely be removed at a later date)
Current approaches for multi-doc querying revolve around a few things; mostly they boil down to providing ways to route each query to the proper location:
https://docs.llamaindex.ai/en/stable/module_guides/querying/router/root.html
https://docs.llamaindex.ai/en/stable/examples/query_engine/sub_question_query_engine.html
https://docs.llamaindex.ai/en/stable/examples/agent/multi_document_agents.html
https://docs.llamaindex.ai/en/stable/examples/retrievers/recursive_retriever_nodes.html
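For a rough feel of what "routing queries to the proper location" means, here is a toy sketch in plain Python. It is not how the LlamaIndex router works internally (the linked docs describe selector-based routing); it simply picks the document whose summary shares the most words with the query and sends the query only to that document's engine. The names tokenize, route_query and answer are hypothetical, and the engines are stub callables standing in for real per-document query engines.

```python
import re
from typing import Callable, Dict, Set


def tokenize(text: str) -> Set[str]:
    # Lowercased alphanumeric tokens; good enough for a toy overlap score.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def route_query(query: str, summaries: Dict[str, str]) -> str:
    # Pick the doc_id whose summary overlaps most with the query.
    # On a tie, the first doc_id in the dict wins.
    query_tokens = tokenize(query)
    return max(summaries, key=lambda doc_id: len(query_tokens & tokenize(summaries[doc_id])))


def answer(query: str, summaries: Dict[str, str],
           engines: Dict[str, Callable[[str], str]]) -> str:
    # Only the selected document's engine is called, instead of all of them.
    return engines[route_query(query, summaries)](query)


summaries = {
    "toronto": "Toronto: largest city in Canada, capital of Ontario",
    "seattle": "Seattle: seaport city in Washington state, home to tech companies",
}
# Stub engines (assumption): a real setup would query a per-document index here.
engines = {doc_id: (lambda q, d=doc_id: f"[{d}] {q}") for doc_id in summaries}

result = answer("What is the capital of Ontario?", summaries, engines)
```

The point of the design is that each query touches one document's index rather than traversing a whole graph, which is usually where the speedup comes from.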
Thanks
Question
I've been reading https://docs.llamaindex.ai/en/stable/examples/usecases/City_Analysis-Decompose-KeywordTable.html and I've been trying to do something similar using an index saved in the vector db. Is it generally recommended to save the document text as well as its summary in the metadata if we want to fetch them later to initialize a graph? Or is it better to use the same summary for each document regardless of the document's content? Is there a resource I can look at that shows how to load the documents and summaries from the vector db?