run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
35.73k stars 5.05k forks

[Question]: can i save a summary index? #14928

Open whoo9112 opened 2 months ago

whoo9112 commented 2 months ago

Question

I used Multi docs agent in llamaindex.

The multi docs agent saves the vector index, but not the summary index. Because of this, I have to recreate the nodes for the summary index every time. Is there a way to save the summary index?

dosubot[bot] commented 2 months ago

In the multi docs agent example in LlamaIndex, the summary index is built and a per-document summary is cached with the following code snippet:

import os
import pickle
from pathlib import Path

# build summary index
summary_index = DocumentSummaryIndex(nodes)

# define query engines
vector_query_engine = vector_index.as_query_engine(llm=llm)
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize", llm=llm
)

# extract a summary (cached on disk as a pickled string)
if not os.path.exists(summary_out_path):
    Path(summary_out_path).parent.mkdir(parents=True, exist_ok=True)
    summary = str(
        await summary_query_engine.aquery(
            "Extract a concise 1-2 line summary of this document"
        )
    )
    with open(summary_out_path, "wb") as f:
        pickle.dump(summary, f)
else:
    with open(summary_out_path, "rb") as f:
        summary = pickle.load(f)

This code builds the summary index with DocumentSummaryIndex, but note that what it saves with pickle.dump is only the generated summary string, not the index itself. If the summary file already exists, it loads the cached summary with pickle.load instead of querying again [1][2][3].
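The compute-once, cache-to-disk pattern in the snippet above is generic and works for any picklable value. A minimal stdlib-only sketch, where the helper name load_or_compute is our own invention rather than a LlamaIndex API:

```python
import pickle
import tempfile
from pathlib import Path

def load_or_compute(path, compute):
    """Return the pickled value at `path` if it exists; otherwise
    call `compute()`, cache the result at `path`, and return it."""
    p = Path(path)
    if p.exists():
        with p.open("rb") as f:
            return pickle.load(f)
    value = compute()
    p.parent.mkdir(parents=True, exist_ok=True)
    with p.open("wb") as f:
        pickle.dump(value, f)
    return value

# First call computes and caches; later calls hit the cache.
cache_path = Path(tempfile.mkdtemp()) / "doc1_summary.pkl"
summary = load_or_compute(cache_path, lambda: "A concise summary.")
```

In the thread's setting, the compute callable would wrap the summary_query_engine query; the caching logic itself stays the same.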

To continue talking to Dosu, mention @dosu.

whoo9112 commented 2 months ago

@dosu no, not DocumentSummaryIndex, I said SummaryIndex

dosubot[bot] commented 2 months ago

SummaryIndex has no save_to_storage or load_from_storage methods. To save a SummaryIndex in LlamaIndex and avoid recreating the nodes every time, persist its storage context to disk and reload it later with load_index_from_storage. Here is an example (imports shown for v0.10+; on older versions, import from llama_index instead of llama_index.core):

Saving the Summary Index

from llama_index.core import SummaryIndex

# Assuming `summary_index` is your SummaryIndex object
summary_index.set_index_id("summary_index")  # optional, but useful when one store holds multiple indices
summary_index.storage_context.persist(persist_dir="./storage")

Loading the Summary Index

from llama_index.core import StorageContext, load_index_from_storage

# Rebuild the storage context from the persist directory
storage_context = StorageContext.from_defaults(persist_dir="./storage")

# Load the SummaryIndex from storage
summary_index = load_index_from_storage(storage_context, index_id="summary_index")

This persists the SummaryIndex (its docstore and index store) to a storage backend and lets you retrieve it later, avoiding the need to recreate the nodes every time [1][2].
