run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: How can I use SimpleDirectoryReader and StorageContext #16343

Open atodhruv opened 2 weeks ago

atodhruv commented 2 weeks ago

Question

I want to use SimpleDirectoryReader and StorageContext so that one function first generates the vector storage, and a second function then loads and uses that stored vector data during initialization.

dosubot[bot] commented 2 weeks ago

To use SimpleDirectoryReader and StorageContext so that one function generates the vector storage and another loads it during initialization, follow these steps:

  1. Load Documents and Build Index: Use SimpleDirectoryReader to load your documents and VectorStoreIndex to build an index from them.

    from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
    
    # Load documents
    documents = SimpleDirectoryReader("../../examples/data/paul_graham").load_data()
    # Build index
    index = VectorStoreIndex.from_documents(documents)
  2. Specify a Storage Backend with StorageContext: Use StorageContext to specify where the index data is stored. For example, using Azure Table Storage as the index store:

    from llama_index.storage.index_store.azure import AzureIndexStore
    from llama_index.storage.kvstore.azure.base import ServiceMode
    from llama_index.core import StorageContext
    
    # Initialize AzureIndexStore
    azure_store = AzureIndexStore.from_account_and_key(
        account_name="your_account_name",
        account_key="your_account_key",
        service_mode=ServiceMode.STORAGE,
    )
    
    # Construct storage context with AzureIndexStore
    storage_context = StorageContext.from_defaults(index_store=azure_store)
    
    # Load documents and build index with storage context
    index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
  3. Persist and Load Index: You can persist the index and load it later using the storage context:

    # Save index
    index.storage_context.persist(persist_dir="<persist_dir>")
    
    # Load index later
    from llama_index.core import load_index_from_storage
    
    storage_context = StorageContext.from_defaults(persist_dir="<persist_dir>")
    loaded_index = load_index_from_storage(storage_context)

This process involves loading documents, building an index, and specifying a storage context to manage how and where the vectors are stored [1].

To continue talking to Dosu, mention @dosu.