run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: What is the core difference and ideology between the abstractions "VectorStoreIndex", "StorageContext", "VectorStore". #15475

Closed falakic closed 1 day ago

falakic commented 3 months ago


Question

I see that an index is created using a StorageContext, which is usually created from nothing but a VectorStore. Also, there is usually a client or collection used to create a VectorStore. I'm confused by the number of abstractions and would like to know the distinctions between them all.

dosubot[bot] commented 3 months ago

Hey @falakic! I'm here to help you with any questions or issues you have. Feel free to ask anything, and I'll do my best to assist you while we wait for a human maintainer.

The core difference and ideology between the abstractions "VectorStoreIndex", "StorageContext", and "VectorStore" in LlamaIndex are as follows:

  1. VectorStoreIndex:

    • Ideology: Manages an index of embedding vectors, providing a high-level interface for building and querying the index.
    • Usage: Used to build a new index from documents or reload an existing index from a vector store. Supports operations like insertion, deletion, update, and refresh. Supports various vector stores as storage backends, such as Pinecone, DeepLake, etc.
    • Example:

      from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
      from llama_index.vector_stores.pinecone import PineconeVectorStore
      import pinecone
      
      # init pinecone (legacy pinecone-client v2 API)
      pinecone.init(api_key="<api_key>", environment="<environment>")
      pinecone.create_index("quickstart", dimension=1536, metric="euclidean", pod_type="p1")
      
      # construct vector store and customize storage context
      storage_context = StorageContext.from_defaults(
          vector_store=PineconeVectorStore(pinecone.Index("quickstart"))
      )
      
      # Load documents and build index
      documents = SimpleDirectoryReader("../../examples/data/paul_graham").load_data()
      index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
  2. StorageContext:

    • Ideology: Defines and manages the storage context, including various storage components like document stores, index stores, vector stores, and graph stores.
    • Usage: Customizes and manages the storage components required for indexing and querying. Provides methods to persist the storage context to disk to avoid re-indexing.
    • Example:

      from llama_index.core.storage.docstore import SimpleDocumentStore
      from llama_index.core.storage.index_store import SimpleIndexStore
      from llama_index.core.vector_stores import SimpleVectorStore
      from llama_index.core.graph_stores.simple import SimpleGraphStore
      from llama_index.core import StorageContext
      
      # create storage context using default stores
      storage_context = StorageContext.from_defaults(
          docstore=SimpleDocumentStore(),
          vector_store=SimpleVectorStore(),
          index_store=SimpleIndexStore(),
          graph_store=SimpleGraphStore(),
      )
      
      # persist storage context to disk
      storage_context.persist(persist_dir="<persist_dir>")
  3. VectorStore:

    • Ideology: Stores embedding vectors, focusing on managing the storage and retrieval of vectors. Supports a wide range of vector stores, including Alibaba Cloud OpenSearch, Amazon Neptune, Apache Cassandra, Azure AI Search, and many others.
    • Usage: Stores embedding vectors and can be customized based on the storage backend. Can also load previously stored vectors to avoid re-indexing.
    • Example:

      from llama_index.vector_stores.deeplake import DeepLakeVectorStore
      
      # construct vector store
      vector_store = DeepLakeVectorStore(dataset_path="<dataset_path>")

In summary, "VectorStoreIndex" is for managing the index of vectors, "StorageContext" is for managing the overall storage components and persisting them, and "VectorStore" is specifically for storing embedding vectors [1][2][3][4][5].
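The "persist to avoid re-indexing" idea above can be pictured with a rough pure-Python analogy: each store in the bundle is serialized to its own file under a persist directory, and reloading that directory restores the bundle without recomputing anything. This is a toy sketch for intuition only, not the real LlamaIndex serialization format; the function names here are invented.

```python
import json
import os
import tempfile

# Toy analogue of StorageContext.persist(): write each store's contents
# to its own JSON file under a persist directory.
def persist(stores: dict, persist_dir: str) -> None:
    os.makedirs(persist_dir, exist_ok=True)
    for name, data in stores.items():
        with open(os.path.join(persist_dir, f"{name}.json"), "w") as f:
            json.dump(data, f)

# Toy analogue of reloading a persisted storage context: read every
# store file back into a dict, so nothing needs to be re-indexed.
def load(persist_dir: str) -> dict:
    stores = {}
    for fname in os.listdir(persist_dir):
        name, _ = os.path.splitext(fname)
        with open(os.path.join(persist_dir, fname)) as f:
            stores[name] = json.load(f)
    return stores

with tempfile.TemporaryDirectory() as d:
    persist({"docstore": {"1": "hello"}, "index_store": {"root": "1"}}, d)
    restored = load(d)  # round-trips the bundle without re-indexing
```

In the real library, the counterpart to `load` is `StorageContext.from_defaults(persist_dir=...)` followed by `load_index_from_storage`.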

To continue talking to Dosu, mention @dosu.

logan-markewich commented 3 months ago

A vector store is just a class that wraps any vector store (chroma, qdrant, etc.)

The storage context bundles the vector store, plus other storage layers (docstore, etc.)

The VectorStoreIndex handles chunking, insertion, and shortcuts to creating chat engines and query engines. It works on top of a storage context and handles all interactions with the storage layer.

You could totally skip all of these and use the vector store directly if you want; it's kind of up to you.
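This layering can be illustrated with a deliberately simplified pure-Python sketch. None of these classes are the real LlamaIndex implementations; every name, signature, and the character-frequency "embedding" are invented for illustration. The point is only the division of labor: the vector store holds and scans vectors, the storage context is a dumb bundle of stores, and the index embeds text and talks to the storage layer.

```python
from dataclasses import dataclass, field

# Toy "vector store": only knows how to hold vectors and scan them.
class ToyVectorStore:
    def __init__(self):
        self.vectors = {}  # node_id -> (vector, text)

    def add(self, node_id, vector, text):
        self.vectors[node_id] = (vector, text)

    def query(self, vector, top_k=1):
        # naive dot-product similarity over all stored vectors
        def score(item):
            _, (v, _) = item
            return sum(a * b for a, b in zip(vector, v))
        ranked = sorted(self.vectors.items(), key=score, reverse=True)
        return [text for _, (_, text) in ranked[:top_k]]

# Toy "storage context": nothing but a bundle of storage components.
@dataclass
class ToyStorageContext:
    vector_store: ToyVectorStore
    docstore: dict = field(default_factory=dict)

# Toy "index": embeds text and mediates all access to the storage layer.
class ToyIndex:
    def __init__(self, storage_context, embed_fn):
        self.ctx = storage_context
        self.embed_fn = embed_fn

    def insert(self, node_id, text):
        self.ctx.docstore[node_id] = text
        self.ctx.vector_store.add(node_id, self.embed_fn(text), text)

    def query(self, text, top_k=1):
        return self.ctx.vector_store.query(self.embed_fn(text), top_k)

# Fake "embedding": a letter-frequency vector, for demonstration only.
def embed(text):
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

ctx = ToyStorageContext(vector_store=ToyVectorStore())
index = ToyIndex(ctx, embed)
index.insert("1", "cats and dogs")
index.insert("2", "stock market report")
print(index.query("dogs"))  # → ['cats and dogs']
```

As Logan notes, you could drop the top two layers and call `ToyVectorStore.add`/`query` directly; the index and storage context exist for convenience and composition, not necessity.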

dosubot[bot] commented 1 week ago

Hi, @falakic. I'm Dosu, and I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale.

Thank you for your understanding and contribution!