run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: how can i limit the VectorStoreIndex's from_documents resource usage? #15018

Closed atlas-comstock closed 2 months ago

atlas-comstock commented 2 months ago

Question Validation

Question

[Question]: how can i limit the VectorStoreIndex's from_documents resource usage?

dosubot[bot] commented 2 months ago

Hi @atlas-comstock! I'm a bot here to help you with any questions or issues you have. If you need assistance or have any questions, feel free to ask!

To limit the resource usage of VectorStoreIndex.from_documents, you can adjust the batch size used for vector generation and insertion. By default, VectorStoreIndex generates and inserts vectors in batches of 2048 nodes. If you are memory constrained, reduce this by passing a smaller value via the insert_batch_size parameter.

Here is an example:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load documents
documents = SimpleDirectoryReader("../../examples/data/paul_graham").load_data()

# Build index with a smaller batch size to limit resource usage
index = VectorStoreIndex.from_documents(documents, insert_batch_size=512)

Additionally, if you are using the Vectara backend, you can use the parallelize_ingest parameter to control whether the ingestion process runs in parallel. Keeping ingestion sequential (parallelize_ingest=False) lowers peak resource usage, at the cost of slower indexing.

Here is an example for Vectara:

from llama_index.legacy.indices.managed.vectara.base import VectaraIndex
from llama_index.legacy.schema import Document

# Load documents
documents = [Document(text="Sample text", metadata={"author": "Author"})]

# Build Vectara index with sequential ingestion to limit peak resource usage
index = VectaraIndex.from_documents(documents, parallelize_ingest=False)

For more granular control over resource usage, you can use the low-level API to customize the storage context. This allows you to manage where ingested documents, embedding vectors, and index metadata are stored, which can help optimize resource usage.

Here is an example using the low-level API:

from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.storage.index_store import SimpleIndexStore
from llama_index.core.vector_stores import SimpleVectorStore
from llama_index.core.node_parser import SentenceSplitter

# Load documents
documents = SimpleDirectoryReader("../../examples/data/paul_graham").load_data()

# Create parser and parse documents into nodes
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# Create storage context using default stores
storage_context = StorageContext.from_defaults(
    docstore=SimpleDocumentStore(),
    vector_store=SimpleVectorStore(),
    index_store=SimpleIndexStore(),
)

# Add nodes to docstore
storage_context.docstore.add_documents(nodes)

# Build index
index = VectorStoreIndex(nodes, storage_context=storage_context)

# Save index
index.storage_context.persist(persist_dir="<persist_dir>")

These adjustments help manage memory and processing resources more effectively, especially when dealing with large datasets or limited system resources. Additionally, the VectorStoreIndex class provides asynchronous methods for embedding and adding nodes, which can further optimize resource usage [1][2][3][4].
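
A minimal sketch of that asynchronous path, assuming the use_async flag accepted by VectorStoreIndex.from_documents in current releases (the flag and batch size value here are illustrative, not from the original answer):

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("../../examples/data/paul_graham").load_data()

# use_async=True routes embedding and node insertion through the async code path;
# combined with a small insert_batch_size it keeps peak memory lower
index = VectorStoreIndex.from_documents(
    documents,
    use_async=True,
    insert_batch_size=512,
)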

To continue talking to Dosu, mention @dosu.

logan-markewich commented 2 months ago

It's hard to say without seeing your code.

If you are using local embeddings, you'll want to set the batch size. For example, HuggingFaceEmbedding(..., embed_batch_size=2).
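
A minimal sketch of that, assuming the llama-index-embeddings-huggingface package and a small BGE model (swap in whichever local embedding model you actually use):

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# A small embed_batch_size means fewer texts are embedded at once,
# which lowers peak memory and CPU/GPU usage during indexing
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    embed_batch_size=2,
)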

If you are indexing a lot of data, you might want to use a dedicated vector store integration rather than the default in-memory option.
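
A rough sketch of that, assuming the llama-index-vector-stores-chroma integration and a local chromadb install (any other vector store integration follows the same StorageContext pattern; the paths and collection name are placeholders):

import chromadb
from llama_index.core import VectorStoreIndex, StorageContext, SimpleDirectoryReader
from llama_index.vector_stores.chroma import ChromaVectorStore

# Persist vectors in Chroma on disk instead of holding them all in memory
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_or_create_collection("docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("../../examples/data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    insert_batch_size=512,
)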