run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Node Ingestion in batches #16483

Open OvaisTariq95 opened 1 week ago

OvaisTariq95 commented 1 week ago

Question

When I try to ingest a single node using simple (dense-only) search, everything works fine. But when I try to ingest the same node using hybrid search (i.e. dense and sparse vectors), my container's resource usage skyrockets until it eventually crashes. Why is that? I also tried sending the nodes in batches, but that way the docs are not being ingested into Qdrant, and surprisingly there are no errors either.

```python
def run_sync_pipeline(nodes, pipeline):
    batch_size = 50
    for batch in batch_nodes(nodes, batch_size):
        serialized_batch = [node.dict() for node in batch]  # .dict() is a method call
        ingest_nodes.delay(nodes_batch=serialized_batch)

    pipeline.run(nodes=nodes)

run_sync_pipeline(nodes, pipeline)
```
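The code above calls a `batch_nodes` helper that isn't shown in the issue. A minimal sketch of such a chunking helper (hypothetical, since the original implementation is not included) could look like:

```python
from itertools import islice


def batch_nodes(nodes, batch_size):
    """Yield successive lists of at most batch_size nodes."""
    it = iter(nodes)
    # islice consumes the iterator, so each loop picks up where the last stopped
    while batch := list(islice(it, batch_size)):
        yield batch
```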

Below is my collection creation code:

```python
def get_qdrant_pipeline():
    qd_client = create_qdrant_client(
        location=QdrantConfig.QDRANT_LOCATION,
        api_key=QdrantConfig.QDRANT_API_KEY,
        url=QdrantConfig.QDRANT_URL,
    )

    collection_name = "hybrid_collection"
    if not qd_client.collection_exists(collection_name):
        qd_client.create_collection(
            collection_name=collection_name,
            vectors_config={
                "text-dense": qmodels.VectorParams(
                    size=384,
                    distance=qmodels.Distance.COSINE,
                )
            },
            sparse_vectors_config={
                "text-sparse": qmodels.SparseVectorParams(
                    index=qmodels.SparseIndexParams()
                )
            },
        )

    vector_store = QdrantVectorStore(
        client=qd_client,
        collection_name=collection_name,
        enable_hybrid=True,
    )
    pipeline = IngestionPipeline(
        transformations=[
            OpenAIEmbedding(
                api_key=OpenAIConfig.OPENAI_API_KEY,
                dimensions=384,
                embed_batch_size=5,
                model="text-embedding-3-small",
            )
        ],
        vector_store=vector_store,
    )
    return pipeline
```
dosubot[bot] commented 1 week ago

Hey @OvaisTariq95! I'm here to help you with any bugs, questions, or contributions you have. Let's figure this out together!

The issue you're experiencing with resource usage during hybrid search ingestion might be due to the configuration or the way the batch processing is handled.

  1. Batch Size and Parallelism: Ensure that the batch_size and parallel parameters in QdrantVectorStore are set appropriately. A smaller batch_size might help reduce resource usage. You can also adjust the parallel parameter to control the number of parallel processes.

  2. Hybrid Search Configuration: Make sure that enable_hybrid=True is correctly set, which you have done. This ensures both dense and sparse vectors are used.

  3. Error Handling: Since there are no errors, it might be useful to add logging to track the progress and identify where the process might be failing silently.

  4. Resource Limits: Check the resource limits of your container. You might need to increase the allocated memory or CPU resources.

  5. OpenAI Embedding: The embed_batch_size is set to 5, which is reasonable, but you might want to experiment with this value to see if it affects performance.

  6. Qdrant Configuration: Ensure that the Qdrant server is configured to handle the load you're putting on it. This might involve tuning Qdrant's internal settings or scaling the server resources.

If these suggestions don't resolve the issue, consider profiling the resource usage to pinpoint the bottleneck.
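Suggestions 1 and 3 can be combined in a small wrapper that runs the pipeline over fixed-size slices and logs progress, so a silent failure shows up in the logs. This is a sketch rather than a LlamaIndex API; `run_in_batches` is a hypothetical helper, and `pipeline` is assumed to be any object exposing a `run(nodes=...)` method, as `IngestionPipeline` does:

```python
import logging

logger = logging.getLogger(__name__)


def run_in_batches(pipeline, nodes, batch_size=10):
    """Run the ingestion pipeline over small slices of nodes, logging
    each slice so a silently failing batch is visible in the logs."""
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        logger.info(
            "Ingesting nodes %d-%d of %d",
            start, start + len(batch), len(nodes),
        )
        pipeline.run(nodes=batch)
```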


logan-markewich commented 1 week ago

Qdrant has a batch size parameter, feel free to set it

Hybrid search by default runs a sparse model locally, and if the batch size is large, it will use a lot of memory

The default is 64

```python
QdrantVectorStore(..., batch_size=64)
```
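Applied to the `get_qdrant_pipeline` code from the question, that would be a configuration fragment along these lines (assuming the same `qd_client` and `collection_name` as above; the exact `batch_size` value is something to tune for your container's memory):

```python
vector_store = QdrantVectorStore(
    client=qd_client,
    collection_name=collection_name,
    enable_hybrid=True,
    batch_size=10,  # smaller upserts keep the local sparse model's memory use bounded
)
```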

OvaisTariq95 commented 1 day ago

You were right, reducing the batch size to 10 did the trick, but the time it is taking is too long. Is there any way I can optimize that? Are there any algorithms I can change in Qdrant specifically for hybrid search?