
[Question]: Using Qdrant with Hybrid Search using Fastembed's local_only in LlamaIndex #15683

Closed Kraeze23 closed 2 months ago

Kraeze23 commented 2 months ago


Question

Here's some context. First, a class that initializes a QdrantClient:

```python
from qdrant_client import QdrantClient, models


class HybridQdrantClient:
    DEFAULT_HOST = "http://localhost:6333"
    DENSE_MODEL = "BAAI/bge-large-en-v1.5"
    SPARSE_MODEL = "prithivida/Splade_PP_en_v1"
    CACHE_DIR = "/data/models/fastembed_cache/"

    def __init__(self, collection_name):
        self.collection_name = collection_name
        # Initialize the Qdrant client and register the dense and sparse models
        self.qdrant_client = QdrantClient(self.DEFAULT_HOST)
        self.qdrant_client.set_model(embedding_model_name=self.DENSE_MODEL, cache_dir=self.CACHE_DIR)
        self.qdrant_client.set_sparse_model(embedding_model_name=self.SPARSE_MODEL, cache_dir=self.CACHE_DIR)

        # Recreate the collection from scratch on startup
        if self.qdrant_client.collection_exists(collection_name=self.collection_name):
            self.qdrant_client.delete_collection(collection_name=self.collection_name)

        self.qdrant_client.create_collection(
            collection_name=self.collection_name,
            vectors_config={
                "text-dense": models.VectorParams(
                    size=1024,  # vector size is defined by the dense model
                    distance=models.Distance.COSINE,
                ),
            },
            sparse_vectors_config={
                "text-sparse": models.SparseVectorParams()
            },
        )
```
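
The later snippets also call hybrid_qdrant_client_.get_client(), which isn't shown above; presumably it is a trivial accessor on HybridQdrantClient, along the lines of:

```python
    def get_client(self) -> QdrantClient:
        # Simple accessor handed to QdrantVectorStore later on
        return self.qdrant_client
```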

As you can see, I've downloaded the fastembed models and transferred them to an air-gapped environment with no internet connection. Since Fastembed 1.2.7, there's a local_only option to read from a local cache.
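
For reference, loading from that cache directly in fastembed looks roughly like the sketch below. The keyword local_files_only is an assumption for the exact spelling of the local-only flag mentioned above; check your fastembed version's signature.

```python
from fastembed import SparseTextEmbedding

# Sketch: load the pre-downloaded SPLADE model from the air-gapped cache.
# "local_files_only" is assumed to be the local-only flag referred to above;
# the exact keyword may differ between fastembed versions.
sparse_model = SparseTextEmbedding(
    model_name="prithivida/Splade_PP_en_v1",
    cache_dir="/data/models/fastembed_cache/",
    local_files_only=True,
)
```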

The code for QdrantVectorStore is standard:

```python
vector_store_ = QdrantVectorStore(
    client=hybrid_qdrant_client_.get_client(),
    collection_name=hybrid_qdrant_client_.collection_name,
    enable_hybrid=True,
    index_doc_id=True,
    fastembed_sparse_model="prithivida/Splade_PP_en_v1",  # only accepts a model-name string
)
```

As you can see, QdrantVectorStore has a fastembed_sparse_model option, but it only accepts a string, and I see no way to do what the Fastembed library allows, i.e. read from a local cache. Is there a way to use a local fastembed sparse model in QdrantVectorStore for hybrid Qdrant search?

logan-markewich commented 2 months ago

Just override the sparse functions

https://github.com/run-llama/llama_index/blob/1fc48b7b7219a9ca8ae1fee2a57d84af40fe91a3/llama-index-integrations/vector_stores/llama-index-vector-stores-qdrant/llama_index/vector_stores/qdrant/base.py#L147

logan-markewich commented 2 months ago

This is the default https://github.com/run-llama/llama_index/blob/1fc48b7b7219a9ca8ae1fee2a57d84af40fe91a3/llama-index-integrations/vector_stores/llama-index-vector-stores-qdrant/llama_index/vector_stores/qdrant/utils.py#L67
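
In practice, "overriding the sparse functions" means passing callables matching the SparseEncoderCallable shape: a batch of texts in, parallel lists of token indices and weights out. A minimal sketch along the lines of the linked default, assuming fastembed's SparseTextEmbedding API:

```python
from typing import Callable, List, Tuple

from fastembed import SparseTextEmbedding


def local_sparse_encoder(
    model_name: str, cache_dir: str
) -> Callable[[List[str]], Tuple[List[List[int]], List[List[float]]]]:
    # Load the SPLADE model once from the local cache
    model = SparseTextEmbedding(model_name=model_name, cache_dir=cache_dir)

    def encode(texts: List[str]) -> Tuple[List[List[int]], List[List[float]]]:
        # Each fastembed SparseEmbedding exposes .indices / .values arrays
        embeddings = list(model.embed(texts))
        return (
            [emb.indices.tolist() for emb in embeddings],
            [emb.values.tolist() for emb in embeddings],
        )

    return encode
```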

Kraeze23 commented 2 months ago

@logan-markewich, thank you. I will implement your suggestion and post back with the result for the benefit of others.

Kraeze23 commented 2 months ago

The solution works as intended. Specifically, I set the required parameters on QdrantVectorStore:

```python
from qdrant_client.http import models as rest

vector_store = QdrantVectorStore(
    client=hybrid_qdrant_client_.get_client(),
    collection_name=hybrid_qdrant_client_.collection_name,
    dense_config=rest.VectorParams(
        size=1024,  # vector size is defined by the dense model
        distance=rest.Distance.COSINE,
    ),
    sparse_config=rest.SparseVectorParams(),
    sparse_doc_fn=sparse_doc_fn(),
    sparse_query_fn=sparse_doc_fn(),
    enable_hybrid=True,
    index_doc_id=True,
)
```

With the sparse functions provided as:

```python
from llama_index.vector_stores.qdrant.utils import (
    fastembed_sparse_encoder,
    SparseEncoderCallable,
)


def sparse_doc_fn() -> SparseEncoderCallable:
    # Point fastembed at the pre-downloaded model in the local cache
    SPARSE_MODEL = "prithivida/Splade_PP_en_v1"
    CACHE_DIR = "/data/models/fastembed_cache/"
    return fastembed_sparse_encoder(model_name=SPARSE_MODEL, cache_dir=CACHE_DIR)
```
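
For completeness, a query then runs in hybrid mode. A minimal sketch, assuming documents is loaded elsewhere and a local embedding model is configured in Settings for the dense side:

```python
from llama_index.core import StorageContext, VectorStoreIndex

# Build the index on top of the hybrid-enabled Qdrant store
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Hybrid queries fuse dense and sparse results; sparse_top_k bounds the sparse leg
query_engine = index.as_query_engine(vector_store_query_mode="hybrid", sparse_top_k=10)
response = query_engine.query("What does the document say about X?")
```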

This enables hybrid Qdrant search through QdrantVectorStore, using the fastembed library with a local cache directory. @logan-markewich, thank you again for your speedy response.