Vybhav448 closed this issue 1 month ago.
Hey @Vybhav448, good to see you back! Ready to dive into some hybrid search mysteries with LlamaIndex? 🕵️‍♂️🦙
To enable hybrid search with llama-index-vector-stores-elasticsearch in LlamaIndex, set the query mode to VectorStoreQueryMode.HYBRID when performing a query. This mode combines text-based search with vector similarity search. Here's an example of how you can do this:
# VectorStoreQuery and VectorStoreQueryMode live in llama_index.core, not in the Elasticsearch package
from llama_index.core.vector_stores.types import VectorStoreQuery, VectorStoreQueryMode
from llama_index.vector_stores.elasticsearch import ElasticsearchStore
table_name = "your-index-name"  # placeholder: use your own index name
# Initialize the vector store
vector_store = ElasticsearchStore(
    index_name=table_name,
    es_url="http://localhost:9200",
)
# Create a VectorStoreQuery object with mode set to HYBRID
query = VectorStoreQuery(
    query_embedding=[1.0, 0.0, 0.0],  # Your query embedding (must match the index's vector dimensions)
    query_str="your search text here",  # Your query string for the full-text part of the search
    similarity_top_k=10,  # Number of top similar items to retrieve
    mode=VectorStoreQueryMode.HYBRID,  # Set the query mode to HYBRID
)
# Perform the query
result = vector_store.query(query)
# `result` will contain the top k most similar nodes based on the hybrid query
Inside ElasticsearchStore, query.mode is checked to see whether it is set to VectorStoreQueryMode.HYBRID. If it is, the Elasticsearch query is built to run both a text search (a match query on the configured text_field) and a vector similarity search (a knn query on the configured vector_field). The two result lists are then merged with Reciprocal Rank Fusion (RRF), requested via "rank": {"rrf": {}}.
This information is based on the code found in the llama-index-integrations/vector_stores/llama-index-vector-stores-elasticsearch/llama_index/vector_stores/elasticsearch/base.py file.
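To make the request shape and the fusion step concrete, here is a minimal, standard-library-only sketch. It is not the exact body the library sends: `build_hybrid_body` and `rrf` are hypothetical helpers, the field names "content" and "embedding" are assumed to be ElasticsearchStore's defaults (adjust if you pass text_field/vector_field), and the `num_candidates = top_k * 10` pool size is an assumption. The local `rrf` function only illustrates what Elasticsearch does server-side; k = 60 mirrors Elasticsearch's default rank_constant.

```python
import json

# Assumed default field names; change them if you configure text_field/vector_field.
TEXT_FIELD = "content"
VECTOR_FIELD = "embedding"


def build_hybrid_body(query_str, query_embedding, top_k=10,
                      text_field=TEXT_FIELD, vector_field=VECTOR_FIELD):
    """Hypothetical helper: approximate shape of a hybrid (match + knn + RRF) request."""
    return {
        # Text part: a match query on the text field.
        "query": {"match": {text_field: {"query": query_str}}},
        # Vector part: approximate kNN on the dense_vector field.
        "knn": {
            "field": vector_field,
            "query_vector": query_embedding,
            "k": top_k,
            "num_candidates": top_k * 10,  # assumption: candidate pool size
        },
        # Ask Elasticsearch to merge both result lists with Reciprocal Rank Fusion.
        "rank": {"rrf": {}},
    }


def rrf(result_lists, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranked in result_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


body = build_hybrid_body("your search text here", [1.0, 0.0, 0.0], top_k=3)
print(json.dumps(body, indent=2))

# Toy example: "b" ranks well in both lists, so it comes first after fusion.
text_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
print(rrf([text_hits, vector_hits]))  # ['b', 'a', 'd', 'c']
```

RRF only looks at ranks, not raw scores, which is why it can combine BM25 scores and vector similarities that live on different scales.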
Additionally, the ElasticsearchStore class in the llama_index.vector_stores.elasticsearch module does support VectorStoreQueryMode.HYBRID for queries, as shown by the test function test_add_to_es_and_hybrid_query in the llama-index-integrations/vector_stores/llama-index-vector-stores-elasticsearch/tests/test_vector_stores_elasticsearch.py file.
I hope this helps! If you have any other questions, feel free to ask.
To perform hybrid search, pass a hybrid retrieval_strategy to the ElasticsearchStore class, then create the index and query engine from the store:
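import os
from enum import Enum
from elasticsearch.helpers.vectorstore import (
    AsyncBM25Strategy,
    AsyncDenseVectorStrategy,
    AsyncSparseVectorStrategy,
)
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

class RetrieverStrategyEnum(str, Enum):
    DENSE = "dense"
    SPARSE = "sparse"
    BM25 = "bm25"
    HYBRID = "hybrid"

strategy_instances = {
    RetrieverStrategyEnum.DENSE: AsyncDenseVectorStrategy(),  # default, K-nearest-neighbors, cosine similarity
    RetrieverStrategyEnum.SPARSE: AsyncSparseVectorStrategy(),
    RetrieverStrategyEnum.BM25: AsyncBM25Strategy(),
    RetrieverStrategyEnum.HYBRID: AsyncDenseVectorStrategy(hybrid=True),  # dense kNN combined with a text match
}

index_name = "your-index-name"  # placeholder: use your own index name
vector_store = ElasticsearchStore(
    index_name=index_name,
    es_cloud_id=os.getenv("es_cloud_id"),
    es_user=os.getenv("es_user"),
    es_password=os.getenv("es_password"),
    retrieval_strategy=strategy_instances[RetrieverStrategyEnum.HYBRID],
)

# ElasticsearchStore has no `index` attribute; build the index from the store instead
index = VectorStoreIndex.from_vector_store(vector_store)
query_engine = index.as_query_engine()
result = query_engine.query("your question here")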
Question
Using llama-index-vector-stores-elasticsearch with an index, how do we enable hybrid search? We are running Elasticsearch locally, and it works; we just need to enable hybrid search, for which we could not find any documentation or information. Below is how we initialize the vector store.
from llama_index.vector_stores.elasticsearch import ElasticsearchStore
vector_store = ElasticsearchStore(
    index_name=table_name,
    es_url="http://localhost:9200",
)