To create embeddings for a document, store them in Elasticsearch, and share these embeddings with other users, you can use the ElasticsearchEmbedding
class from the LlamaIndex framework. Below is a step-by-step guide on how to achieve this, including using Docker to simplify the setup.
First, you need to set up an Elasticsearch instance using Docker. Create a docker-compose.yml
file with the following content:
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.9.3
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=true
      - ELASTIC_USERNAME=elastic
      - ELASTIC_PASSWORD=changeme
    ports:
      - 9200:9200
    networks:
      - esnet
networks:
  esnet:
    driver: bridge
Run the following command to start Elasticsearch:
docker-compose up -d
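Before moving on, it can help to confirm the node is actually reachable. A minimal sketch, assuming the bootstrap credentials from the compose file above and an elasticsearch-py client pinned to the server's major version (e.g. pip install "elasticsearch<8"):

from elasticsearch import Elasticsearch

# Connect with the credentials from docker-compose.yml
# (http_auth is the keyword used by the 7.x client)
es = Elasticsearch(hosts=["http://localhost:9200"], http_auth=("elastic", "changeme"))

# info() returns cluster metadata; an exception here means the node is not ready yet
print(es.info()["version"]["number"])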
Install the necessary Python packages:
pip install llama-index-vector-stores-elasticsearch
pip install llama-index-embeddings-elasticsearch
pip install llama-index
Use the ElasticsearchEmbedding
class to create embeddings and store them in Elasticsearch. Below is an example script:
import os

from llama_index.core import Document, Settings, StorageContext, VectorStoreIndex
from llama_index.embeddings.elasticsearch import ElasticsearchEmbedding
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

# Get credentials and create the embedding model
# (the URL needs an explicit scheme, e.g. http://)
host = os.environ.get("ES_HOST", "http://localhost:9200")
username = os.environ.get("ES_USERNAME", "elastic")
password = os.environ.get("ES_PASSWORD", "changeme")
index_name = os.environ.get("INDEX_NAME", "your-index-name")
model_id = os.environ.get("MODEL_ID", "your-model-id")

embeddings = ElasticsearchEmbedding.from_credentials(
    model_id=model_id, es_url=host, es_username=username, es_password=password
)

# Set global settings
Settings.embed_model = embeddings
Settings.chunk_size = 512

# Point LlamaIndex at the Elasticsearch vector store
vector_store = ElasticsearchStore(
    index_name=index_name, es_url=host, es_user=username, es_password=password
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Example document to generate embeddings for
documents = [Document(text="This is a sample document.")]

# Building the index embeds the document chunks and writes them to Elasticsearch
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
print("Embeddings stored successfully.")

# Query the stored embeddings
query_engine = index.as_query_engine()
response = query_engine.query("What is the sample document about?")
print(response)
To share the embeddings with other users, you can give them access to the Elasticsearch instance. They can then query the stored embeddings directly through the Elasticsearch API:
from elasticsearch import Elasticsearch

# Create the Elasticsearch connection (http_auth matches the 7.x client;
# pin the client to the server's major version, e.g. pip install "elasticsearch<8")
es_connection = Elasticsearch(
    hosts=["http://localhost:9200"],
    http_auth=("elastic", "changeme"),
)

# Query the stored embeddings
response = es_connection.search(index="your-index-name", body={"query": {"match_all": {}}})
for hit in response["hits"]["hits"]:
    print(hit["_source"])
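Since the vector store is the shared source of truth, other users can also attach LlamaIndex directly to the same index and reuse the stored embeddings without recomputing them. A minimal sketch, assuming the same index name and credentials as above (the embedding model is still needed to embed new queries):

import os

from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.elasticsearch import ElasticsearchEmbedding
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

host = os.environ.get("ES_HOST", "http://localhost:9200")

# The query text still has to be embedded, so the same model_id is required
Settings.embed_model = ElasticsearchEmbedding.from_credentials(
    model_id=os.environ.get("MODEL_ID", "your-model-id"),
    es_url=host,
    es_username="elastic",
    es_password="changeme",
)

# Attach to the existing index; the stored documents are not re-embedded
vector_store = ElasticsearchStore(
    index_name="your-index-name", es_url=host, es_user="elastic", es_password="changeme"
)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
print(index.as_query_engine().query("What does the shared index contain?"))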
This setup allows you to create, store, and share embeddings efficiently using Elasticsearch and Docker. Alternatively, you can use OpenSearch, which the LlamaIndex framework also supports: install the opensearch-py package and swap in the OpensearchVectorClient and OpensearchVectorStore classes from LlamaIndex [1][2][3][4]. A sketch of that variant follows below.
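For reference, the OpenSearch path is nearly identical. A minimal sketch, assuming the llama-index-vector-stores-opensearch package is installed, a local OpenSearch node on port 9200, a placeholder index name, and that Settings.embed_model was configured as in the earlier script:

from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.vector_stores.opensearch import (
    OpensearchVectorClient,
    OpensearchVectorStore,
)

# dim must match the output dimension of the configured embedding model
client = OpensearchVectorClient(
    endpoint="http://localhost:9200",  # assumed local OpenSearch endpoint
    index="your-index-name",
    dim=1536,
)
vector_store = OpensearchVectorStore(client)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Same LlamaIndex workflow as before; only the vector store changes
index = VectorStoreIndex.from_documents(
    [Document(text="This is a sample document.")],
    storage_context=storage_context,
)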
We have full docs for Elasticsearch: https://docs.llamaindex.ai/en/stable/examples/vector_stores/ElasticsearchIndexDemo/?h=elastic
Question Validation
Question
We want to create embeddings for a document, store them in Elasticsearch, and share those embeddings with other users to decrease time consumption. Let me know if there is any solution to share these embeddings, either with Docker or through an alternative approach.