Closed Subham0793 closed 1 week ago
Hey @Subham0793, great to see you diving into LlamaIndex again! Hope all's been well since our last chat.
The issue where the embedding field is being saved as a "number" datatype instead of "dense_vector" in Elasticsearch could be due to a few key reasons:
Index Creation and Mapping: Ensure that the index was created with an explicit mapping that declares the embedding field as "dense_vector" and specifies its dimensions (dims). If the index already exists without that mapping, or was created implicitly by the first write, Elasticsearch's dynamic mapping can infer a plain numeric type such as float for the array of numbers, which is what then shows up as "number". An existing field's type cannot be changed in place, so a wrongly mapped index has to be recreated or reindexed.
Elasticsearch Version Compatibility: Verify that the version of Elasticsearch you're using supports the "dense_vector" datatype. The "dense_vector" type and certain similarity options are not available in older versions of Elasticsearch.
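For the mapping point, here is a minimal sketch of what an explicit mapping could look like. The field name `embedding` and the 512 dimensions come from the question. The `content` text field, the `cosine` similarity, the index name, the URL, and the helper name `build_vector_mapping` are assumptions for illustration, not taken from the ElasticsearchStore source.

```python
from typing import Any, Dict


def build_vector_mapping(
    dims: int,
    vector_field: str = "embedding",
    text_field: str = "content",   # assumed name of the text field
    similarity: str = "cosine",    # assumed similarity metric
) -> Dict[str, Any]:
    """Return a mappings body that declares the vector field as dense_vector."""
    if dims <= 0:
        raise ValueError("dims must be a positive integer")
    return {
        "properties": {
            vector_field: {
                "type": "dense_vector",
                "dims": dims,
                "index": True,
                "similarity": similarity,
            },
            text_field: {"type": "text"},
        }
    }


mappings = build_vector_mapping(dims=512)

# With the official Python client (8.x keyword style), the index would be
# created BEFORE any document is written, for example:
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch("http://localhost:9200")   # placeholder URL
#   es.indices.create(index="my-index", mappings=mappings)
```

The `dims` value has to match the length of the vectors the embedding model actually returns, otherwise indexing will be rejected.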
To diagnose and resolve the issue, please check the following:
The code responsible for setting up the index and mappings is crucial here. Ensure that the index settings in your setup correctly specify the "dense_vector" type for the embedding field, along with the correct dimensions (dims). This setup is typically handled in the _create_index_if_not_exists method or similar in your ElasticsearchStore configuration.
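One way to run this check is to look at the mapping Elasticsearch actually stored for the index. The sketch below works on a hand-written sample shaped like a get-mapping response; the index name `my-index` and the helper name `vector_field_type` are hypothetical.

```python
from typing import Any, Dict, Optional


def vector_field_type(
    mapping_response: Dict[str, Any],
    index_name: str,
    field: str = "embedding",
) -> Optional[str]:
    """Return the mapped type of `field`, or None if it is not mapped."""
    properties = (
        mapping_response.get(index_name, {})
        .get("mappings", {})
        .get("properties", {})
    )
    return properties.get(field, {}).get("type")


# Hand-written sample, shaped like what a get-mapping call returns for an
# index whose embedding field was mapped dynamically.
sample = {
    "my-index": {
        "mappings": {
            "properties": {
                "embedding": {"type": "float"},
                "content": {"type": "text"},
            }
        }
    }
}

field_type = vector_field_type(sample, "my-index")
if field_type != "dense_vector":
    print(f"embedding is mapped as {field_type!r}, expected 'dense_vector'")

# Against a live cluster the dict would come from the client, for example
# (8.x client; a 7.x client returns a plain dict without `.body`):
#   response = es.indices.get_mapping(index="my-index").body
```

If the reported type is a numeric one such as `float`, the index was most likely created by dynamic mapping rather than by the store's index-creation code.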
If the issue persists after verifying these points, providing more specific details about your Elasticsearch setup and any error messages you're encountering would be helpful for further troubleshooting.
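The details worth attaching to a follow-up can be gathered in a few lines. This is only a sketch: `summarize_setup` and `parse_version` are hypothetical helpers, and the two input dicts are hand-written samples shaped like the responses of the client's `info()` and `indices.get_mapping()` calls.

```python
from typing import Any, Dict, Tuple


def parse_version(number: str) -> Tuple[int, ...]:
    """Turn '8.11.3' (or '8.0.0-SNAPSHOT') into (8, 11, 3) / (8, 0, 0)."""
    core = number.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def summarize_setup(
    info: Dict[str, Any],
    mapping_response: Dict[str, Any],
    index_name: str,
) -> str:
    """Build a short text report: server version plus each field's type."""
    version = info["version"]["number"]
    major = parse_version(version)[0]
    props = mapping_response[index_name]["mappings"].get("properties", {})
    lines = [
        f"Elasticsearch version: {version} (major {major})",
        f"Index: {index_name}",
    ]
    for name, spec in sorted(props.items()):
        # Object fields carry nested "properties" and no explicit "type".
        lines.append(f"  {name}: {spec.get('type', 'object')}")
    return "\n".join(lines)


# Hand-written samples; the values are illustrative, not real output.
sample_info = {"version": {"number": "8.11.3"}}
sample_mapping = {
    "my-index": {
        "mappings": {
            "properties": {
                "embedding": {"type": "float"},
                "metadata": {"properties": {"window": {"type": "text"}}},
            }
        }
    }
}

report = summarize_setup(sample_info, sample_mapping, "my-index")
print(report)
```

Pasting a report like this alongside any error message shows at a glance both the server version and how the embedding field ended up being mapped.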
No idea. Feel free to make a PR if the code can be improved https://github.com/run-llama/llama_index/blob/32e3c1f6f3acd28ec48b868568d3337b783e56d7/llama-index-integrations/vector_stores/llama-index-vector-stores-elasticsearch/llama_index/vector_stores/elasticsearch/base.py#L355
Question
The following is the code snippet I am using to index data into the Elasticsearch store:
```python
from llama_index.vector_stores.elasticsearch import ElasticsearchStore
from llama_index.core import (
    VectorStoreIndex,
    ServiceContext,
    StorageContext,
    Settings,
)
# Needed for the node parser used below (missing from the original snippet).
from llama_index.core.node_parser import SentenceWindowNodeParser

# The lines below are an excerpt from inside a class method.

# Split documents into sentence nodes with a 3-sentence context window.
self.node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

self.service_context = ServiceContext.from_defaults(
    llm=None,
    embed_model=self.embedding,  # Azure OpenAI embedding of 512 dimensions
)

# Vector store backed by the Elasticsearch index `index_id`.
self.document_store = ElasticsearchStore(
    es_url=self.elasticsearch_url,
    index_name=index_id,
)
self.storage_context = StorageContext.from_defaults(
    vector_store=self.document_store,
)

self.index = VectorStoreIndex.from_vector_store(
    vector_store=self.document_store,
    storage_context=self.storage_context,
    service_context=self.service_context,
)

# docs is of type List[Document]
sentence_nodes = self.node_parser.get_nodes_from_documents(docs)

# Embed the nodes and write them to Elasticsearch.
self.index = VectorStoreIndex(
    sentence_nodes,
    service_context=self.service_context,
    storage_context=self.storage_context,
)
```
When I index the data, the embedding field is stored with the number data type. Because of this, I am unable to perform the retrieve operation, which expects a dense vector field.
What could be the reason that the embedding field is saved as a number data type after indexing?