Embeddings are saved as "number" datatype in ElasticsearchStore instead of "dense_vector". What could be the reason?

Subham0793 commented 3 months ago

Question Validation

[X] I have searched both the documentation and discord for an answer.

Question

The following is the code snippet I am using to index data into the Elasticsearch store:

```python
from llama_index.vector_stores.elasticsearch import ElasticsearchStore
from llama_index.core import (
    VectorStoreIndex,
    ServiceContext,
    StorageContext,
    Settings,
)
from llama_index.core.node_parser import SentenceWindowNodeParser

# Split documents into sentence nodes that carry a surrounding window as metadata.
self.node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

self.service_context = ServiceContext.from_defaults(
    llm=None,
    embed_model=self.embedding,  # Azure OpenAI embedding of 512 dimensions
)
self.document_store = ElasticsearchStore(
    es_url=self.elasticsearch_url,
    index_name=index_id,
)
self.storage_context = StorageContext.from_defaults(
    vector_store=self.document_store,
)

self.index = VectorStoreIndex.from_vector_store(
    vector_store=self.document_store,
    storage_context=self.storage_context,
    service_context=self.service_context,
)

sentence_nodes = self.node_parser.get_nodes_from_documents(docs)  # docs is a List[Document]

self.index = VectorStoreIndex(
    sentence_nodes,
    service_context=self.service_context,
    storage_context=self.storage_context,
)
```

When I index the data, the embedding field is stored with the "number" datatype. Because of this, I am unable to perform the retrieve operation, as it expects a dense vector field.

What could cause the embedding field to be saved as a "number" datatype after indexing?

dosubot[bot] commented 3 months ago

Hey @Subham0793, great to see you diving into LlamaIndex again! Hope all's been well since our last chat.

The issue where the embedding field is being saved as a "number" datatype instead of "dense_vector" in Elasticsearch could be due to a few key reasons:

1. Index Creation and Mapping: Ensure that the index was created with the correct mappings for the embedding field to be recognized as "dense_vector". If the index already exists without the correct mappings, Elasticsearch might interpret the embedding array as individual numeric fields. The mapping for the embedding field should explicitly define it as a "dense_vector" and specify the dimensions.
2. Elasticsearch Version Compatibility: Verify that the version of Elasticsearch you're using supports the "dense_vector" datatype. The "dense_vector" type and certain similarity options are not available in older versions of Elasticsearch.

To diagnose and resolve the issue, please check the following:

- The exact version of Elasticsearch you're using, as compatibility with the "dense_vector" datatype is necessary.
- Any error messages or logs that appear during index creation or data insertion. These can provide clues on whether the index creation with the correct mappings was successful or if there were any errors.
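
As a rough way to run the first diagnostic step, the sketch below reads the mapped type of the embedding field out of a mapping response. The helper name `embedding_field_type`, the index name `my-index`, and the field name `embedding` are assumptions for illustration, and the two sample dictionaries are hand-written stand-ins for what the Elasticsearch Python client's `indices.get_mapping` call would return, not output from a real cluster. A result of `float` would suggest the field was created by dynamic mapping, which is likely what shows up as "number".

```python
from typing import Any, Mapping, Optional


def embedding_field_type(
    mapping_response: Mapping[str, Any],
    index_name: str,
    field: str = "embedding",  # assumed field name; adjust to your store's vector field
) -> Optional[str]:
    """Return the mapped type of `field`, or None if the index or field is absent."""
    index_entry = mapping_response.get(index_name, {})
    properties = index_entry.get("mappings", {}).get("properties", {})
    return properties.get(field, {}).get("type")


# Hand-written samples in the shape of a get-mapping response (not real cluster output).
dynamic_sample = {
    "my-index": {"mappings": {"properties": {"embedding": {"type": "float"}}}}
}
explicit_sample = {
    "my-index": {
        "mappings": {"properties": {"embedding": {"type": "dense_vector", "dims": 512}}}
    }
}

print(embedding_field_type(dynamic_sample, "my-index"))
print(embedding_field_type(explicit_sample, "my-index"))
```

Against a live cluster, the first argument would come from something like `client.indices.get_mapping(index=index_id)`; depending on the client version, you may need to pass the response's `.body` rather than the response object itself.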

The code responsible for setting up the index and mappings is crucial here. Ensure that the index settings in your setup correctly specify the "dense_vector" type for the embedding field, along with the correct dimensions (dims). This setup is typically handled in the _create_index_if_not_exists method or similar in your ElasticsearchStore configuration.
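
One possible workaround, sketched here under assumptions, is to create the index with an explicit mapping before any documents are written. The helper names `build_vector_mappings` and `create_vector_index` are hypothetical, the field name `embedding` is assumed to match the vector field used by the store, and `client` is assumed to be an `elasticsearch.Elasticsearch` instance from a recent client version that accepts the `mappings` keyword. This is not the code LlamaIndex itself runs.

```python
from typing import Any, Dict


def build_vector_mappings(
    dims: int,
    field: str = "embedding",  # assumed field name
    similarity: str = "cosine",
) -> Dict[str, Any]:
    """Build an index mapping that declares `field` as a dense_vector."""
    if dims <= 0:
        raise ValueError("dims must be a positive integer")
    return {
        "properties": {
            field: {
                "type": "dense_vector",
                "dims": dims,
                "index": True,
                "similarity": similarity,
            }
        }
    }


def create_vector_index(client: Any, index_name: str, dims: int) -> None:
    """Create the index with an explicit mapping unless it already exists."""
    if not client.indices.exists(index=index_name):
        client.indices.create(index=index_name, mappings=build_vector_mappings(dims))


mappings = build_vector_mappings(512)  # 512 matches the embedding size in the question
print(mappings["properties"]["embedding"]["type"])
```

Note that Elasticsearch does not allow changing the type of an already-mapped field, so an index whose embedding field is already mapped as a numeric type would need to be deleted or reindexed into a new index before this mapping can take effect.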

If the issue persists after verifying these points, providing more specific details about your Elasticsearch setup and any error messages you're encountering would be helpful for further troubleshooting.


logan-markewich commented 3 months ago

No idea. Feel free to make a PR if the code can be improved https://github.com/run-llama/llama_index/blob/32e3c1f6f3acd28ec48b868568d3337b783e56d7/llama-index-integrations/vector_stores/llama-index-vector-stores-elasticsearch/llama_index/vector_stores/elasticsearch/base.py#L355
