Open FestinaIsm opened 1 month ago
Hey @FestinaIsm! I'm here to help you with any bugs, questions, or contributions you have. Let's work together to tackle this issue!
I found a similar issue where the "embedding" field was missing in the index when creating a vector store with AzureAISearchVectorStore in version 0.11.10. The problem might be due to the embedding_field_key parameter not being correctly utilized or mapped in the actual implementation of AzureAISearchVectorStore.
In the internal implementation of AzureAISearchVectorStore, the embedding_field_key should be mapped correctly to the index schema. The relevant part of the code shows that the embedding field is configured as a SearchField with the appropriate type and properties:
fields = [
    SimpleField(name=self._field_mapping["id"], type="Edm.String", key=True),
    SearchableField(
        name=self._field_mapping["chunk"],
        type="Edm.String",
        analyzer_name=self._language_analyzer,
    ),
    SearchField(
        name=self._field_mapping["embedding"],
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=self._embedding_dimensionality,
        vector_search_profile_name=self._vector_profile_name,
    ),
    SimpleField(name=self._field_mapping["metadata"], type="Edm.String"),
    SimpleField(
        name=self._field_mapping["doc_id"], type="Edm.String", filterable=True
    ),
]
This indicates that the embedding field should be present in the index schema if the embedding_field_key is correctly mapped. If the "embedding" field is still missing, it might be due to an issue in how the index is being created or updated. Ensure that the index creation process is correctly executed and that the embedding_field_key is properly utilized when defining the index schema [1][2][3].
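As a rough illustration of the mapping described above, the constructor's *_field_key arguments can be thought of as filling a dictionary that the schema code then reads from. This is only a sketch: the helper name build_field_mapping is hypothetical and this is not the library's actual code. Only the dictionary keys and the argument names are taken from the snippets in this thread.

```python
from typing import Dict


def build_field_mapping(
    id_field_key: str,
    chunk_field_key: str,
    embedding_field_key: str,
    metadata_string_field_key: str,
    doc_id_field_key: str,
) -> Dict[str, str]:
    """Hypothetical helper: map logical field names to index field names."""
    return {
        "id": id_field_key,
        "chunk": chunk_field_key,
        "embedding": embedding_field_key,
        "metadata": metadata_string_field_key,
        "doc_id": doc_id_field_key,
    }


# Same values as in the bug report.
mapping = build_field_mapping(
    id_field_key="id",
    chunk_field_key="content",
    embedding_field_key="embedding",
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
)
print(mapping["embedding"])
```

With the values from the report, the logical "embedding" entry resolves to an index field that is also named "embedding", so the key mapping itself looks consistent.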
Additionally, there is documentation in the codebase that explains the expected behavior and usage of the embedding_field_key in the AzureAISearchVectorStore [3][4][5].
tl;dr: the retrievable flag for vector fields (i.e. "embedding" in LlamaIndex) now defaults to False in the latest Azure SDK.
After some digging, the issue seems to be introduced by changes in the latest Azure SDK schema:
So if one wants to force the embedding field to be retrievable, we probably have to override the _create_index method:
def _create_index(self, index_name: Optional[str]) -> None:
    ...
    fields = [
        SimpleField(name=self._field_mapping["id"], type="Edm.String", key=True),
        SearchableField(
            name=self._field_mapping["chunk"],
            type="Edm.String",
            analyzer_name=self._language_analyzer,
        ),
        SearchField(
            name=self._field_mapping["embedding"],
            hidden=False,  # Force the `SearchField` to be retrievable <---
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=self._embedding_dimensionality,
            vector_search_profile_name=self._vector_profile_name,
        ),
        SimpleField(name=self._field_mapping["metadata"], type="Edm.String"),
        SimpleField(
            name=self._field_mapping["doc_id"], type="Edm.String", filterable=True
        ),
    ]
    ...
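To check whether an existing index is affected, one option is to inspect its JSON definition and look for vector fields that are not marked retrievable. The sketch below assumes the definition follows the usual layout, with a "fields" list whose entries carry "name", "type" and "retrievable" keys. The function name and the example index are made up for illustration and use only the standard library.

```python
from typing import Any, Dict, List

# Type string assumed for float vector fields in the index definition.
VECTOR_TYPE = "Collection(Edm.Single)"


def non_retrievable_vector_fields(index_definition: Dict[str, Any]) -> List[str]:
    """Return names of vector fields not explicitly marked retrievable."""
    return [
        field["name"]
        for field in index_definition.get("fields", [])
        if field.get("type") == VECTOR_TYPE
        and field.get("retrievable") is not True
    ]


# Trimmed, made-up index definition for illustration only.
example_index = {
    "name": "my-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "retrievable": True},
        {"name": "content", "type": "Edm.String", "retrievable": True},
        {
            "name": "embedding",
            "type": "Collection(Edm.Single)",
            "retrievable": False,
            "dimensions": 1536,
        },
    ],
}

hidden = non_retrievable_vector_fields(example_index)
print(hidden)
```

If the embedding field shows up in this list, the vectors may well be stored and searchable but simply not returned in query results, which would match the symptom described in this issue.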
Bug Description
I'm creating the vector store:
vector_store = AzureAISearchVectorStore(
    search_or_index_client=index_client,
    filterable_metadata_field_keys=metadata_fields,
    index_name=index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="content",
    embedding_field_key="embedding",
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    embedding_dimensionality=1536,
    language_analyzer="en.lucene",
    vector_algorithm_type="exhaustiveKnn",
)
and initializing the index_client, together with the storage context and settings:
storage_context = StorageContext.from_defaults(vector_store=vector_store)
Settings.llm = llm
Settings.embed_model = embed_model
They are all initialized correctly and the index is created. However, the "embedding" field is not present in the index, meaning that the vector embedding is missing altogether.
Any ideas? Thank you!
Version
0.11.10
Steps to Reproduce
There is no error; the vector embeddings are simply not being created.
Relevant Logs/Tracebacks
No response