langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Semantic configuration is not created for Azure AI Search index using Langchain community. #20549

Open nachiketlanjewar-acc opened 2 months ago

nachiketlanjewar-acc commented 2 months ago

Example Code

from langchain.vectorstores.azuresearch import AzureSearch
from azure.search.documents.indexes.models import (
    FreshnessScoringFunction,
    FreshnessScoringParameters,
    ScoringProfile,
    SearchableField,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    TextWeights,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField
)

fields = [
    SimpleField(
        name="id",
        type=SearchFieldDataType.String,
        key=True,
        filterable=True,
    ),
    SearchableField(
        name="header1",
        type=SearchFieldDataType.String,
        searchable=True,
    ),
    SearchableField(
        name="header2",
        type=SearchFieldDataType.String,
        searchable=True,
    ),
    SearchableField(
        name="header3",
        type=SearchFieldDataType.String,
        searchable=True,
    ),
    SearchableField(
        name="content",
        type=SearchFieldDataType.String,
        searchable=True,
    ),
    SearchField(
        name="content_vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=len(aoai_embeddings.embed_query("Text")),
        vector_search_profile_name="myExhaustiveKnnProfile",
    ),
    SearchableField(
        name="metadata",
        type=SearchFieldDataType.String,
        searchable=True,
    ),
]

index_name: str = vector_store_index

# Adding a custom scoring profile with a freshness function
sc_name = "csrd_scoring_profile"
sc = ScoringProfile(
    name=sc_name,
    text_weights=TextWeights(
        weights={
            "header1": 10,
            "header2": 9,
            "content": 8,
            "content_vector": 8,
        }
    ),
    function_aggregation="sum",
)

semantic_configuration_name = 'my_semantic_configuration'
semantic_config = SemanticConfiguration(
    name=semantic_configuration_name,
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name='header2'),
        content_fields=[SemanticField(field_name='content')],
        keywords_fields=None,
    ),
)

vector_store: AzureSearch = AzureSearch(
    search_type='semantic_hybrid',
    scoring_profiles=[sc],
    default_scoring_profile=sc_name,
    semantic_configurations=[semantic_config],
    semantic_configuration_name=semantic_configuration_name,
    azure_search_endpoint=vector_store_address,
    azure_search_key=vector_store_password,
    index_name=index_name,
    embedding_function=aoai_embeddings.embed_query,
    fields=fields,
)

Error Message and Stack Trace (if applicable)

There is no error, but the semantic configuration is not created for the index.
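
One way to confirm the symptom is to fetch the index definition and list its semantic configurations. The following is a minimal sketch, assuming a version of azure-search-documents in which SearchIndex exposes a semantic_search attribute (as the library snippet in this issue does). The helper name semantic_configuration_names is hypothetical, and stand-in objects are used so the snippet runs without a search service.

```python
from types import SimpleNamespace
from typing import Any, List


def semantic_configuration_names(index: Any) -> List[str]:
    """Return the names of the semantic configurations on an index definition.

    Hypothetical helper: it only reads `index.semantic_search.configurations`.
    """
    semantic_search = getattr(index, "semantic_search", None)
    if semantic_search is None:
        # The index was created without any semantic settings.
        return []
    return [config.name for config in (semantic_search.configurations or [])]


# Stand-in objects for illustration only; they are not Azure SDK types.
index_without = SimpleNamespace(semantic_search=None)
index_with = SimpleNamespace(
    semantic_search=SimpleNamespace(
        configurations=[SimpleNamespace(name="my_semantic_configuration")]
    )
)

print(semantic_configuration_names(index_without))
print(semantic_configuration_names(index_with))

# Against a real service (not executed here), the index would come from:
#   from azure.core.credentials import AzureKeyCredential
#   from azure.search.documents.indexes import SearchIndexClient
#   client = SearchIndexClient(vector_store_address,
#                              AzureKeyCredential(vector_store_password))
#   index = client.get_index(index_name)
```

An empty list for the real index would match the behaviour reported here.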

Description

The semantic configuration is not created for the Azure AI Search index when using langchain_community if both a semantic configuration name and a semantic configuration are provided.

When I checked azuresearch.py, I found the snippet below, which creates the semantic configuration.

        # Create the semantic settings with the configuration
        semantic_search = None
        if semantic_configurations is None and semantic_configuration_name is not None:
            semantic_configuration = SemanticConfiguration(
                name=semantic_configuration_name,
                prioritized_fields=SemanticPrioritizedFields(
                    content_fields=[SemanticField(field_name=FIELDS_CONTENT)],
                ),
            )
            semantic_search = SemanticSearch(configurations=[semantic_configuration])

        # Create the search index with the semantic settings and vector search
        index = SearchIndex(
            name=index_name,
            fields=fields,
            vector_search=vector_search,
            semantic_search=semantic_search,
            scoring_profiles=scoring_profiles,
            default_scoring_profile=default_scoring_profile,
            cors_options=cors_options,
        )
        index_client.create_index(index)

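As you can see, it creates a semantic configuration only when semantic_configurations is None and semantic_configuration_name is not None. There is no branch for the case where both the configurations and the configuration name are provided, so semantic_search stays None and the index is created without a semantic configuration.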
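
The gap can be reproduced without Azure by isolating the condition. This is a simplified sketch, not the library code: plain strings and a dict stand in for the SemanticConfiguration and SemanticSearch objects, and the function name old_semantic_search is hypothetical.

```python
from typing import List, Optional


def old_semantic_search(
    semantic_configurations: Optional[List[str]],
    semantic_configuration_name: Optional[str],
) -> Optional[dict]:
    """Mimic the branching in the snippet quoted above.

    Strings stand in for SemanticConfiguration objects and a dict stands in
    for SemanticSearch (a simplification for illustration).
    """
    semantic_search = None
    if semantic_configurations is None and semantic_configuration_name is not None:
        # Only reached when no explicit configurations were passed.
        semantic_search = {"configurations": [semantic_configuration_name]}
    return semantic_search


# Name only: a default configuration is built from the name.
name_only = old_semantic_search(None, "my_semantic_configuration")

# Both arguments, as in the example code: the condition is False, so
# semantic_search stays None and nothing is attached to the index.
both = old_semantic_search(
    ["my_semantic_configuration"], "my_semantic_configuration"
)

print(name_only)
print(both)
```

The second call mirrors the AzureSearch(...) call in the example code, which passes both semantic_configurations and semantic_configuration_name.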

System Info

System Information

OS: Windows
OS Version: 10.0.20348
Python Version: 3.11.6 (tags/v3.11.6:8b6ee5b, Oct 2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]

Package Information

langchain_core: 0.1.27
langchain: 0.1.8
langchain_community: 0.0.24
langsmith: 0.1.10
langchain_openai: 0.0.8
langchainhub: 0.1.14

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

spike-spiegel-21 commented 2 months ago
        if semantic_configurations:

            if not isinstance(semantic_configurations, list):
                semantic_configurations = [semantic_configurations]

            semantic_search = SemanticSearch(
                configurations=semantic_configurations,
                default_configuration_name=semantic_configuration_name,
            )

        elif semantic_configuration_name:

            # use default semantic configuration
            semantic_configuration = SemanticConfiguration(
                name=semantic_configuration_name,
                prioritized_fields=SemanticPrioritizedFields(
                    content_fields=[SemanticField(field_name=FIELDS_CONTENT)],
                ),
            )
            semantic_search = SemanticSearch(configurations=[semantic_configuration])

        else:
            # Don't use semantic search
            semantic_search = None

This is the updated code on the latest branch. Please update your langchain_community package.
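
After upgrading (for example with pip install -U langchain-community), the installed version can be checked from Python. The thread does not say which release first contains this change, so the sketch below only reports the version and does not compare it against a threshold. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# The issue was reported against langchain_community 0.0.24.
print("langchain-community:", installed_version("langchain-community"))
```

An index that was already created without a semantic configuration will most likely need to be deleted and recreated, or updated directly, since the quoted code only sets semantic_search on the index definition it passes to create_index.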