microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License

.Net: AzureCognitiveSearchMemoryRecord EmbeddingField is not configurable #3014

Closed cgjedrem closed 8 months ago

cgjedrem commented 1 year ago

Describe the bug

When connecting to Azure Cognitive Search and searching a collection, the name of the vector embedding field cannot be configured. In my Azure Cognitive Search index the embedding field is named "contentVector", while Semantic Kernel hard-codes the name as public const string EmbeddingField = "Embedding"; in the namespace Microsoft.SemanticKernel.Connectors.Memory.AzureCognitiveSearch. The constant is used in AzureCognitiveSearchMemoryStore, line 167, in GetNearestMatches:

SearchQueryVector vectorQuery = new()
{
    KNearestNeighborsCount = limit,
    Fields = { AzureCognitiveSearchMemoryRecord.EmbeddingField },
    Value = MemoryMarshal.TryGetArray(embedding, out var array) && array.Count == embedding.Length ? array.Array! : embedding.ToArray(),
};

My code

    builder.WithMemoryStorage(new AzureCognitiveSearchMemoryStore("secret", "secret"));
    _kernel = builder.Build();
    ....
    var memories = _kernel.Memory.SearchAsync(collections[1], question, 5);
    await foreach (var mem in memories)
    {
        answers.Add(mem.Metadata.Text);
    }

which returns an error in every case:

Azure.RequestFailedException: Unknown field 'Embedding' in vector field list.
Status: 400 (Bad Request)
ErrorCode: InvalidRequestParameter

Content:
{"error":{"code":"InvalidRequestParameter","message":"Unknown field 'Embedding' in vector field list.","details":[{"code":"UnknownField","message":"Unknown field 'Embedding' in vector field list."}]}}

Expected behavior

I expect to be able to set the vector field name in the configuration of the memory store so that the search works as expected.

matthewbolanos commented 1 year ago

@dmytrostruk, @awharrison-28 started working on a PR for this on the Python side. I'll share with you on the side so you can reference it.

Jeany0120 commented 11 months ago

Can you share how you fixed this problem? I ran into the same problem.

Agazoth commented 10 months ago

Hi @matthewbolanos

I'm trying to use Azure OpenAI Text Embedding Generations with Azure AI Search Memory Store as described in the example here: https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/KernelSyntaxExamples/Example14_SemanticMemory.cs

I'm using 1.0.1 for SemanticKernel and Microsoft.SemanticKernel.Connectors.AzureAISearch, 1.0.1-alpha.

Since I use Azure AI Text Embedding, I changed line 35 from .WithOpenAITextEmbeddingGeneration("text-embedding-ada-002", TestConfiguration.OpenAI.ApiKey) to .WithAzureOpenAITextEmbeddingGeneration(myDeployment, myEndpoint, myApiKey, myModelId)

The code runs fine until I loop over the results, which throws this exception: Azure.RequestFailedException: Unknown field 'Embedding' in vector field list.

I tried digging into the code by examining connectors.UnitTests.Memory and found that no unit tests exist for AzureAI.

I really want to use Azure AI Text Embedding and I would also like to contribute to the codebase, but how do I get started with debugging the functionality? I am not sure how to add a test for AzureAISearch to get me going.

dyardy commented 10 months ago

I have the same issue, with this exception:

Azure.RequestFailedException: Unknown field 'Embedding' in vector field list.
Status: 400 (Bad Request)
ErrorCode: InvalidRequestParameter

Content:
{"error":{"code":"InvalidRequestParameter","message":"Unknown field 'Embedding' in vector field list.","details":[{"code":"UnknownField","message":"Unknown field 'Embedding' in vector field list."}]}}

Headers:
Cache-Control: no-cache,no-store
Pragma: no-cache

    private async Task SearchMemoryAsync(ISemanticTextMemory memory, string query)
    {
        Console.WriteLine("\nQuery: " + query + "\n");

        var memoryResults = memory.SearchAsync("resume-index-ai", query, limit: 2, minRelevanceScore: 0.5);

        int i = 0;
        await foreach (MemoryQueryResult memoryResult in memoryResults)
        {
            //Console.WriteLine($"Result {++i}:");
            //Console.WriteLine("  URL:     : " + memoryResult.Metadata.Id);
            //Console.WriteLine("  Title    : " + memoryResult.Metadata.Description);
            //Console.WriteLine("  Relevance: " + memoryResult.Relevance);
            Console.WriteLine();
        }

        Console.WriteLine("----------------------");
    }

Within my index I have a field named contentVector (type SingleCollection) instead of 'Embedding'.

How do I specify this field?

dyardy commented 9 months ago

Bump

nickamckenna commented 9 months ago

I'm also having this problem.

dmytrostruk commented 9 months ago

Thanks for reporting this issue, I will work on it immediately and will let you know as soon as it's fixed.

dmytrostruk commented 8 months ago

Hi All! The reason it fails with the error Unknown field 'Embedding' in vector field list. is that the Azure AI Search connector is implemented using a predefined schema. It works when you use this connector to ingest the data first (it will create an index with the SK predefined schema) and then read the data using the same schema.

However, it does not cover the case where the index was created some other way than through SK (e.g. from the Azure portal), because the schema may be different. After further investigation and team reviews, it turned out to be a complex problem that needs to be fixed not only for Azure AI Search but for other connectors as well, and it should be fixed at the abstraction level.

We are going to fix this problem as part of a major refactoring of the memory connectors. Meanwhile, we prepared an example of how you can use Azure AI Search with SK today, by importing Azure AI Search functionality as a Plugin: https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/KernelSyntaxExamples/Example84_AzureAISearchPlugin.cs
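To illustrate what querying an index with a custom schema involves, here is a minimal sketch that builds a vector search request body with a configurable vector field name. It assumes the request shape of the Azure AI Search REST API version 2023-11-01 (a vectorQueries array whose entries carry kind, vector, fields and k). BuildVectorSearchBody and its parameters are hypothetical names used for illustration only; they are not part of Semantic Kernel or of the linked example.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not an SK API): builds the JSON body for
// POST {endpoint}/indexes/{index}/docs/search?api-version=2023-11-01.
// The vector field name is a parameter instead of the hard-coded "Embedding".
static string BuildVectorSearchBody(float[] embedding, int k, string vectorField)
{
    if (string.IsNullOrWhiteSpace(vectorField))
    {
        throw new ArgumentException("Vector field name is required.", nameof(vectorField));
    }

    var body = new
    {
        vectorQueries = new[]
        {
            new { kind = "vector", vector = embedding, fields = vectorField, k }
        }
    };

    return JsonSerializer.Serialize(body);
}

// Example: target an index created in the Azure portal whose vector field is "contentVector".
string json = BuildVectorSearchBody(new[] { 0.1f, 0.2f, 0.3f }, k: 5, vectorField: "contentVector");
Console.WriteLine(json);
```

The resulting string would be sent with an api-key header to the search endpoint; the sketch stops short of the HTTP call so that it runs without an Azure resource.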

nssidhu commented 8 months ago

I am also running into the same problem; I hope we will get the fix soon.

kdcllc commented 7 months ago

This is a viable alternative for getting past the issue. Thanks @dmytrostruk

nunomsr commented 6 months ago

Hi all! Can you please elaborate on how to use this plugin as a solution? Should I import this plugin into the app, or must I adapt the example you showed to my code? Complete noob here, so any hint would be much appreciated.

dmytrostruk commented 6 months ago

Should I import this plugin to the app or must I adapt the example you showed to the code?

@nunomsr If you configured your index in Azure AI Search and you have a similar problem with the predefined schema and the Embedding field, it's better to use the code provided in the example above, since it allows you to use a custom schema and bypass the current limitations. Let me know if any further assistance is needed. Thanks!

Dexter-Codes commented 6 months ago

@dmytrostruk the example link that you shared above (https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/KernelSyntaxExamples/Example84_AzureAISearchPlugin.cs) is not functional anymore. Can you please provide a reference or an example for this issue? Thanks

dmytrostruk commented 6 months ago

Here is a new link: https://github.com/microsoft/semantic-kernel/blob/c545c7d774176d11964c81e173776232a2ae2f20/dotnet/samples/Concepts/Search/MyAzureAISearchPlugin.cs

snympi commented 4 months ago

@dmytrostruk any update on the feature enhancement needed to allow specifying a matching schema to what Azure AI Search natively uses when ingesting and embedding data?

I just ran into this issue and had to debug to come to the same conclusion as previous posters: SK creates an Embedding field, while AI Search creates contentVector. Based on this I assume that your preferred use of SK is to do the file loading, chunking and embedding from SK itself. Is this method comparable in quality to using AI Search chunking / embedding?

dmytrostruk commented 4 months ago

any update on the feature enhancement needed to allow specifying a matching schema to what Azure AI Search natively uses when ingesting and embedding data?

@snympi We are working on a new design for vector abstractions that will allow the use of any schema, including the one that Azure AI Search natively uses when ingesting and embedding data. You can find the new Azure AI Search implementation in our feature branch.

Based on this I assume that your preferred use of SK is to do the file loading, chunking and embedding from SK itself.

With the new design it will be possible to do file loading/chunking/embedding from SK, or to use an already existing index from Azure AI Search to query data only.

Is this method comparable in quality to using AI Search chunking / embedding?

Integrated chunking in Azure AI Search allows you to chunk your documents by specific rules (e.g. pages or sentences), and the same applies to the SK version, TextChunker. I'm not sure there are huge differences in terms of quality. As for embeddings, you can choose the AI model you would like to use both in Azure AI Search and from code using Semantic Kernel, and both approaches should produce the same outcome.

I think that doing chunking and embedding from code should give you more flexibility and control. In code you can always provide custom chunking logic based on the nature of your documents, or use a local/custom AI model for embedding generation if needed.
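As a rough illustration of what custom chunking logic in code can look like, the sketch below groups whole sentences into chunks under a character budget. ChunkBySentence and maxChars are hypothetical names; this is not SK's TextChunker, and the sentence split is deliberately naive, so treat it as a starting point under those assumptions rather than a recommended implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical custom chunker (not SK's TextChunker): groups whole sentences
// into chunks of at most maxChars characters. A single sentence longer than
// maxChars is emitted as its own oversized chunk rather than being cut.
static List<string> ChunkBySentence(string text, int maxChars)
{
    var chunks = new List<string>();
    var current = new StringBuilder();

    // Naive sentence split on '.', '!' and '?'; the terminator is kept.
    int start = 0;
    for (int i = 0; i < text.Length; i++)
    {
        bool isEnd = text[i] is '.' or '!' or '?';
        bool isLast = i == text.Length - 1;
        if (!isEnd && !isLast) continue;

        string sentence = text.Substring(start, i - start + 1).Trim();
        start = i + 1;
        if (sentence.Length == 0) continue;

        // Flush the current chunk if adding this sentence would exceed the budget.
        if (current.Length > 0 && current.Length + 1 + sentence.Length > maxChars)
        {
            chunks.Add(current.ToString());
            current.Clear();
        }

        if (current.Length > 0) current.Append(' ');
        current.Append(sentence);
    }

    if (current.Length > 0) chunks.Add(current.ToString());
    return chunks;
}

List<string> parts = ChunkBySentence("First sentence. Second one! A third sentence follows?", maxChars: 30);
foreach (string part in parts)
{
    Console.WriteLine(part);
}
```

Each chunk would then be passed to whichever embedding model you choose; rules that reflect the structure of your documents (headings, pages, tables) can replace the sentence split.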