Weaviate does not work - GitHub Issues

heinsenberg82 commented 1 month ago

Describe the bug

I opened a similar issue in the Semantic Kernel repository (it was one of the reasons I came to this repository): https://github.com/microsoft/semantic-kernel/issues/8934

I can't use Weaviate Vector Store with Google Vertex AI (and, I suspect, other integrations with Weaviate may not be working either).

This is my code:

        var provider = new VertexAIProvider(new VertexAIConfiguration
        {
            GoogleCredential = GoogleCredential.FromFile("D:\\code\\my-google-cloud-project.json"),
            Location = "us-central1",
        });
        var embeddingModel = new VertexAIEmbeddingModel(provider, id: "text-multilingual-embedding-002");
        var llm = new VertexAIChatModel(provider, id: "gemini-1.5-pro-001");

        var weviateApiKey = "weaviate-api-key";
        var collection = "Test_Collection";
        WeaviateMemoryStore memoryStore = new("https://my-weaviate-endpoint.c0.us-east1.gcp.weaviate.cloud", weviateApiKey);
        var vectorDatabase = new WeaviateVectorDatabase(memoryStore);

        // Exception is thrown here
        var vectorCollection = await vectorDatabase.AddDocumentsFromAsync<PdfPigPdfLoader>(
            embeddingModel, // Used to convert text to embeddings
            dimensions: 384, // Should be 384 for all-minilm
            dataSource: DataSource.FromUrl("https://canonburyprimaryschool.co.uk/wp-content/uploads/2016/01/Joanne-K.-Rowling-Harry-Potter-Book-1-Harry-Potter-and-the-Philosophers-Stone-EnglishOnlineClub.com_.pdf"),
            collectionName: "harrypotter", // Can be omitted, use if you want to have multiple collections
            textSplitter: null,
            behavior: AddDocumentsToDatabaseBehavior.JustReturnCollectionIfCollectionIsAlreadyExists);

        const string question = "What is Harry's Address?";
        var similarDocuments = await vectorCollection.GetSimilarDocuments(embeddingModel, question, amount: 5);
        // Use similar documents and LLM to answer the question
        var answers = llm.GenerateAsync(
            $"""
             Use the following pieces of context to answer the question at the end.
             If the answer is not in context then just say that you don't know, don't try to make up an answer.
             Keep the answer as short as possible.

             {similarDocuments.AsString()}

             Question: {question}
             Helpful Answer:
             """);

        await foreach (var answer in answers)
        {
            Console.WriteLine($"LLM answer: {answer}");
        }

I keep getting the same error:

Microsoft.SemanticKernel.HttpOperationException: Response status code does not indicate success: 401 (Unauthorized).
 ---> System.Net.Http.HttpRequestException: Response status code does not indicate success: 401 (Unauthorized).
   at System.Net.Http.HttpResponseMessage.EnsureSuccessStatusCode()
   at Microsoft.SemanticKernel.Http.HttpClientExtensions.SendWithSuccessCheckAsync(HttpClient client, HttpRequestMessage request, HttpCompletionOption completionOption, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Microsoft.SemanticKernel.Http.HttpClientExtensions.SendWithSuccessCheckAsync(HttpClient client, HttpRequestMessage request, HttpCompletionOption completionOption, CancellationToken cancellationToken)
   at Microsoft.SemanticKernel.Http.HttpClientExtensions.SendWithSuccessCheckAsync(HttpClient client, HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.SemanticKernel.Connectors.Weaviate.WeaviateMemoryStore.ExecuteHttpRequestAsync(HttpRequestMessage request, CancellationToken cancel)
   at Microsoft.SemanticKernel.Connectors.Weaviate.WeaviateMemoryStore.DoesCollectionExistAsync(String collectionName, CancellationToken cancellationToken)
   at LangChain.Databases.SemanticKernel.SemanticKernelMemoryDatabase.IsCollectionExistsAsync(String collectionName, CancellationToken cancellationToken) in /_/src/SemanticKernel/src/SemanticKernelMemoryDatabase.cs:line 34
   at LangChain.Extensions.VectorDatabaseExtensions.AddDocumentsFromAsync[TLoader](IVectorDatabase vectorDatabase, IEmbeddingModel embeddingModel, Int32 dimensions, DataSource dataSource, String collectionName, ITextSplitter textSplitter, DocumentLoaderSettings loaderSettings, EmbeddingSettings embeddingSettings, AddDocumentsToDatabaseBehavior behavior, CancellationToken cancellationToken) in /_/src/Core/src/Extensions/VectorDatabaseExtensions.cs:line 42
   at Api.LangchainTest.Execute() in D:\code\Meu-Aluguel\Api\LangchainTest.cs:line 35
   at Program.<Main>$(String[] args) in D:\code\Meu-Aluguel\Api\Program.cs:line 13
   at Program.<Main>(String[] args)

I suspect that the Semantic Kernel library (which provides the WeaviateMemoryStore class that this library depends on) is not adding the necessary headers to the requests issued by the vector database classes. For instance, the Weaviate documentation (https://weaviate.io/developers/weaviate/model-providers/google/embeddings) says that, for the Vertex AI integration to work, the Vertex AI API key must be passed in the X-Google-Vertex-Api-Key request header. For OpenAI, it would be the X-OpenAI-Api-Key header.
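To make the suspicion concrete, here is a minimal sketch of the kind of request I would expect to succeed, built with plain HttpClient types only. The helper name BuildSchemaRequest is hypothetical (it is not part of LangChain or Semantic Kernel), the keys are placeholders, and I am assuming that Weaviate Cloud accepts the API key as a Bearer token and that GET /v1/schema is a reasonable endpoint to probe. The Vertex header name is the one quoted above from the Weaviate documentation.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper, not part of LangChain or Semantic Kernel.
// Builds a GET /v1/schema request that carries both credentials.
static HttpRequestMessage BuildSchemaRequest(string endpoint, string weaviateApiKey, string vertexApiKey)
{
    var uri = new Uri(new Uri(endpoint), "/v1/schema");
    var message = new HttpRequestMessage(HttpMethod.Get, uri);

    // Assumption: the Weaviate instance expects "Authorization: Bearer <key>".
    message.Headers.Authorization = new AuthenticationHeaderValue("Bearer", weaviateApiKey);

    // Header name taken from the Weaviate documentation quoted in this issue.
    message.Headers.Add("X-Google-Vertex-Api-Key", vertexApiKey);
    return message;
}

var request = BuildSchemaRequest(
    "https://my-weaviate-endpoint.c0.us-east1.gcp.weaviate.cloud",
    "weaviate-api-key",   // placeholder
    "vertex-api-key");    // placeholder

// Print what would be sent, without touching the network or echoing secrets.
Console.WriteLine($"{request.Method} {request.RequestUri}");
foreach (var header in request.Headers)
{
    Console.WriteLine($"{header.Key}: <redacted>");
}
```

Sending this request with HttpClient.SendAsync and comparing the status code against the 401 above would at least show whether the credentials themselves are valid, independently of the Semantic Kernel connector.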

Alternatively, would there be any way to use Weaviate with this Langchain library without going through the Semantic Kernel?

Steps to reproduce the bug

Execute my code

Expected behavior

No response

Screenshots

No response

NuGet package version

No response

Additional context

No response

danijerez commented 1 month ago

The WeaviateVectorDatabase in LangChain .NET uses the Semantic Kernel implementation, so you will run into the same problems. The library abstraction is complex and still at an early stage; these problems will surely be solved eventually. There are many alternative vector databases, so my recommendation is to try another one.

https://github.com/tryAGI/LangChain.Databases/blob/main/src/SemanticKernel/src/SemanticKernelMemoryDatabase.cs

https://github.com/tryAGI/LangChain.Databases/blob/main/src/Weaviate/src/WeaviateVectorDatabase.cs

HavenDV commented 1 month ago

We can try updating all dependencies; it might work if Microsoft has already fixed this problem. There are some problems with Dependabot here: I think it isn't updating them for some reason.

HavenDV commented 1 month ago

I updated all SemanticKernel dependencies. Please try the latest -dev version of LangChain.Databases.Weaviate.

heinsenberg82 commented 1 month ago

> I updated all SemanticKernel dependencies. Please try the latest -dev version of LangChain.Databases.Weaviate.

Thanks for the feedback. I just upgraded all dependencies; LangChain.Databases.Weaviate is now on 0.15.4-dev.5. Unfortunately, the error persists.

heinsenberg82 commented 1 month ago

I imagine the problem is related to this open issue - https://github.com/microsoft/semantic-kernel/issues/6732

The issue was opened almost 4 months ago, and there is no sign of activity. It's disappointing, and gives the impression that Microsoft isn't paying the necessary attention to the Semantic Kernel library.

Digging through the Microsoft documentation, I also came across this article - https://learn.microsoft.com/en-us/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/weaviate-connector?pivots=programming-language-csharp. The article doesn't even mention the WeaviateMemoryStore class (which has been abandoned, maybe?).

The article also does not show how to make a semantic search using Weaviate (or any vector database). There is another single article that addresses the subject (https://learn.microsoft.com/en-us/semantic-kernel/concepts/plugins/using-data-retrieval-functions-for-rag), but it contains only a single example with Azure Search. There simply is no documentation or example of a RAG search, with any other Vector Store whatsoever.

With so many Vector Stores currently unusable, I would venture to say that the library is currently useless for RAG purposes.

HavenDV commented 1 month ago

While reusing SemanticKernel and their working stuff looks like a very good idea, I generally go the other way: automatically creating and maintaining SDKs for many popular AI tools and using them directly. The initial implementation of an SDK is trivial, but its long-term maintenance is not, since it needs to be monitored regularly.

I found the OpenAPI specification for Weaviate (https://weaviate.io/developers/weaviate/api/rest), so it is fully suitable for automation via AutoSDK (https://github.com/tryAGI/AutoSDK). I think I will publish it in the next few days; I have added it to the list. At the moment, the initial creation of a new generated SDK takes about an hour, mainly for initial testing. After that, it will be possible to implement it as a provider.

Of course, even so, this is the hard way. But this is the way.

tryAGI / LangChain.Databases

Weaviate does not work #53