microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.
https://microsoft.github.io/kernel-memory
MIT License

KernelMemory: Include tags in answer's "RelevantSources" #247

Closed martinoss closed 6 months ago

martinoss commented 6 months ago

Context / Scenario

I load text and data from different sources into memory: PDFs, text files, and parsed website data. For every web page, I already have a plain-text representation and know its URL, so I don't need to fetch the page; the data comes from an export of Adobe Experience Manager.

In a chat application, I would like to render buttons that point to the relevant web sources (URLs).

The problem

When importing data, I add a tag containing the source URL of each web page. Unfortunately, RelevantSources does not return the tag information.

As a workaround, I encode the URL as base64, append ".txt", and use the result as the file path, which is quite a hack.
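
For illustration, the workaround described above might look roughly like the following Python sketch. The function names are hypothetical, and the URL-safe base64 variant is an assumption (the post only says "base64"); it is used here because standard base64 output can contain "/", which would be awkward in a file path.

```python
import base64


def url_to_file_name(url: str) -> str:
    """Encode a URL into a file name: URL-safe base64 plus a ".txt" suffix."""
    encoded = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")
    return encoded + ".txt"


def file_name_to_url(file_name: str) -> str:
    """Reverse url_to_file_name: strip the ".txt" suffix and decode the rest."""
    encoded = file_name[:-4] if file_name.endswith(".txt") else file_name
    return base64.urlsafe_b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    name = url_to_file_name("https://microsoft.github.io/kernel-memory/")
    print(name)
    print(file_name_to_url(name))
```

The chat application would then decode the file name reported for each source to recover the URL, which is why reading the URL from a tag would be cleaner.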

Proposed solution

It would be very powerful to get tags as part of the relevant sources when asking memory.

Importance

None

dluc commented 6 months ago

Hi @martinoss, the service returns tags and other details, which you should also see in the MemoryAnswer object if you're using the C# WebClient.

Here's an example of a response, after using the ImportWebPageAsync API to upload a web page:

{
    "question": "which storage engines can I use with Kernel Memory?",
    "noResult": false,
    "text": "The storage engines you can use with Kernel Memory include Azure AI Search, Elasticsearch, Postgres, Qdrant, Redis, In memory KNN, and On disk KNN.",
    "relevantSources": [
        {
            "link": "default/doc001/c8e2d982611649fba242c06c8ee1b37f",
            "index": "default",
            "documentId": "doc001",
            "fileId": "c8e2d982611649fba242c06c8ee1b37f",
            "sourceContentType": "text/x-uri",
            "sourceName": "content.url",
            "sourceUrl": "https://microsoft.github.io/kernel-memory/",
            "partitions": [
                {
                    "text": "Storage engines Azure AI Search, Chroma, DuckDB, Kusto, Milvus, MongoDB, Pinecone, Postgres, Qdrant, Redis, SQLite, Weaviate Azure AI Search, Elasticsearch, Postgres, Qdrant, Redis, In memory KNN, On disk KNN    and features available only in Kernel Memory:  RAG (Retrieval Augmented Generation) RAG sources lookup Summarization Security filters (filter memory by users and groups) Long running ingestion, large documents, with retry logic and durable queues Custom tokenization Document storage OCR via Azure Document Intelligence LLMs (Large Language Models) with dedicated tokenization Cloud deployment OpenAPI Custom storage schema (partially implemented/work in progress) Short Term Memory (partially implemented/work in progress)(*) Partially implemented and/or work in progress.    Topics   Quickstart: test KM in few minutes Memory service, web clients and plugins Memory API, memory ingestion and information retrieval KM Extensions: vector DBs, AI models, Data formats, Orchestration, Content storage Embedding serverless memory in .NET apps Security, service and users How-to guides, customizing KM and examples Concepts, KM glossary KM packages       Edit this page",
                    "relevance": 0.8469289,
                    "lastUpdate": "2024-01-08T23:45:15-08:00",
                    "tags": {
                        "__document_id": [
                            "doc001"
                        ],
                        "__file_type": [
                            "text/x-uri"
                        ],
                        "__file_id": [
                            "c8e2d982611649fba242c06c8ee1b37f"
                        ],
                        "__file_part": [
                            "36a60baa3976456f92850dc07adf43d0"
                        ],
                        "type": [
                            "manual"
                        ]
                    }
                },
                {
                    "text": "Overview | Kernel Memory                Skip to main content   Link      Menu      Expand       (external link)    Document      Search       Copy       Copied          Kernel Memory       Overview  QuickstartConfigurationStart the servicePython exampleC# exampleJava exampleJavaScript exampleBash and Curl examples  ServiceArchitectureConfigurationOpenAPI - Web APIWeb ClientSK Plugin  Memory APIStore memoryAnswer questions (RAG)Search memory  Extensions  Memory DBs Azure AI Search  Qdrant  PostgreSQL  Elastic Search  Redis  Simple memory   AI Azure OpenAI  OpenAI  LLama   Data formats Azure AI Document Intelligence   Orchestration Azure Queues  RabbitMQ  Simple queues   Content storage Azure Blobs  Simple storage   Serverless (.NET)ComponentsVolatile memoryKernel BuilderSK Plugin  SecuritySecurity FiltersService API Keys  How-to guidesPartitioning & chunkingCustom promptsCustom pipelinesHugging Face models  ConceptsIndexDocumentMemoryTagLLMEmbeddingCosine SimilarityVector SearchTokensF.A.Q.Packagesupload-file.\nsh   KM on GitHub     KM on Discord                     GitHub     Discord           Kernel Memory    Kernel Memory (KM) is a multi-modal AI Service specialized in the efficient indexing of documents and information through custom continuous data pipelines, with support for Retrieval Augmented Generation (RAG), synthetic memory, prompt engineering, and custom semantic memory processing. KM supports PDF and Word documents, PowerPoint presentations, Images, Spreadsheets and more, extracting information and generating memories by leveraging Large Language Models (LLMs), Embeddings and Vector storage.\nUtilizing advanced embeddings, LLMs and prompt engineering, the system enables Natural Language querying for obtaining answers from the information stored, complete with citations and links to the original sources.  
Kernel Memory is designed for seamless integration with any programming language, providing a web service that can also be consumed as an OpenAPI endpoint for ChatGPT, web clients ready to use, and a Plugin for Microsoft Copilot and Semantic Kernel.   Kernel Memory (KM) and Semantic Memory (SM)  Semantic Memory (SM) is a library for C#, Python, and Java that wraps direct calls to databases and supports vector search. It was developed as part of the Semantic Kernel (SK) project and serves as the first public iteration of long-term memory.\nThe core library is maintained in three languages, while the list of supported storage engines (known as “connectors”) varies across languages. Kernel Memory (KM) is a service built on the feedback received and lessons learned from developing Semantic Kernel (SK) and Semantic Memory (SM). It provides several features that would otherwise have to be developed manually, such as storing files, extracting text from files, providing a framework to secure users’ data, etc. The KM codebase is entirely in .NET, which eliminates the need to write and maintain features in multiple languages. As a service, KM can be used from any language, tool, or platform, e.g. browser extensions and ChatGPT assistants. Here’s a few notable differences:    Feature Semantic Memory Kernel Memory     Data formats Text only Web pages, PDF, Images, Word, PowerPoint, Excel, Markdown, Text, JSON, more being added   Search Cosine similarity Cosine similarity, Hybrid search with filters, AND/OR conditions   Language support C#, Python, Java Any language, command line tools, browser extensions, low-code/no-code apps, chatbots, assistants, etc. Storage engines Azure AI Search, Chroma, DuckDB, Kusto, Milvus, MongoDB, Pinecone, Postgres, Qdrant, Redis, SQLite, Weaviate Azure AI Search, Elasticsearch, Postgres, Qdrant, Redis, In memory KNN, On disk KNN    and features available only in Kernel Memory:",
                    "relevance": 0.81711054,
                    "lastUpdate": "2024-01-08T23:45:15-08:00",
                    "tags": {
                        "__document_id": [
                            "doc001"
                        ],
                        "__file_type": [
                            "text/x-uri"
                        ],
                        "__file_id": [
                            "c8e2d982611649fba242c06c8ee1b37f"
                        ],
                        "__file_part": [
                            "fc75380e8f354b3389df8ce7dc5a1225"
                        ],
                        "type": [
                            "manual"
                        ]
                    }
                }
            ]
        }
    ]
}
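
In the response above, tags appear on each entry of "partitions", not directly on the "relevantSources" items. A minimal Python sketch of collecting them per source, assuming the response has already been parsed into a dictionary with the same shape, could look like this. The helper name `collect_source_tags` is hypothetical, and the sample data is a trimmed copy of the response above with the partition text omitted.

```python
from typing import Any, Dict, List


def collect_source_tags(answer: Dict[str, Any], tag_name: str) -> Dict[str, List[str]]:
    """Map each source's "link" to the distinct values of tag_name across its partitions."""
    result: Dict[str, List[str]] = {}
    for source in answer.get("relevantSources", []):
        values: List[str] = []
        for partition in source.get("partitions", []):
            # Each tag maps a name to a list of values.
            for value in partition.get("tags", {}).get(tag_name, []):
                if value not in values:
                    values.append(value)
        result[source["link"]] = values
    return result


# Trimmed copy of the response shown above (partition text omitted).
answer = {
    "relevantSources": [
        {
            "link": "default/doc001/c8e2d982611649fba242c06c8ee1b37f",
            "sourceUrl": "https://microsoft.github.io/kernel-memory/",
            "partitions": [
                {"tags": {"__document_id": ["doc001"], "type": ["manual"]}},
                {"tags": {"__document_id": ["doc001"], "type": ["manual"]}},
            ],
        }
    ]
}

if __name__ == "__main__":
    print(collect_source_tags(answer, "type"))
    # prints {'default/doc001/c8e2d982611649fba242c06c8ee1b37f': ['manual']}
```

A custom tag added at import time, such as one holding the page URL, would be read the same way by passing its name as `tag_name`. For pages uploaded with ImportWebPageAsync, the response above also carries the address in "sourceUrl" on the source itself.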

martinoss commented 6 months ago

Hi @dluc, thank you! My fault, I hadn't drilled into partitions, sorry for that.