microsoft / chat-copilot

MIT License
1.92k stars 655 forks source link

Chat memories are not deleted when chat session is deleted #946

Open jonepo opened 2 months ago

jonepo commented 2 months ago

Describe the bug I noticed that in static class ISemanticMemoryClientExtensions in extension method RemoveChatMemoriesAsync, it is parsing the document ids wrong.

public static async Task RemoveChatMemoriesAsync(
        this IKernelMemory memoryClient,
        string indexName,
        string chatId,
        CancellationToken cancellationToken = default)
    {
        var memories = await memoryClient.SearchMemoryAsync(indexName, "*", 0.0F, chatId, cancellationToken: cancellationToken);
        var documentIds = memories.Results.Select(memory => memory.Link.Split('/').First()).Distinct().ToArray();
        var tasks = documentIds.Select(documentId => memoryClient.DeleteDocumentAsync(documentId, indexName, cancellationToken)).ToArray();

        Task.WaitAll(tasks, cancellationToken);
    }

The method above tries to get the document id for each result with memory.Link.Split('/').First(). As the link is actually in format of $"{index}/{documentId}/{fileId}", it ends up picking up the index name. The documentIds array variable will always only contain the index names. Because of that, the document deletion doesn't work properly.

I can create a pull request to fix this if you agree of the issue. Instead of using the first element, we must get the second element in the split array. I don't have an instance of chat-copilot running so I couldn't test it in practice, but by reading the code, it seems to be a clear case.

To Reproduce Steps to reproduce the behavior:

  1. Create a conversation
  2. Chat a few times to collect memories
  3. Confirm that the index contains memories for the session
  4. Delete the conversation (chat session)
  5. Check if the memories exist in the index

Expected behavior Memories for the deleted chat session are deleted.

Platform

tdechanterac commented 2 months ago

I had the same issue, but are you sure it's not the long term memory that still exists ? From what I understand from the documentation, this kind of memory should be always available.

jonepo commented 2 months ago

At least in this chat-copilot sample project, working and long-term memory are saved with the chat ID tag in the Kernel Memory. So they both are contained within a single conversation, which means there wouldn't be a point in keeping the long-term memory items when the conversation is deleted.

dipteshbosedxc commented 2 months ago

Same behaviour. The document persists in memory indefinitely. The RAG functionality is activated even with a new context (such as a new chat session) where the document hasn’t been uploaded. For instance, while my custom plugin operates correctly, it still references a PDF document for RAG purposes due to its retention in long-term memory.

image

jonepo commented 2 months ago

That sounds odd as I thought it always will set the chat session ID as the tag before saving it to the memory. That way it shouldn't return document chunks from other sessions as it filters with the current chat session ID. Maybe it's another bug.

dipteshbosedxc commented 2 months ago

You are correct. Seems like Chat Copilot app needs an upgrade. There is no way one can delete uploaded documents or clear the vector database => in this case TextFile

image

jonepo commented 2 months ago

Ah yes, you were talking about the global scoped documents. They indeed are kept there forever as the UI doesn't have the option to delete them. And as they are globally scoped, any chat session can use them as citations.

This issue I created is just related to the chat session scoped document, working, and long-term memory not being deleted when the session is deleted.