microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License
21.31k stars 3.13k forks source link

.Net : memoryPlugin["Save"] inserting NULL value at first position (SQLite) #6249

Open atiq-bs23 opened 4 months ago

atiq-bs23 commented 4 months ago

Describe the bug await kernel.InvokeAsync(memoryPlugin["Save"], new() { [Microsoft.SemanticKernel.Plugins.Memory.TextMemoryPlugin.InputParam] = "My family is from New York", [Microsoft.SemanticKernel.Plugins.Memory.TextMemoryPlugin.CollectionParam] = MemoryCollectionName, [Microsoft.SemanticKernel.Plugins.Memory.TextMemoryPlugin.KeyParam] = "info5", });

After executing this method, it is inserting a row with NULL value for the first time. I am using SQLite

To Reproduce Steps to reproduce the behavior:

  1. Use SQLite
  2. Try to insert any data
  3. After executing the method check SQLite database.
  4. You will get a blank NULL value row has inserted

Expected behavior It should insert a single row with the value pass to this method, but it is adding another row with NULL value.

Screenshots

image

Platform

westey-m commented 3 months ago

The issue seems to be caused by how the does collection exist functionality is implemented. The connector stores all data in a single table, so the way it determines whether a collection exists or not is if there is a row there with a matching collection column value. On collection creation a null record with the collection name is inserted if there are no records with the required collection name yet.

This null record gets removed whenever GetNearestMatchesAsync or GetNearestMatchAsync is called.

There are a number of issues with this:

  1. If the developer executes the following sequence for a new collection: CreateCollectionAsync --> GetNearestMatchAsync --> DoesCollectionExistAsync, then the result for DoesCollectionExistAsync will be false, which is inaccurate.
  2. If all records are deleted in a collection, the result for DoesCollectionExistAsync will be false, since no null record is inserted when the last record is deleted.

We have a few options here:

  1. Remove the null record code, do nothing for CreateCollectionAsync and always return true for DoesCollectionExistAsync. DeleteCollectionAsync would still delete all records in the collection. This would be confusing if the developer calls delete and then does exist and it returns true.
  2. Expand the null record code to accurately remove the null record on the first insert and add it back in again after the last delete in a collection. This would add cost in that each delete would need to check if it's the last record and each insert would need to also check for the null record.
  3. Use a separate table for managing collections.
  4. Keep the null record for the entire lifetime of the collection.

3 would be a breaking change for existing users. 1 would be a behavioral breaking change for existing users. 2 or 4 is therefore least disruptive.

CC @markwallace-microsoft

atiq-bs23 commented 3 months ago

@westey-m GetAllAsync is executing a delete query all the time for the NULL row, it looks odd. Although its a private method and it is calling only from GetNearestMatchesAsyncand GetNearestMatchAsync.

westey-m commented 3 months ago

@atiq-bs23, is the null record causing you problems, or are you just curious about why it's there? It's essentially a placeholder to indicate that a collection exists or not, but the fact that the null record is removed during searches is clearly a bug. The simplest solution is to remove that code, and only remove the null record when the collection is deleted. This would of course mean that the null record is even longer lived, but this would allow a more accurate implementation of DoesCollectionExistAsync.

atiq-bs23 commented 3 months ago

@westey-m it's not causing any issue in my end

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open for 90 days with no activity.