microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License

.Net: Using the VectorStoreGenericDataModel when the Key data type is unknown at compile time #9701

Open f2bo opened 1 week ago

f2bo commented 1 week ago

Assume a scenario where the vector store record definitions are loaded from a configuration file. For example:

{
    "collections": {
        "articles": {
            "Key": "string",
            "Name": "string",
            "Title": "string",
            "Body": "string",
            "BodyEmbedding": "float[384]"
        },
        "glossary": {
            "Key": "int",
            "Term": "string",
            "Definition": "string",
            "DefinitionEmbedding": "float[1536]"
        }
    }
}

This file is read at runtime to create a VectorStoreRecordDefinition for a given collection. Notice that the Key property has different data types for each collection, string and int respectively.

string collectionName = "articles";

// Load the definitions from configuration and define a schema for the specified collection
VectorStoreRecordDefinition vectorStoreRecordDefinition = LoadVectorDefinitionForCollection(collectionName);
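
As a rough illustration of the loading step, the sketch below turns the type strings from the configuration file into CLR types. `ConfigTypeParser` is a hypothetical helper, not part of Semantic Kernel, and mapping `float[N]` to `ReadOnlyMemory<float>` with `N` dimensions is an assumption about how the embeddings would be represented. The result could then feed whatever builds the VectorStoreRecordDefinition.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: parses a type string from the configuration file
// (e.g. "string", "int", "float[384]") into a CLR type and an optional vector dimension.
public static class ConfigTypeParser
{
    private static readonly Regex VectorPattern = new Regex(@"^float\[(\d+)\]$");

    public static (Type ClrType, int? Dimensions) Parse(string typeName)
    {
        var match = VectorPattern.Match(typeName);
        if (match.Success)
        {
            // Assumption: "float[N]" denotes an embedding with N dimensions.
            return (typeof(ReadOnlyMemory<float>), int.Parse(match.Groups[1].Value));
        }

        // Only the scalar types that appear in the sample configuration are handled.
        return typeName switch
        {
            "string" => (typeof(string), (int?)null),
            "int" => (typeof(int), (int?)null),
            _ => throw new NotSupportedException($"Unsupported type '{typeName}'."),
        };
    }
}
```

With this, the "Key" entry of each collection yields the key type that is only known at runtime: `typeof(string)` for articles and `typeof(int)` for glossary.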

Once a record definition has been created, it's time to operate on the corresponding collection. The generic data model is meant to be used in scenarios where the database schema is unknown at compile time. However, its key data type needs to be known at compile time.

// get a reference to the collection
var collection = vectorStore.GetCollection<string, VectorStoreGenericDataModel<??????>>(collectionName, vectorStoreRecordDefinition);

How do you use it when the key data type needs to be specified at runtime? Is there a pattern that you recommend in such a scenario?
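
For context, one workaround that does not depend on library changes is to dispatch from the runtime key type to a generic method, so the code that touches the collection still sees a concrete key type. The sketch below is only an assumption about how that could look. `IKeyTypeVisitor`, `KeyTypeDispatcher` and `TypeNameVisitor` are hypothetical names, and the list of supported key types is illustrative.

```csharp
using System;

// Hypothetical callback that receives the key type as a generic parameter.
public interface IKeyTypeVisitor<TResult>
{
    TResult Visit<TKey>();
}

public static class KeyTypeDispatcher
{
    // Bridges a key type known only at runtime to a compile-time generic argument.
    public static TResult Dispatch<TResult>(Type keyType, IKeyTypeVisitor<TResult> visitor)
    {
        if (keyType == typeof(string)) return visitor.Visit<string>();
        if (keyType == typeof(int)) return visitor.Visit<int>();
        if (keyType == typeof(long)) return visitor.Visit<long>();
        if (keyType == typeof(Guid)) return visitor.Visit<Guid>();
        throw new NotSupportedException($"Unsupported key type '{keyType}'.");
    }
}

// Trivial visitor used to demonstrate the dispatch. A real visitor would call
// something like GetCollection<TKey, VectorStoreGenericDataModel<TKey>>(...) inside Visit.
public sealed class TypeNameVisitor : IKeyTypeVisitor<string>
{
    public string Visit<TKey>() => typeof(TKey).Name;
}
```

The drawback is that every operation on the collection has to be wrapped in a visitor, which gets verbose quickly.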

westey-m commented 1 week ago

Thanks for the scenario, @f2bo. We haven't really considered it before. A possible solution would be to support object as a key type with the VectorStoreGenericDataModel and to cast back and forth to the key type defined in the VectorStoreRecordDefinition. E.g.

var collection = vectorStore.GetCollection<object, VectorStoreGenericDataModel<object>>(collectionName, vectorStoreRecordDefinition);

Note that if the VectorStoreRecordDefinition says that the key type is int, an int would have to be supplied as the key for methods such as GetAsync even though the signature would accept an object. E.g.

object key = 5;
await collection.GetAsync(key);
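
Since the compiler can no longer catch a mismatched key in this design, the caller would probably want to coerce key values to the declared type before passing them on. A minimal sketch of that, using only the .NET base class library, follows. `KeyNormalizer` is a hypothetical helper, and the key type passed to it is assumed to come from the VectorStoreRecordDefinition.

```csharp
using System;
using System.Globalization;

public static class KeyNormalizer
{
    // Hypothetical helper: coerces a runtime key value to the key type declared
    // in the record definition and fails fast when the conversion is not possible.
    public static object Normalize(object rawKey, Type declaredKeyType)
    {
        if (rawKey is null) throw new ArgumentNullException(nameof(rawKey));

        // Already the right type: pass it through unchanged.
        if (declaredKeyType.IsInstanceOfType(rawKey)) return rawKey;

        // Guid is not IConvertible, so it needs explicit handling.
        if (declaredKeyType == typeof(Guid) && rawKey is string text) return Guid.Parse(text);

        // Covers IConvertible values, e.g. the string "5" or the long 5L to an int.
        return Convert.ChangeType(rawKey, declaredKeyType, CultureInfo.InvariantCulture);
    }
}
```

For example, `KeyNormalizer.Normalize("5", typeof(int))` returns a boxed int that could then be passed to GetAsync, while a value such as "abc" raises a FormatException before any call reaches the store.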

We would need to do some prototyping on this, and add support in each vector store implementation, but let us know if this would work for your use case.

f2bo commented 1 week ago

> let us know if this would work for your use case.

I hadn't used the generic data model before and only just started experimenting with an idea when I noticed this problem, so it's too early to tell. It does feel less robust but I imagine that it would work.

I'll report back if I find that using object is impractical.

Thank you!

westey-m commented 1 week ago

> It does feel less robust

@f2bo, did you also have another solution in mind or do you just mean object is less robust than using the strongly typed key types?

f2bo commented 1 week ago

Sorry. Don't attach too much weight to my comment. I did mean it felt less robust than using a strongly typed key type but I suppose that given that the data type is determined at runtime, you probably can't do better than this. As I said, I don't yet have a complete picture of where I'm going with this.