microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License
22.15k stars 3.31k forks source link

.Net: Bug: VectorStoreOperationException using the SqliteVectorStore #9380

Closed f2bo closed 1 month ago

f2bo commented 1 month ago

Describe the bug I'm having trouble using the new SqliteVectorStore included in the latest release.

I first attempted using the AddSqliteVectorStore(connectionString, options, serviceId) overload, but it creates a new SqliteConnection that it never opens and any subsequent operations on the vector store fail. I don't see any obvious way to open it after the fact.

https://github.com/microsoft/semantic-kernel/blob/01c288a4be3c5a2f9c99454aee1035294bc6cb17/dotnet/src/Connectors/Connectors.Memory.Sqlite/SqliteServiceCollectionExtensions.cs#L49-L69

I then registered an already open SqliteConnection that it could use and added the vector store using AddSqliteVectorStore(options, serviceId) overload instead. However, any operation on the vector store fails with a VectorStoreOperationException - SqliteException: SQLite Error 1: 'no such module: vec0'. It seems that it's using a SQLite module that is not included in the Microsoft.Data.Sqlite package.

Microsoft.Extensions.VectorData.VectorStoreOperationException
  HResult=0x80131500
  Message=Call to vector store failed.
  Source=Microsoft.SemanticKernel.Connectors.Sqlite
  StackTrace:
   at Microsoft.SemanticKernel.Connectors.Sqlite.SqliteVectorStoreRecordCollection`1.<RunOperationAsync>d__50`1.MoveNext()
   at ....

  This exception was originally thrown at this call stack:
    Microsoft.Data.Sqlite.SqliteException.ThrowExceptionForRC(int, SQLitePCL.sqlite3)
    Microsoft.Data.Sqlite.SqliteDataReader.NextResult()
    Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader(System.Data.CommandBehavior)
    Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader()
    Microsoft.Data.Sqlite.SqliteCommand.ExecuteNonQuery()
    System.Data.Common.DbCommand.ExecuteNonQueryAsync(System.Threading.CancellationToken)
    Microsoft.SemanticKernel.Connectors.Sqlite.SqliteVectorStoreRecordCollection<TRecord>.RunOperationAsync<T>(string, System.Func<System.Threading.Tasks.Task<T>>)

Inner Exception 1:
SqliteException: SQLite Error 1: 'no such module: vec0'.

To Reproduce Steps to reproduce the behavior:

HostApplicationBuilder builder = Host.CreateApplicationBuilder();
builder.Services.AddTransient(sp =>
{
    var conn = new SqliteConnection($"Data Source=test.db");
    conn.Open();
    return conn;
})
.AddSqliteVectorStore();

var host = builder.Build();

var store = host.Services.GetRequiredService<IVectorStore>();
var collection = store.GetCollection<string, TextChunk>("xxxx");
await collection.CreateCollectionIfNotExistsAsync();

internal class TextChunk
{
    [VectorStoreRecordKey]
    public required string Key { get; set; }

    [VectorStoreRecordData(IsFullTextSearchable = true)]
    public string? Text { get; set; }

    [VectorStoreRecordVector(Dimensions: 384, DistanceFunction.CosineDistance, IndexKind.Hnsw)]
    public ReadOnlyMemory<float>? Embedding { get; set; }
}

Expected behavior No exceptions.

Platform

dmytrostruk commented 1 month ago

@f2bo Thanks for reporting this issue! At the moment, I'm working on documentation for Microsoft Learn portal with more information about SQLite connector and its setup. Meanwhile, I will share this information here:

SQLite doesn't support vector search natively, that's why SQLite extension should be loaded first. As for now, we are using the following vector search extension: https://github.com/asg017/sqlite-vec. In the future, if SQLite will support vector search natively, we will transition to that, otherwise we will continue to use this extension and maybe we could add support for other vector search extensions if there will be such requests.

In order to add it to your application, you can use following link and download install.sh file: https://github.com/asg017/sqlite-vec/releases/tag/v0.1.3. If you run this script, it will compile the extension and produce vec0.dll file. You need to take this file and put it in the same folder where your project that's using this connector is located (e.g. MyProject\bin\Debug\net8.0). After that, the extension should be loaded successfully and the functionality for vector search should work.

Thanks again!

f2bo commented 1 month ago

@dmytrostruk Thank you for your reply.

Yes. I did look at the code and saw that it was using this extension. I understand that it will be necessary until SQLite supports vector search natively, but will the manual installation be required until then or is this just a temporary workaround? It's really not the best experience. Wouldn't it be possible to include the binary in the SQLite connector's package? Although, I suppose that it would mean having variants for each supported platform.

Other than the missing extension, there seems to be a second problem in that the connector never opens the SQLite connection. This can be seen with the following code that fails with InvalidOperationException: ExecuteNonQuery can only be called when the connection is open.

HostApplicationBuilder builder = Host.CreateApplicationBuilder();
builder.Services.AddSqliteVectorStore($"Data Source=test.db");
var host = builder.Build();

var store = host.Services.GetRequiredService<IVectorStore>();
var collection = store.GetCollection<string, TextChunk>("xxxx");
await collection.CreateCollectionIfNotExistsAsync();

Lastly, I noticed that the AddSqliteVectorStore overload that obtains its connection from the DI container does not load the sqlite-vec extension. Is that intentional and it is assumed that it will be already configured properly?

Thanks!

dmytrostruk commented 1 month ago

It's really not the best experience. Wouldn't it be possible to include the binary in the SQLite connector's package? Although, I suppose that it would mean having variants for each supported platform.

@f2bo Yes, the installation process is not the best one, but we will need to think how to support it better in terms of installation, since the vector extension is written in C. We will also need to handle versioning and so on. As for now, it's recommended to install it on your side in a way that works best for you.

there seems to be a second problem in that the connector never opens the SQLite connection

Thank you for reporting this, I've created a PR with a fix: https://github.com/microsoft/semantic-kernel/pull/9407

Lastly, I noticed that the AddSqliteVectorStore overload that obtains its connection from the DI container does not load the sqlite-vec extension. Is that intentional and it is assumed that it will be already configured properly?

Correct, this approach gives you more flexibility when exactly in your application runtime you want to open/close connection, load an extension etc. So, you manage it on your side, and we will just use it. Otherwise, if it's not provided, we will initialize an instance of SqliteConnection on our side, open a connection and load an extension for you. In PR I shared above, I also updated XML documentation with more details about this.

Thank you!

f2bo commented 1 month ago

@dmytrostruk Thanks!

I was wondering if the requirement that the connection be already open could be removed and instead have the connector check its state and open it when necessary. Not a big deal but just one less thing that can go wrong when setting up the app.

dmytrostruk commented 1 month ago

I was wondering if the requirement that the connection be already open could be removed and instead have the connector check its state and open it when necessary. Not a big deal but just one less thing that can go wrong when setting up the app.

@f2bo In this case you could use a method overload that accepts a connection string, so the connection initialization and opening should be managed for you. I think we want to keep methods that pull SqliteConnection from DI container and just use it as is without any modifications on connection instance. This will allow users to configure their SqliteConnection in the way they want but still use DI to register VectorStore/Collection classes.

f2bo commented 1 month ago

@dmytrostruk My point was that if the connection retrieved from the DI container is closed, and you use it as is, this is guaranteed to fail, which is clearly not what was intended. It seems unnecessary given that the connector can look at the state of the connection and if it's closed, open it. I don't think that would break any expectations. Anyway, just wanted to make a suggestion. Feel free to close when the PR is ready.

dmytrostruk commented 1 month ago

My point was that if the connection retrieved from the DI container is closed, and you use it as is, this is guaranteed to fail, which is clearly not what was intended. It seems unnecessary given that the connector can look at the state of the connection and if it's closed, open it. I don't think that would break any expectations.

@f2bo I've added necessary logic to address this. Thanks for your feedback!

f2bo commented 1 month ago

@dmytrostruk Good to hear! Thank you again.