pgvector / pgvector-dotnet

pgvector support for .NET (C#, F#, and Visual Basic)
MIT License

Issues with: DbContextPool & EF Migration #31

Status: Closed (closed by jerome-zenfolio 9 months ago)

jerome-zenfolio commented 10 months ago

Hello :wave:

I have a .NET 8 project and I was able to duplicate the code and run this test without any issues.

However, when I attempt to use DbContextPool, I cannot get this to work. Sorry, this is not a runnable repro, just a copy-paste from a larger codebase.

This is my DbContext:

public class CatalogDbContext : DbContextBase
{
    public CatalogDbContext(DbContextOptions<CatalogDbContext> options)
        : base(options)
    {
    }

    public virtual DbSet<ProductEmbedding> ProductEmbeddings { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.HasPostgresExtension("vector");

        modelBuilder.Entity<ProductEmbedding>()
            .HasIndex(i => i.Embedding)
            .HasMethod("hnsw")
            .HasOperators("vector_l2_ops")
            .HasStorageParameter("m", 16)
            .HasStorageParameter("ef_construction", 64);
    }
}

public class ProductEmbedding
{
    public int Id { get; set; }

    [Column(TypeName = "vector(3)")]
    public Vector Embedding { get; set; }
}

Dependency injection:

serviceCollection.AddDbContextPool<CatalogDbContext>(options =>
{
    NpgsqlDataSource dataSource = ...
    options.UseNpgsql(dataSource, pgOptions => pgOptions.UseVector());
});

When I run the EF Migration, I am getting the error:

 Unable to create a 'DbContext' of type ''. The exception 'An exception was thrown while activating ?:EcomService.Database.CatalogDbContext -> Microsoft.EntityFrameworkCore.Internal.ScopedDbContextLease`1[[EcomService.Database.CatalogDbContext, EcomService.Database, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]].' was thrown while attempting to create an instance. For the different patterns supported at design time, see https://go.microsoft.com/fwlink/?linkid=851728
 ---> Autofac.Core.DependencyResolutionException: An exception was thrown while activating ?:EcomService.Database.CatalogDbContext -> Microsoft.EntityFrameworkCore.Internal.ScopedDbContextLease`1[[EcomService.Database.CatalogDbContext, EcomService.Database, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]].
 ---> Autofac.Core.DependencyResolutionException: An exception was thrown while invoking the constructor 'Void .ctor(Microsoft.EntityFrameworkCore.Internal.IDbContextPool`1[EcomService.Database.CatalogDbContext])' on type 'ScopedDbContextLease`1'.
 ---> System.InvalidOperationException: No suitable constructor was found for entity type 'Vector'. The following constructors had parameters that could not be bound to properties of the entity type:
    Cannot bind 'v' in 'Vector(ReadOnlyMemory<float> v)'
    Cannot bind 's' in 'Vector(string s)'
Note that only mapped properties can be bound to constructor parameters. Navigations to related entities, including references to owned types, cannot be bound. 

If I create the table manually and attempt to insert:

_context.ProductEmbeddings.Add(new() { Id = 1, Embedding = new Vector(new float[] { 1, 1, 1 }) });
_context.ProductEmbeddings.Add(new() { Id = 2, Embedding = new Vector(new float[] { 1, 2, 1 }) });
_context.ProductEmbeddings.Add(new() { Id = 3, Embedding = new Vector(new float[] { 2, 1, 1 }) });
await _context.SaveChangesAsync();

I am getting the following error:

Microsoft.EntityFrameworkCore.DbUpdateException: An error occurred while saving the entity changes. See the inner exception for details.
 ---> System.InvalidCastException: Writing values of 'Pgvector.Vector' is not supported for parameters having no NpgsqlDbType or DataTypeName. Try setting one of these values to the expected database type..
   at Npgsql.Internal.AdoSerializerHelpers.<GetTypeInfoForWriting>g__ThrowWritingNotSupported|1_0(Type type, PgSerializerOptions options, Nullable`1 pgTypeId, Nullable`1 npgsqlDbType, Exception inner)
   at Npgsql.Internal.AdoSerializerHelpers.GetTypeInfoForWriting(Type type, Nullable`1 pgTypeId, PgSerializerOptions options, Nullable`1 npgsqlDbType)
   at Npgsql.NpgsqlParameter.ResolveTypeInfo(PgSerializerOptions options)
   at Npgsql.NpgsqlParameterCollection.ProcessParameters(PgSerializerOptions options, Boolean validateValues, CommandType commandType)
   at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteDbDataReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken)

If I manually insert rows and attempt to read:

ProductEmbedding x = await _context.ProductEmbeddings.FirstOrDefaultAsync();

I am getting:

System.InvalidCastException: Reading as 'System.Object' is not supported for fields having DataTypeName 'public.vector'
   at Npgsql.Internal.AdoSerializerHelpers.<GetTypeInfoForReading>g__ThrowReadingNotSupported|0_0(Type type, String displayName, Exception inner)
   at Npgsql.Internal.AdoSerializerHelpers.GetTypeInfoForReading(Type type, PostgresType postgresType, PgSerializerOptions options)
   at Npgsql.BackendMessages.FieldDescription.<GetInfo>g__GetInfoSlow|50_0(Type type, ColumnInfo& lastColumnInfo)
   at Npgsql.BackendMessages.FieldDescription.GetInfo(Type type, ColumnInfo& lastColumnInfo)
   at Npgsql.BackendMessages.FieldDescription.get_ObjectOrDefaultInfo()
   at Npgsql.BackendMessages.FieldDescription.get_FieldType()
   at Npgsql.NpgsqlDataReader.GetFieldType(Int32 ordinal)

Could you please point out what I could be doing wrong? Thank you!

Jerome

roji commented 10 months ago

@jerome-zenfolio it would be really helpful if you could put together a simple, runnable repro.

jerome-zenfolio commented 10 months ago

@roji, I wrote a simpler sample, but I cannot reproduce the issue with it. To reproduce it, I had to comment out the UseVector initialization. So there must be a bug in the way I am initializing it.

using System;

using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Infrastructure;
using Microsoft.EntityFrameworkCore.Storage;
using Microsoft.Extensions.DependencyInjection;

using Pgvector;

namespace VectorDemo
{
    class Program
    {
        static void Main()
        {
            string connectionString = "Host=localhost;Database=dev_zenfolio_product_catalog;Port=5432;User Id=postgres;Password=postgres;";

            var serviceProvider = new ServiceCollection()
                .AddDbContextPool<MyDbContext>(options =>
                    options.UseNpgsql(connectionString, builder => builder.UseVector()))
                .BuildServiceProvider();

            using var scope = serviceProvider.CreateScope();
            var dbContext = scope.ServiceProvider.GetRequiredService<MyDbContext>();

            var databaseCreator = dbContext.Database.GetService<IDatabaseCreator>() as IRelationalDatabaseCreator;
            databaseCreator?.CreateTables();

            var newRecord = new MyEntity { Name = "Sample", Embedding = new Vector(new[] { 1f, 1, 1 }) };
            var dbContext2 = scope.ServiceProvider.GetRequiredService<MyDbContext>();
            dbContext2.MyEntities.Add(newRecord);
            dbContext2.SaveChanges();

            foreach (var entity in dbContext.MyEntities)
            {
                Console.WriteLine($"ID: {entity.Id}, Name: {entity.Name}");
            }
        }
    }

    public class MyEntity
    {
        public int Id { get; set; }
        public string? Name { get; set; }
        public Vector? Embedding { get; set; }
    }

    public class MyDbContext : DbContext
    {
        public MyDbContext(DbContextOptions<MyDbContext> options) : base(options)
        {
        }

        public DbSet<MyEntity> MyEntities { get; set; }
    }
}

I'll close out the issue if I can't reproduce it. Thanks!

jerome-zenfolio commented 10 months ago

Okay there is some progress; I've found an issue in my original code. I had to update the DI routine like this:

serviceCollection.AddDbContextPool<CatalogDbContext>(options =>
{
    NpgsqlDataSource dataSource = ...
    options.UseVector();
    options.UseNpgsql(dataSource, pgOptions => pgOptions.UseVector());
});

It looks like UseVector has to be called on both NpgsqlDataSourceBuilder and NpgsqlDbContextOptionsBuilder. I hadn't invoked the former. That fixed the issue with inserting new rows and reading vectors from an existing table. Now on to hunt down why it fails to create a table with a vector column when I run the EF migration from the command line.
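
For readers landing here with the same errors, the two registrations described above can be sketched roughly as follows. This is a minimal sketch, not the code from the original project: the connection string is a placeholder, `CatalogDbContext` is the context class from the first comment, and it assumes the Pgvector and Pgvector.EntityFrameworkCore packages together with an Npgsql version that provides `NpgsqlDataSourceBuilder`.

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Npgsql;
using Pgvector;

var services = new ServiceCollection();

// Placeholder connection string (an assumption, not taken from the thread).
const string connectionString =
    "Host=localhost;Database=example;Username=postgres;Password=postgres";

// 1. ADO.NET level: register the vector type on the data source so that
//    Npgsql can read and write Pgvector.Vector parameters and columns.
var dataSourceBuilder = new NpgsqlDataSourceBuilder(connectionString);
dataSourceBuilder.UseVector();
NpgsqlDataSource dataSource = dataSourceBuilder.Build();

// 2. EF Core level: register the vector type mapping with the Npgsql provider.
//    CatalogDbContext is the context class shown in the first comment.
services.AddDbContextPool<CatalogDbContext>(options =>
    options.UseNpgsql(dataSource, pgOptions => pgOptions.UseVector()));
```

Building the data source once, outside the options callback, means the application shares a single data source (and its connection pool) rather than creating a new one each time the callback runs. Whether this also affects the migration error is not established in this thread.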

jerome-zenfolio commented 9 months ago

Closing this for now. If I can create a reproducible sample, I will open a new issue.