npgsql / efcore.pg

Entity Framework Core provider for PostgreSQL
PostgreSQL License
1.54k stars 226 forks source link

Support jsonb for HasGeneratedTsVectorColumn #2215

Closed toklive closed 2 years ago

toklive commented 2 years ago

Are there any examples or references of how to use stored tsvector like this documentation here for array types and jsonb types ?: https://www.npgsql.org/efcore/mapping/full-text-search.html?tabs=pg12%2Cv5#method-1-tsvector-column

roji commented 2 years ago

@toklive I'm not sure how tsvector would be compatible with arrays specifically... In any case, I'd recommend looking at general PostgreSQL documentation - if you can define a generated tsvector column over array/jsonb, you should be able to follow the Npgsql instructions as they are.

Let me know if you run into any blockers which are specifically related to Npgsql (as opposed to PostgreSQL).

toklive commented 2 years ago

Thanks for your prompt response, @roji . Yes, I am having an issue with the code-first approach here...

I have an entity with a property of type text[] and I am trying to create a stored tsvector for the same for full-text search.

nuget versions are

<PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL" Version="6.0.0" />
<PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL.NetTopologySuite" Version="6.0.0" />
 public class UserBusinessInfo : BaseEntity
    { 
        public Guid UserId { get; set; }

        public Guid BusinessInfoId { get; set; }

        public User User { get; set; }

        public BusinessInfo BusinessInfo { get; set; }

        public List<string> Tags { get; set; }

        public NpgsqlTsVector SearchVector { get; set; }

    }

I am creating an index by referring the documentation I have referred like below,

  modelBuilder.Entity<UserBusinessInfo>()
          .HasGeneratedTsVectorColumn(
              b => b.SearchVector,
              "english",  // Text search config
              b => new { b.Tags })  // Included properties
              .HasIndex(b => b.SearchVector)
              .HasMethod("GIN"); 

The above is creating the below underline query, but as per the postgres documetnation it should create array_to_tsvector as the type is an array and the same is needed to be called jsonb_to_tsvector if the column type is jsonb. https://pgpedia.info/a/array_to_tsvector.html

Ref for both: https://pgpedia.info/f/full-text-search.html

Here is the underlying SQL query and error I am getting below,

CREATE TABLE user_business_info (
    id uuid NOT NULL,
    user_id uuid NOT NULL,
    business_info_id uuid NOT NULL,
    tags text[] NULL,
    search_vector tsvector GENERATED ALWAYS AS (to_tsvector('english', coalesce(tags, ''))) STORED,
    created_at timestamp with time zone NOT NULL,
    modified_at timestamp with time zone NOT NULL,
    is_deleted boolean NOT NULL,
    CONSTRAINT pk_user_business_info PRIMARY KEY (id),
    CONSTRAINT fk_user_business_info_business_info_business_info_id FOREIGN KEY (business_info_id) REFERENCES business_info (id) ON DELETE CASCADE,
    CONSTRAINT fk_user_business_info_users_user_id FOREIGN KEY (user_id) REFERENCES users (id) ON DELETE CASCADE

This is throwing the below error when I tried to update the migration,

POSITION: 233 DETAIL: Detail redacted as it may contain sensitive data. Specify 'Include Error Detail' in the connection string to include this information. at Npgsql.Internal.NpgsqlConnector.<ReadMessage>g__ReadMessageLong|213_0(NpgsqlConnector connector, Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage) at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken) at Npgsql.NpgsqlDataReader.NextResult() at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken) at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken) at Npgsql.NpgsqlCommand.ExecuteNonQuery(Boolean async, CancellationToken cancellationToken) at Npgsql.NpgsqlCommand.ExecuteNonQuery() at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteNonQuery(RelationalCommandParameterObject parameterObject) at Microsoft.EntityFrameworkCore.Migrations.MigrationCommand.ExecuteNonQuery(IRelationalConnection connection, IReadOnlyDictionary2 parameterValues) at Microsoft.EntityFrameworkCore.Migrations.Internal.MigrationCommandExecutor.ExecuteNonQuery(IEnumerable1 migrationCommands, IRelationalConnection connection) at Microsoft.EntityFrameworkCore.Migrations.Internal.Migrator.Migrate(String targetMigration) at Microsoft.EntityFrameworkCore.Design.Internal.MigrationsOperations.UpdateDatabase(String targetMigration, String connectionString, String contextType) at Microsoft.EntityFrameworkCore.Design.OperationExecutor.UpdateDatabaseImpl(String targetMigration, String connectionString, String contextType) at Microsoft.EntityFrameworkCore.Design.OperationExecutor.UpdateDatabase.<>c__DisplayClass0_0.<.ctor>b__0() at Microsoft.EntityFrameworkCore.Design.OperationExecutor.OperationBase.Execute(Action action) Exception data: Severity: ERROR SqlState: 22P02 MessageText: malformed array literal: "" Detail: Detail redacted as it may contain sensitive data. Specify 'Include Error Detail' in the connection string to include this information. Position: 233 File: arrayfuncs.c Line: 317 Routine: array_in 22P02: malformed array literal: "".

toklive commented 2 years ago

If I can find a SQL query for the same would be great as I will use the raw SQL with the migration, but I couldn't find any documentation for the same. Any idea about it?

For varchar search_vector tsvector GENERATED ALWAYS AS (to_tsvector('english', coalesce(tags, ''))) STORED,

For text[] search_vector tsvector GENERATED ALWAYS AS (array_to_tsvector(<what should go here ???>)) STORED,

For jsonb search_vector tsvector GENERATED ALWAYS AS (jsonb_to_tsvector(<what should go here ???>)) STORED,

roji commented 2 years ago

Thanks for the additional detail. HasGeneratedTsVectorColumn is very simple sugar which creates a generated column with a very specific and simple SQL expression - it definitely does not recognize arrays or generate array_to_tsvector or jsonb_to_tsvector. I'll take a look at adding this.

However, you can simply set up the generated column yourself, providing whatever SQL you want - see the EF docs for more information (again, HasGeneratedTsVectorColumn basically does this exact thing under the hood).

Finally, the stack trace you posted above doesn't seem related to any of this - that looks like a report about a malformed connection string. Check exactly what you're using and make sure it's a valid connection string, if you can't figure it out post it here and I'll try to help.

roji commented 2 years ago

If I can find a SQL query for the same would be great as I will use the raw SQL with the migration,

No need for raw SQL in migrations - just use HasComputedColumnSql as per the EF docs.

As for the actual SQL to use - I don't have any specific experience in full-text search over arrays/json... Try getting the SQL to work based on the PostgreSQL docs, and if you do please post back here.

toklive commented 2 years ago

Thanks for the additional detail. HasGeneratedTsVectorColumn is very simple sugar which creates a generated column with a very specific and simple SQL expression - it definitely does not recognize arrays or generate array_to_tsvector or jsonb_to_tsvector. I'll take a look at adding this.

However, you can simply set up the generated column yourself, providing whatever SQL you want - see the EF docs for more information (again, HasGeneratedTsVectorColumn basically does this exact thing under the hood).

Finally, the stack trace you posted above doesn't seem related to any of this - that looks like a report about a malformed connection string. Check exactly what you're using and make sure it's a valid connection string, if you can't figure it out post it here and I'll try to help.

Thank you so much @roji. Sorry, I have pasted the actual error message in my comment now.

roji commented 2 years ago

That exception still seems partial - the actual message isn't in there.

toklive commented 2 years ago

Thank you. I couldn't find any sample in the official documentation connected with full-text and stored. I will ask in postgres community and see if I can get any help. thanks

toklive commented 2 years ago

malformed array literal: "".

Hmm ok, in visual studio I am getting this line as in red "malformed array literal: ""."

toklive commented 2 years ago

@roji I was able to make it work as per your suggestion using computed column using code first approach. Thank you so much. It will be great if we can get the NpgsqlTsVector properties can create the corresponding computeColumns based on the type of the column. e.g text[] , jsonb etc in future


            modelBuilder.Entity<UserBusinessInfo>()
            .Property(b => b.SearchVector)
            .HasComputedColumnSql(@"jsonb_to_tsvector('english', tags, '[""string""]')", stored: true);

Above created like below,

migrationBuilder.CreateTable(
name: "user_business_info",
columns: table => new
{
id = table.Column<Guid>(type: "uuid", nullable: false),
tags = table.Column<List<Tags>>(type: "jsonb", nullable: true),search_vector = table.Column<NpgsqlTsVector>(type: "tsvector", nullable: true, computedColumnSql: "jsonb_to_tsvector('english', tags, '[\"string\"]')", stored: true),

},