yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.69k stars 1.04k forks source link

[YSQL] INSERT statement with identity key causing catalog read request to master for pg_depends #22970

Open markpyb opened 1 month ago

markpyb commented 1 month ago

Jira Link: DB-11887

Description

(Workarounds at the bottom)

I am observing inserts with on an identity key to cause catalog read requests to lookup pg_depends using the below cluster info this cluster is not a fresh build, so its not clinical so there are cluster operations and mutations before this config, but this is the current config where the issue is present

Cluster Info:

YBA 2024.1
YBDB 2024.1
O/S: YBA-Managed-AlmaLinux-8.9
c5.2xlarge (8 cores, 16GB RAM)
1x 250gb GP3 disk 

gflags

ysql_enable_read_request_caching    true    true
tablet_split_low_phase_shard_count_per_node     1
tablet_split_high_phase_shard_count_per_node        1
enable_wait_queues  true    true
ysql_sequence_cache_method  server  
ysql_num_shards_per_tserver 1   
yb_enable_read_committed_isolation  true    

DDL

CREATE TABLE public.table (
    pkey int8 GENERATED ALWAYS AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 100 NO CYCLE) NOT NULL,
    one int4,
    two int4, 
    three int4 ,
    date date, 
    CONSTRAINT pkey PRIMARY KEY (pkey)
) SPLIT INTO 3 TABLETS;
CREATE INDEX ix_table ON public.table USING lsm ((one) HASH, two, three, date DESC) SPLIT INTO 3 TABLETS;

DML

explain (ANALYZE, DIST)
    INSERT INTO public.table (
        one, two, three, date
    ) VALUES (
        1938, 58, 678,'2024-06-21 12:32:43'
    );
explain (ANALYZE, DIST)
    INSERT INTO public.table (
        one, two, three, date
    ) VALUES (
        1938, 58, 678,'2024-06-21 12:32:43'
    );
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Insert on "table"  (cost=0.00..0.01 rows=1 width=24) (actual time=0.157..0.157 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=24) (actual time=0.118..0.118 rows=1 loops=1)
         Storage Table Write Requests: 1
         Storage Index Write Requests: 1
 Planning Time: 0.028 ms
 Execution Time: 2.852 ms
 Storage Read Requests: 0
 Storage Rows Scanned: 0
 Storage Write Requests: 2
 Catalog Read Requests: 1
 Catalog Read Execution Time: 1.492 ms
 Catalog Write Requests: 0
 Storage Flush Requests: 1
 Storage Flush Execution Time: 2.627 ms
 Storage Execution Time: 4.119 ms
 Peak Memory Usage: 24 kB

docdb tracing showed lookup to be

00004d00000030008000000000000a30 -> pg_depends
00004d00000030008000000000000a72 -> pg_depend_reference_index

which is the table being inserted into and the identity key sequence of the table being inserted into

upon further investigation, easy workarounds are identified

1) this behaviour is only present in un-prepared statements, you can use prepare and execute SQL

PREPARE insert_statement (int, int, int, timestamp) AS
INSERT INTO public.table (one, two, three, date)
VALUES ($1, $2, $3, $4);
EXECUTE insert_statement(8, 9, 10, '2016-06-21 12:32:43');

or you can use driver to automatically prepare statements after X executions, once prepared there is no master lookup, here is a c# driver example (npgsql) Max Auto Prepare=100;Auto Prepare Min Usages=5

2) this behaviour is not present when using a sequence as a default value, this issue can be worked around like by using this DDL instead if not using prepared statements

CREATE SEQUENCE public.table2_pkey_seq
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 9223372036854775807
    START 1
    CACHE 100
    NO CYCLE;

CREATE TABLE public.table2 (
    pkey2 int8 NOT NULL DEFAULT nextval('public.table2_pkey_seq'),
    one int4,
    two int4, 
    three int4,
    date date, 
    CONSTRAINT pkey2 PRIMARY KEY (pkey2)
) SPLIT INTO 3 TABLETS;

explain (ANALYZE, DIST)
    INSERT INTO public.table2 (
        one, two, three, date
    ) VALUES (
        8, 9, 10,'2016-06-21 12:32:43'
    );
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Insert on table2  (cost=0.00..0.01 rows=1 width=24) (actual time=3.490..3.490 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=24) (actual time=0.168..0.169 rows=1 loops=1)
         Storage Table Write Requests: 1
 Planning Time: 0.027 ms
 Execution Time: 3.527 ms
 Storage Read Requests: 0
 Storage Rows Scanned: 0
 Storage Write Requests: 1
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 0.000 ms
 Peak Memory Usage: 24 kB
(13 rows)

Issue Type

kind/perf

Warning: Please confirm that this issue does not contain any sensitive information

myang2021 commented 1 month ago

This is the case where PG code that YB inherits from does a catalog table scan:

#19 0x000055a0f4deea01 in ybc_systable_getnext (default_scan=0x5737f051310) at ../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c:3152
#20 0x000055a0f4d435d3 in systable_getnext (sysscan=0x5737f0517b0) at ../../../../../../../src/postgres/src/backend/access/index/genam.c:452
#21 0x000055a0f4e5316f in getOwnedSequences (relid=16386, attnum=1) at ../../../../../../src/postgres/src/backend/catalog/pg_depend.c:773
#22 0x000055a0f4e5328e in getOwnedSequence (relid=16386, attnum=1) at ../../../../../../src/postgres/src/backend/catalog/pg_depend.c:805
#23 0x000055a0f52592b6 in build_column_default (rel=0x5737e18dee0, attrno=1) at ../../../../../../src/postgres/src/backend/rewrite/rewriteHandler.c:1144
#24 0x000055a0f525cb3d in rewriteTargetListIU (targetList=0x5737f8ffc08, commandType=CMD_INSERT, override=OVERRIDING_NOT_SET, target_relation=0x5737e18dee0, result_rti=1, attrno_list=0x0) at ../../../../../../src/postgres/src/backend/rewrite/rewriteHandler.c:871

Notice build_column_default is required because of the identity defined on the table pkey int8 GENERATED ALWAYS AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 100 NO CYCLE) NOT NULL, In this case PG does not use the catalog cache:

/*
 * Collect a list of OIDs of all sequences owned by the specified relation,
 * and column if specified.
 */
List *
getOwnedSequences(Oid relid, AttrNumber attnum)
{
    List       *result = NIL;
    Relation    depRel;
    ScanKeyData key[3];
    SysScanDesc scan;
    HeapTuple   tup;

    depRel = heap_open(DependRelationId, AccessShareLock);

    ScanKeyInit(&key[0],
                Anum_pg_depend_refclassid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(RelationRelationId));
    ScanKeyInit(&key[1],
                Anum_pg_depend_refobjid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(relid));
    if (attnum)
        ScanKeyInit(&key[2],
                    Anum_pg_depend_refobjsubid,
                    BTEqualStrategyNumber, F_INT4EQ,
                    Int32GetDatum(attnum));

    scan = systable_beginscan(depRel, DependReferenceIndexId, true,

Notice it calls systable_beginscan which does a full catalog table scan in order to find out all sequences owned by the specified relation. It cannot find all sequences from the catalog cache.

kai-franz commented 2 weeks ago

Using a SEQUENCE/SERIAL instead of GENERATED ALWAYS AS IDENTITY will prevent the extra cache lookups. This is because SEQUENCE/SERIAL default values are stored in pg_attrdef, which is cached in the relcache, while IDENTITY columns need to be looked up with getOwnedSequence.

Using a SEQUENCE instead:

CREATE SEQUENCE public.table_pkey_seq
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 9223372036854775807
    START 1
    CACHE 100
    NO CYCLE;

CREATE TABLE public.table (
    pkey int8 NOT NULL DEFAULT nextval('public.table_pkey_seq'),
    one int4,
    two int4, 
    three int4,
    date date, 
    CONSTRAINT pkey PRIMARY KEY (pkey)
) SPLIT INTO 3 TABLETS;

explain (ANALYZE, DIST)
    INSERT INTO public.table (
        one, two, three, date
    ) VALUES (
        1938, 58, 678,'2024-06-21 12:32:43'
    );
explain (ANALYZE, DIST)
    INSERT INTO public.table (
        one, two, three, date
    ) VALUES (
        1938, 58, 678,'2024-06-21 12:32:43'
    );
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Insert on "table"  (cost=0.00..0.01 rows=1 width=24) (actual time=0.536..0.536 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=24) (actual time=0.008..0.008 rows=1 loops=1)
         Storage Table Write Requests: 1
         Storage Index Write Requests: 1
 Planning Time: 0.064 ms
 Execution Time: 14.645 ms
 Storage Read Requests: 0
 Storage Rows Scanned: 0
 Storage Write Requests: 2
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 1
 Storage Flush Execution Time: 13.863 ms
 Storage Execution Time: 13.863 ms
 Peak Memory Usage: 24 kB
(15 rows)

Using SERIAL:

CREATE TABLE public.table (
    pkey serial8 NOT NULL,
    one int4,
    two int4, 
    three int4,
    date date, 
    CONSTRAINT pkey PRIMARY KEY (pkey)
) SPLIT INTO 3 TABLETS;

explain (ANALYZE, DIST)
    INSERT INTO public.table (
        one, two, three, date
    ) VALUES (
        1938, 58, 678,'2024-06-21 12:32:43'
    );
explain (ANALYZE, DIST)
    INSERT INTO public.table (
        one, two, three, date
    ) VALUES (
        1938, 58, 678,'2024-06-21 12:32:43'
    );
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Insert on "table"  (cost=0.00..0.01 rows=1 width=24) (actual time=13.148..13.149 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=24) (actual time=0.008..0.010 rows=1 loops=1)
         Storage Table Write Requests: 1
 Planning Time: 0.067 ms
 Execution Time: 13.269 ms
 Storage Read Requests: 0
 Storage Rows Scanned: 0
 Storage Write Requests: 1
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 0.000 ms
 Peak Memory Usage: 24 kB
(13 rows