Open markpyb opened 1 month ago
This is the case where PG code that YB inherits from does a catalog table scan:
#19 0x000055a0f4deea01 in ybc_systable_getnext (default_scan=0x5737f051310) at ../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c:3152
#20 0x000055a0f4d435d3 in systable_getnext (sysscan=0x5737f0517b0) at ../../../../../../../src/postgres/src/backend/access/index/genam.c:452
#21 0x000055a0f4e5316f in getOwnedSequences (relid=16386, attnum=1) at ../../../../../../src/postgres/src/backend/catalog/pg_depend.c:773
#22 0x000055a0f4e5328e in getOwnedSequence (relid=16386, attnum=1) at ../../../../../../src/postgres/src/backend/catalog/pg_depend.c:805
#23 0x000055a0f52592b6 in build_column_default (rel=0x5737e18dee0, attrno=1) at ../../../../../../src/postgres/src/backend/rewrite/rewriteHandler.c:1144
#24 0x000055a0f525cb3d in rewriteTargetListIU (targetList=0x5737f8ffc08, commandType=CMD_INSERT, override=OVERRIDING_NOT_SET, target_relation=0x5737e18dee0, result_rti=1, attrno_list=0x0) at ../../../../../../src/postgres/src/backend/rewrite/rewriteHandler.c:871
Notice build_column_default is required because of the identity defined on the table
pkey int8 GENERATED ALWAYS AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 100 NO CYCLE) NOT NULL,
In this case PG does not use the catalog cache:
/*
* Collect a list of OIDs of all sequences owned by the specified relation,
* and column if specified.
*/
List *
getOwnedSequences(Oid relid, AttrNumber attnum)
{
List *result = NIL;
Relation depRel;
ScanKeyData key[3];
SysScanDesc scan;
HeapTuple tup;
depRel = heap_open(DependRelationId, AccessShareLock);
ScanKeyInit(&key[0],
Anum_pg_depend_refclassid,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(RelationRelationId));
ScanKeyInit(&key[1],
Anum_pg_depend_refobjid,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(relid));
if (attnum)
ScanKeyInit(&key[2],
Anum_pg_depend_refobjsubid,
BTEqualStrategyNumber, F_INT4EQ,
Int32GetDatum(attnum));
scan = systable_beginscan(depRel, DependReferenceIndexId, true,
Notice it calls systable_beginscan
which does a full catalog table scan in order to find out all sequences owned by the specified relation. It cannot find all sequences from the catalog cache.
Using a SEQUENCE
/SERIAL
instead of GENERATED ALWAYS AS IDENTITY
will prevent the extra cache lookups. This is because SEQUENCE
/SERIAL
default values are stored in pg_attrdef
, which is cached in the relcache, while IDENTITY
columns need to be looked up with getOwnedSequence
.
SEQUENCE
instead:CREATE SEQUENCE public.table_pkey_seq
INCREMENT BY 1
MINVALUE 1
MAXVALUE 9223372036854775807
START 1
CACHE 100
NO CYCLE;
CREATE TABLE public.table (
pkey int8 NOT NULL DEFAULT nextval('public.table_pkey_seq'),
one int4,
two int4,
three int4,
date date,
CONSTRAINT pkey PRIMARY KEY (pkey)
) SPLIT INTO 3 TABLETS;
explain (ANALYZE, DIST)
INSERT INTO public.table (
one, two, three, date
) VALUES (
1938, 58, 678,'2024-06-21 12:32:43'
);
explain (ANALYZE, DIST)
INSERT INTO public.table (
one, two, three, date
) VALUES (
1938, 58, 678,'2024-06-21 12:32:43'
);
QUERY PLAN
------------------------------------------------------------------------------------------------
Insert on "table" (cost=0.00..0.01 rows=1 width=24) (actual time=0.536..0.536 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=24) (actual time=0.008..0.008 rows=1 loops=1)
Storage Table Write Requests: 1
Storage Index Write Requests: 1
Planning Time: 0.064 ms
Execution Time: 14.645 ms
Storage Read Requests: 0
Storage Rows Scanned: 0
Storage Write Requests: 2
Catalog Read Requests: 0
Catalog Write Requests: 0
Storage Flush Requests: 1
Storage Flush Execution Time: 13.863 ms
Storage Execution Time: 13.863 ms
Peak Memory Usage: 24 kB
(15 rows)
SERIAL
:CREATE TABLE public.table (
pkey serial8 NOT NULL,
one int4,
two int4,
three int4,
date date,
CONSTRAINT pkey PRIMARY KEY (pkey)
) SPLIT INTO 3 TABLETS;
explain (ANALYZE, DIST)
INSERT INTO public.table (
one, two, three, date
) VALUES (
1938, 58, 678,'2024-06-21 12:32:43'
);
explain (ANALYZE, DIST)
INSERT INTO public.table (
one, two, three, date
) VALUES (
1938, 58, 678,'2024-06-21 12:32:43'
);
QUERY PLAN
--------------------------------------------------------------------------------------------------
Insert on "table" (cost=0.00..0.01 rows=1 width=24) (actual time=13.148..13.149 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=24) (actual time=0.008..0.010 rows=1 loops=1)
Storage Table Write Requests: 1
Planning Time: 0.067 ms
Execution Time: 13.269 ms
Storage Read Requests: 0
Storage Rows Scanned: 0
Storage Write Requests: 1
Catalog Read Requests: 0
Catalog Write Requests: 0
Storage Flush Requests: 0
Storage Execution Time: 0.000 ms
Peak Memory Usage: 24 kB
(13 rows
Jira Link: DB-11887
Description
(Workarounds at the bottom)
I am observing inserts with on an identity key to cause catalog read requests to lookup pg_depends using the below cluster info this cluster is not a fresh build, so its not clinical so there are cluster operations and mutations before this config, but this is the current config where the issue is present
Cluster Info:
gflags
DDL
DML
docdb tracing showed lookup to be
which is the table being inserted into and the identity key sequence of the table being inserted into
upon further investigation, easy workarounds are identified
1) this behaviour is only present in un-prepared statements, you can use prepare and execute SQL
or you can use driver to automatically prepare statements after X executions, once prepared there is no master lookup, here is a c# driver example (npgsql)
Max Auto Prepare=100;Auto Prepare Min Usages=5
2) this behaviour is not present when using a sequence as a default value, this issue can be worked around like by using this DDL instead if not using prepared statements
Issue Type
kind/perf
Warning: Please confirm that this issue does not contain any sensitive information