p-null opened this issue 3 years ago
Adding another query that times out when accessing the Hasura UI:
SELECT
COALESCE(
json_agg(
Row_to_json(functions)
),
'[]' :: JSON
) from (
SELECT * FROM (
SELECT p.proname::text AS function_name,
pn.nspname::text AS function_schema,
pd.description,
CASE
WHEN p.provariadic = 0::oid THEN false
ELSE true
END AS has_variadic,
CASE
WHEN p.provolatile::text = 'i'::character(1)::text THEN 'IMMUTABLE'::text
WHEN p.provolatile::text = 's'::character(1)::text THEN 'STABLE'::text
WHEN p.provolatile::text = 'v'::character(1)::text THEN 'VOLATILE'::text
ELSE NULL::text
END AS function_type,
pg_get_functiondef(p.oid) AS function_definition,
rtn.nspname::text AS return_type_schema,
rt.typname::text AS return_type_name,
rt.typtype::text AS return_type_type,
p.proretset AS returns_set,
( SELECT COALESCE(json_agg(json_build_object('schema', q.schema, 'name', q.name, 'type', q.type)), '[]'::json) AS "coalesce"
FROM ( SELECT pt.typname AS name,
pns.nspname AS schema,
pt.typtype AS type,
pat.ordinality
FROM unnest(COALESCE(p.proallargtypes, p.proargtypes::oid[])) WITH ORDINALITY pat(oid, ordinality)
LEFT JOIN pg_type pt ON pt.oid = pat.oid
LEFT JOIN pg_namespace pns ON pt.typnamespace = pns.oid
ORDER BY pat.ordinality) q) AS input_arg_types,
to_json(COALESCE(p.proargnames, ARRAY[]::text[])) AS input_arg_names,
p.pronargdefaults AS default_args,
p.oid::integer AS function_oid
FROM pg_proc p
JOIN pg_namespace pn ON pn.oid = p.pronamespace
JOIN pg_type rt ON rt.oid = p.prorettype
JOIN pg_namespace rtn ON rtn.oid = rt.typnamespace
LEFT JOIN pg_description pd ON p.oid = pd.objoid
WHERE
pn.nspname::text !~~ 'pg_%'::text
AND(pn.nspname::text <> ALL (ARRAY ['information_schema'::text]))
AND NOT(EXISTS (
SELECT
1 FROM pg_aggregate
WHERE
pg_aggregate.aggfnoid::oid = p.oid))) as info
WHERE function_schema=''
AND has_variadic = FALSE
AND returns_set = TRUE
AND return_type_type = 'c'
ORDER BY function_name ASC
) as functions;
I have isolated the slowness and confirmed it was not caused by a network issue or by the Hasura app itself.
We are also experimenting with Yugabyte + Hasura and facing challenges related to poor performance. Originally I thought this was a Hasura issue. However, executing the following very simple query directly in the DB takes ~30 seconds.
SELECT column_name, table_name, is_generated, is_identity, identity_generation
FROM information_schema.columns WHERE table_schema = 'public';
cc: @rkarthik007
cc @m-iancu
It looks like the query plan is suboptimal. First, information_schema.columns is a view:
SELECT current_database()::information_schema.sql_identifier AS table_catalog,
nc.nspname::information_schema.sql_identifier AS table_schema,
c.relname::information_schema.sql_identifier AS table_name,
a.attname::information_schema.sql_identifier AS column_name,
a.attnum::information_schema.cardinal_number AS ordinal_position,
pg_get_expr(ad.adbin, ad.adrelid)::information_schema.character_data AS column_default,
CASE
WHEN a.attnotnull OR t.typtype = 'd'::"char" AND t.typnotnull THEN 'NO'::text
ELSE 'YES'::text
END::information_schema.yes_or_no AS is_nullable,
CASE
WHEN t.typtype = 'd'::"char" THEN
CASE
WHEN bt.typelem <> 0::oid AND bt.typlen = '-1'::integer THEN 'ARRAY'::text
WHEN nbt.nspname = 'pg_catalog'::name THEN format_type(t.typbasetype, NULL::integer)
ELSE 'USER-DEFINED'::text
END
ELSE
CASE
WHEN t.typelem <> 0::oid AND t.typlen = '-1'::integer THEN 'ARRAY'::text
WHEN nt.nspname = 'pg_catalog'::name THEN format_type(a.atttypid, NULL::integer)
ELSE 'USER-DEFINED'::text
END
END::information_schema.character_data AS data_type,
information_schema._pg_char_max_length(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))::information_schema.cardinal_number AS character_maximum_length,
information_schema._pg_char_octet_length(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))::information_schema.cardinal_number AS character_octet_length,
information_schema._pg_numeric_precision(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))::information_schema.cardinal_number AS numeric_precision,
information_schema._pg_numeric_precision_radix(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))::information_schema.cardinal_number AS numeric_precision_radix,
information_schema._pg_numeric_scale(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))::information_schema.cardinal_number AS numeric_scale,
information_schema._pg_datetime_precision(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))::information_schema.cardinal_number AS datetime_precision,
information_schema._pg_interval_type(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))::information_schema.character_data AS interval_type,
NULL::integer::information_schema.cardinal_number AS interval_precision,
NULL::character varying::information_schema.sql_identifier AS character_set_catalog,
NULL::character varying::information_schema.sql_identifier AS character_set_schema,
NULL::character varying::information_schema.sql_identifier AS character_set_name,
CASE
WHEN nco.nspname IS NOT NULL THEN current_database()
ELSE NULL::name
END::information_schema.sql_identifier AS collation_catalog,
nco.nspname::information_schema.sql_identifier AS collation_schema,
co.collname::information_schema.sql_identifier AS collation_name,
CASE
WHEN t.typtype = 'd'::"char" THEN current_database()
ELSE NULL::name
END::information_schema.sql_identifier AS domain_catalog,
CASE
WHEN t.typtype = 'd'::"char" THEN nt.nspname
ELSE NULL::name
END::information_schema.sql_identifier AS domain_schema,
CASE
WHEN t.typtype = 'd'::"char" THEN t.typname
ELSE NULL::name
END::information_schema.sql_identifier AS domain_name,
current_database()::information_schema.sql_identifier AS udt_catalog,
COALESCE(nbt.nspname, nt.nspname)::information_schema.sql_identifier AS udt_schema,
COALESCE(bt.typname, t.typname)::information_schema.sql_identifier AS udt_name,
NULL::character varying::information_schema.sql_identifier AS scope_catalog,
NULL::character varying::information_schema.sql_identifier AS scope_schema,
NULL::character varying::information_schema.sql_identifier AS scope_name,
NULL::integer::information_schema.cardinal_number AS maximum_cardinality,
a.attnum::information_schema.sql_identifier AS dtd_identifier,
'NO'::character varying::information_schema.yes_or_no AS is_self_referencing,
CASE
WHEN a.attidentity = ANY (ARRAY['a'::"char", 'd'::"char"]) THEN 'YES'::text
ELSE 'NO'::text
END::information_schema.yes_or_no AS is_identity,
CASE a.attidentity
WHEN 'a'::"char" THEN 'ALWAYS'::text
WHEN 'd'::"char" THEN 'BY DEFAULT'::text
ELSE NULL::text
END::information_schema.character_data AS identity_generation,
seq.seqstart::information_schema.character_data AS identity_start,
seq.seqincrement::information_schema.character_data AS identity_increment,
seq.seqmax::information_schema.character_data AS identity_maximum,
seq.seqmin::information_schema.character_data AS identity_minimum,
CASE
WHEN seq.seqcycle THEN 'YES'::text
ELSE 'NO'::text
END::information_schema.yes_or_no AS identity_cycle,
'NEVER'::character varying::information_schema.character_data AS is_generated,
NULL::character varying::information_schema.character_data AS generation_expression,
CASE
WHEN (c.relkind = ANY (ARRAY['r'::"char", 'p'::"char"])) OR (c.relkind = ANY (ARRAY['v'::"char", 'f'::"char"])) AND pg_column_is_updatable(c.oid::regclass, a.attnum, false) THEN 'YES'::text
ELSE 'NO'::text
END::information_schema.yes_or_no AS is_updatable
FROM pg_attribute a
LEFT JOIN pg_attrdef ad ON a.attrelid = ad.adrelid AND a.attnum = ad.adnum
JOIN (pg_class c
JOIN pg_namespace nc ON c.relnamespace = nc.oid) ON a.attrelid = c.oid
JOIN (pg_type t
JOIN pg_namespace nt ON t.typnamespace = nt.oid) ON a.atttypid = t.oid
LEFT JOIN (pg_type bt
JOIN pg_namespace nbt ON bt.typnamespace = nbt.oid) ON t.typtype = 'd'::"char" AND t.typbasetype = bt.oid
LEFT JOIN (pg_collation co
JOIN pg_namespace nco ON co.collnamespace = nco.oid) ON a.attcollation = co.oid AND (nco.nspname <> 'pg_catalog'::name OR co.collname <> 'default'::name)
LEFT JOIN (pg_depend dep
JOIN pg_sequence seq ON dep.classid = 'pg_class'::regclass::oid AND dep.objid = seq.seqrelid AND dep.deptype = 'i'::"char") ON dep.refclassid = 'pg_class'::regclass::oid AND dep.refobjid = c.oid AND dep.refobjsubid = a.attnum
WHERE NOT pg_is_other_temp_schema(nc.oid) AND a.attnum > 0 AND NOT a.attisdropped AND (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char", 'p'::"char"])) AND (pg_has_role(c.relowner, 'USAGE'::text) OR has_column_privilege(c.oid, a.attnum, 'SELECT, INSERT, UPDATE, REFERENCES'::text));
So even though the query syntax is simple, internally the query is fairly complex. The default plan I get locally is:
Nested Loop Left Join (cost=69.36..120.09 rows=56 width=160) (actual time=1779.718..1779.718 rows=0 loops=1)
-> Hash Right Join (cost=69.36..102.37 rows=56 width=133) (actual time=1779.718..1779.718 rows=0 loops=1)
Hash Cond: ((dep.refobjid = c.oid) AND (dep.refobjsubid = a.attnum))
-> Nested Loop (cost=0.00..32.25 rows=100 width=8) (never executed)
-> Index Scan using pg_depend_reference_index on pg_depend dep (cost=0.00..17.25 rows=100 width=12) (never executed)
Index Cond: (refclassid = '1259'::oid)
Filter: ((classid = '1259'::oid) AND (deptype = 'i'::"char"))
-> Index Scan using pg_sequence_seqrelid_index on pg_sequence seq (cost=0.00..0.15 rows=1 width=4) (never executed)
Index Cond: (seqrelid = dep.objid)
-> Hash (cost=68.52..68.52 rows=56 width=139) (actual time=1779.713..1779.713 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Nested Loop (cost=0.00..68.52 rows=56 width=139) (actual time=1779.713..1779.713 rows=0 loops=1)
-> Nested Loop (cost=0.00..62.14 rows=56 width=143) (actual time=1779.712..1779.712 rows=0 loops=1)
-> Nested Loop Left Join (cost=0.00..54.63 rows=56 width=147) (actual time=32.576..1402.041 rows=1767 loops=1)
Join Filter: (t.typtype = 'd'::"char")
-> Nested Loop (cost=0.00..41.15 rows=56 width=152) (actual time=32.334..891.808 rows=1767 loops=1)
-> Nested Loop (cost=0.00..32.75 rows=56 width=147) (actual time=31.998..489.840 rows=1767 loops=1)
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute a (cost=0.00..15.25 rows=100 width=79) (actual time=29.254..34.870 rows=1962 loops=1)
Index Cond: (attnum > 0)
Filter: (NOT attisdropped)
-> Index Scan using pg_class_oid_index on pg_class c (cost=0.00..0.17 rows=1 width=76) (actual time=0.214..0.214 rows=1 loops=1962)
Index Cond: (oid = a.attrelid)
Filter: ((relkind = ANY ('{r,v,f,p}'::"char"[])) AND (pg_has_role(relowner, 'USAGE'::text) OR has_column_privilege(oid, a.attnum, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
Rows Removed by Filter: 0
-> Index Scan using pg_type_oid_index on pg_type t (cost=0.00..0.15 rows=1 width=13) (actual time=0.211..0.211 rows=1 loops=1767)
Index Cond: (oid = a.atttypid)
-> Nested Loop (cost=0.00..0.23 rows=1 width=4) (actual time=0.287..0.288 rows=0 loops=1767)
-> Index Scan using pg_type_oid_index on pg_type bt (cost=0.00..0.11 rows=1 width=8) (actual time=0.196..0.196 rows=0 loops=1767)
Index Cond: (t.typbasetype = oid)
-> Index Scan using pg_namespace_oid_index on pg_namespace nbt (cost=0.00..0.11 rows=1 width=4) (actual time=0.194..0.194 rows=1 loops=661)
Index Cond: (oid = bt.typnamespace)
-> Index Scan using pg_namespace_oid_index on pg_namespace nc (cost=0.00..0.13 rows=1 width=4) (actual time=0.204..0.204 rows=0 loops=1767)
Index Cond: (oid = c.relnamespace)
Filter: ((NOT pg_is_other_temp_schema(oid)) AND (((nspname)::information_schema.sql_identifier)::text = 'public'::text))
Rows Removed by Filter: 1
-> Index Scan using pg_namespace_oid_index on pg_namespace nt (cost=0.00..0.11 rows=1 width=4) (never executed)
Index Cond: (oid = t.typnamespace)
-> Nested Loop (cost=0.00..0.28 rows=1 width=4) (never executed)
-> Index Scan using pg_collation_oid_index on pg_collation co (cost=0.00..0.15 rows=1 width=72) (never executed)
Index Cond: (a.attcollation = oid)
-> Index Scan using pg_namespace_oid_index on pg_namespace nco (cost=0.00..0.12 rows=1 width=68) (never executed)
Index Cond: (oid = co.collnamespace)
Filter: ((nspname <> 'pg_catalog'::name) OR (co.collname <> 'default'::name))
Planning Time: 0.949 ms
Execution Time: 1805.072 ms
(45 rows)
I didn't try to tune this (e.g. with pg_hint_plan), but I noticed there is one bad nested loop join, so I simply disabled nested loop joins entirely (set enable_nestloop = false) to check the updated explain analyze output:
Hash Left Join (cost=948.04..1064.47 rows=56 width=160) (actual time=67.859..67.859 rows=0 loops=1)
Hash Cond: ((c.oid = dep.refobjid) AND (a.attnum = dep.refobjsubid))
-> Hash Right Join (cost=816.53..929.73 rows=56 width=135) (actual time=67.858..67.858 rows=0 loops=1)
Hash Cond: (co.oid = a.attcollation)
-> Hash Join (cost=112.50..215.14 rows=1000 width=4) (never executed)
Hash Cond: (co.collnamespace = nco.oid)
Join Filter: ((nco.nspname <> 'pg_catalog'::name) OR (co.collname <> 'default'::name))
-> Seq Scan on pg_collation co (cost=0.00..100.00 rows=1000 width=72) (never executed)
-> Hash (cost=100.00..100.00 rows=1000 width=68) (never executed)
-> Seq Scan on pg_namespace nco (cost=0.00..100.00 rows=1000 width=68) (never executed)
-> Hash (cost=703.33..703.33 rows=56 width=139) (actual time=67.854..67.855 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Hash Join (cost=582.77..703.33 rows=56 width=139) (actual time=67.854..67.854 rows=0 loops=1)
Hash Cond: (nc.oid = c.relnamespace)
-> Seq Scan on pg_namespace nc (cost=0.00..110.00 rows=1000 width=4) (actual time=0.997..1.011 rows=1 loops=1)
Filter: ((NOT pg_is_other_temp_schema(oid)) AND (((nspname)::information_schema.sql_identifier)::text = 'public'::text))
Rows Removed by Filter: 3
-> Hash (cost=582.07..582.07 rows=56 width=143) (actual time=66.838..66.838 rows=1767 loops=1)
Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 320kB
-> Hash Right Join (cost=468.73..582.07 rows=56 width=143) (actual time=65.962..66.487 rows=1767 loops=1)
Hash Cond: (bt.oid = t.typbasetype)
Join Filter: (t.typtype = 'd'::"char")
-> Hash Join (cost=112.50..215.14 rows=1000 width=4) (actual time=2.549..2.762 rows=366 loops=1)
Hash Cond: (bt.typnamespace = nbt.oid)
-> Seq Scan on pg_type bt (cost=0.00..100.00 rows=1000 width=8) (actual time=2.188..2.337 rows=366 loops=1)
-> Hash (cost=100.00..100.00 rows=1000 width=4) (actual time=0.353..0.353 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on pg_namespace nbt (cost=0.00..100.00 rows=1000 width=4) (actual time=0.348..0.350 rows=4 loops=1)
-> Hash (cost=355.53..355.53 rows=56 width=148) (actual time=63.402..63.402 rows=1767 loops=1)
Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 334kB
-> Hash Join (cost=244.97..355.53 rows=56 width=148) (actual time=62.799..63.061 rows=1767 loops=1)
Hash Cond: (nt.oid = t.typnamespace)
-> Seq Scan on pg_namespace nt (cost=0.00..100.00 rows=1000 width=4) (actual time=0.258..0.260 rows=4 loops=1)
-> Hash (cost=244.27..244.27 rows=56 width=152) (actual time=62.536..62.536 rows=1767 loops=1)
Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 341kB
-> Hash Join (cost=133.71..244.27 rows=56 width=152) (actual time=61.691..62.170 rows=1767 loops=1)
Hash Cond: (t.oid = a.atttypid)
-> Seq Scan on pg_type t (cost=0.00..100.00 rows=1000 width=13) (actual time=4.357..4.528 rows=366 loops=1)
-> Hash (cost=133.01..133.01 rows=56 width=147) (actual time=57.325..57.325 rows=1767 loops=1)
Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 327kB
-> Hash Join (cost=117.50..133.01 rows=56 width=147) (actual time=28.067..56.909 rows=1767 loops=1)
Hash Cond: (a.attrelid = c.oid)
Join Filter: (pg_has_role(c.relowner, 'USAGE'::text) OR has_column_privilege(c.oid, a.attnum, 'SELECT, INSERT, UPDATE, REFERENCES'::text))
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute a (cost=0.00..15.25 rows=100 width=79) (actual time=24.230..52.472 rows=1962 loops=1)
Index Cond: (attnum > 0)
Filter: (NOT attisdropped)
-> Hash (cost=105.00..105.00 rows=1000 width=76) (actual time=3.806..3.806 rows=191 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 29kB
-> Seq Scan on pg_class c (cost=0.00..105.00 rows=1000 width=76) (actual time=3.613..3.772 rows=191 loops=1)
Filter: (relkind = ANY ('{r,v,f,p}'::"char"[]))
Rows Removed by Filter: 118
-> Hash (cost=130.01..130.01 rows=100 width=8) (never executed)
-> Hash Join (cost=112.50..130.01 rows=100 width=8) (never executed)
Hash Cond: (dep.objid = seq.seqrelid)
-> Index Scan using pg_depend_reference_index on pg_depend dep (cost=0.00..17.25 rows=100 width=12) (never executed)
Index Cond: (refclassid = '1259'::oid)
Filter: ((classid = '1259'::oid) AND (deptype = 'i'::"char"))
-> Hash (cost=100.00..100.00 rows=1000 width=4) (never executed)
-> Seq Scan on pg_sequence seq (cost=0.00..100.00 rows=1000 width=4) (never executed)
Planning Time: 0.814 ms
Execution Time: 68.752 ms
So, locally, execution time goes down from 1805.072 ms to 68.752 ms.
However, we might be able to do even better: the latter is just the default plan when nested loop joins are disabled, rather than a finely tuned plan.
We should investigate further what is needed to pick the correct plan.
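For reference, the comparison above can be reproduced in a session like this (a sketch; enable_nestloop is reset afterwards so other queries in the session are unaffected):
set enable_nestloop = false;
explain analyze
SELECT column_name, table_name, is_generated, is_identity, identity_generation
FROM information_schema.columns WHERE table_schema = 'public';
reset enable_nestloop;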
I suggest "patching" the catalog view definition to materialize as a CTE the tables where we don't want an index access. For example,
explain analyze SELECT column_name, table_name, is_generated, is_identity, identity_generation FROM information_schema.columns where table_schema = 'public';
takes 15 milliseconds instead of 3 seconds when information_schema.columns is defined as:
with pg_class as (select oid,* from pg_class)
, pg_namespace as(select oid,* from pg_namespace)
SELECT pg_class.oid,
...
I've "patched" the view with this:
-- BE CAREFUL THIS REWRITES THE CATALOG VIEW AND IS JUST EXPERIMENTAL
do $$
declare i record;
begin for i in (
select format('
create or replace view %I.%I as
with pg_class as (select oid,* from pg_class)
, pg_namespace as(select oid,* from pg_namespace)
%s'
,schemaname,viewname,definition
) ddl from pg_views
where schemaname='information_schema' and viewname='columns'
and definition not like '%WITH pg_class AS%'
) loop execute i.ddl; end loop;
end; $$
;
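After running the block, one way to verify the patch took effect (a hypothetical check, reusing the guard condition from the loop above):
select definition like '%WITH pg_class AS%' as patched
from pg_views
where schemaname = 'information_schema' and viewname = 'columns';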
@m-iancu What do you think?
hmm, that's bad. Yugabyte is not usable for us then. Why is this only medium priority? I'd suggest bumping that up, or CockroachDB will frontrun you with their incoming Hasura integration.
@m-iancu are there any plans for fixing this? This is blocking our move to Yugabyte.
@munjalpatel, @maaft which version are you using?
Those queries should be faster with the new optimization for Nested Loop joins that can be enabled with:
set yb_bnl_batch_size=100;
In previous versions, the workaround is to disable Nested Loops for those queries (set enable_nestloop=off;). This is faster but may have side effects on other queries. The batched nested loop should be OK for all queries, but it is in preview only right now.
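To make both options concrete as session-level settings (a sketch; availability depends on the version):
-- Newer versions: enable the preview Batched Nested Loop
set yb_bnl_batch_size = 100;
-- Previous versions: disable nested loops instead (may affect other queries)
set enable_nestloop = off;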
@FranckPachot v2.15.3 — created a test cluster on Yugabyte Cloud today.
@munjalpatel can you explain your use case in detail?
@ddorian the use case is to use Yugabyte with Hasura. On the Hasura console, almost every query to Yugabyte takes ~10 seconds.
@p-null we'd like to analyze the query plan Yugabyte uses here. Could you perform a db dump (could be some minimal db that reproduces the issue) and share it? Here are docs for running a db dump: https://docs.yugabyte.com/preview/admin/ysql-dump/. Let me know if you need any help.
We experience similar issues with gorm automigrate. Queries to information_schema take tens of seconds on an empty one-node database.
@morigs can you upload the schema + the queries that you're making?
@ddorian I can send the schema by email if it matters, but IMO it doesn't. We have a pretty simple schema: just 66 small tables and a couple of indexes. I attach two queries below.
The first one is generated by gorm. In our code it took 15s, and in the IDE it took around 10s.
The second query is a plain select-all from the columns table.
Execution plans are in spoilers (the durations are smaller than I mentioned because I executed these queries multiple times before running explain analyse).
I know nothing about the internals; however, the execution plan for a simple select * looks far too complex to me 🥲
Shouldn't this be a materialized view or something?
Second Query:
select * from information_schema.columns limit 500;
Sorry for disturbing, but I'm interested in the plans for this. From what I see, this is a common issue that blocks people from migrating to Yugabyte, but two years have passed and it is still not fixed.
Is there any ETA? Can the community somehow help with this?
Hey, thanks for checking in. We did a thorough investigation on system catalog queries recently, and the upshot is there are some issues with the CBO currently that may be preventing us from optimizing these queries as much as we could be (see this, this, and this). For example, just enabling the CBO for select * from information_schema.columns limit 500; makes the query time jump to 300s. Eventually, CBO and BNL will go a long way in improving query performance.
For now, setting BNL batch size to 1000 (without CBO) does reduce the query time by ~40% for me. Not sure if you tried that already but it could be useful for you.
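Concretely, that is (session-level; assuming a version where BNL is available):
set yb_bnl_batch_size = 1000;
explain analyze select * from information_schema.columns limit 500;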
@morigs thanks for reporting. There are general optimizations ongoing for complex queries, and many queries on the catalog benefit from Batched Nested Loop (set yb_bnl_batch_size=1000), but here there is still one Nested Loop that is not batched /cc @tanujnay112
-> YB Batched Nested Loop Join (actual time=551.978..552.602 rows=1000 loops=1)
Join Filter: (a.atttypid = t.oid)
👉 -> Nested Loop (actual time=11.052..546.034 rows=1000 loops=1)
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute a (actual time=7.069..25.451 rows=1198 loops=1)
Index Cond: (attnum > 0)
Remote Filter: (NOT attisdropped)
Storage Index Read Requests: 2
Storage Index Execution Time: 24.000 ms
-> Index Scan using pg_class_oid_index on pg_class c (actual time=0.419..0.419 rows=1 loops=1198)
Index Cond: (oid = a.attrelid)
Filter: ((relkind = ANY ('{r,v,f,p}'::"char"[])) AND (pg_has_role(relowner, 'USAGE'::text) OR has_column_privilege(oid, a.attnum, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
Rows Removed by Filter: 0
Storage Index Read Requests: 1
Storage Index Execution Time: 0.394 ms
-> Index Scan using pg_type_oid_index on pg_type t (actual time=1.587..1.654 rows=32 loops=1)
Index Cond: (oid = ANY (ARRAY[a.atttypid, $3, $4, ..., $1001]))
Storage Index Read Requests: 1
Storage Index Execution Time: 0.000 ms
Thanks for responding. Happy to hear that there is ongoing work on this and it is not lost :)
If I understand you correctly, setting yb_bnl_batch_size=1000 should be a workaround. I'm not sure about the previously mentioned use cases (e.g. Hasura), but we use gorm.AutoMigrate. It takes connections from Go's sql.DB pool and then discovers the current schema. I don't know a way to execute SET on every connection in Go; I've checked the docs, and they don't mention this.
So, is it possible to set this in another way (perhaps using a configuration file)? Is it safe to set this globally?
@morigs I'll ask internally if it can be set as a gflag. It can be set as a runtime flag, which was just committed: https://github.com/yugabyte/yugabyte-db/commit/a5ee779b2269bbacdcbb53552d9b1855b732cfe7
It takes connections from the Go's sql.DB pool and then discovers the current schema. I don't know a way to execute SET on every connection in go. I've checked the docs, it doesn't mention this.
You should be able to "hook" into the "getting connection" cycle and execute a query before anything happens. Either in the ORM or in the connection-pool or somewhere in-between.
yb_bnl_batch_size is a YSQL (the postgres backend) parameter. It can be set at many levels: cluster (--ysql_pg_conf_csv=...), database (alter database ... set), user (alter user ... set), session (set ...), transaction (set local ...), or query (/*+ Set ( random_page_cost 1e42 ) */). But the problem is that there's still one non-batched nested loop.
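For example (a sketch; mydb and app_user are hypothetical names):
alter database mydb set yb_bnl_batch_size = 1000;  -- database level
alter user app_user set yb_bnl_batch_size = 1000;  -- user level
set yb_bnl_batch_size = 1000;                      -- session level
set local yb_bnl_batch_size = 1000;                -- transaction level
/*+ Set(yb_bnl_batch_size 1000) */ select 1;       -- query level, via the hint mechanism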
Another query that cannot benefit from Batched Nested Loop (run by the Npgsql driver when connecting):
yugabyte=# explain (costs off)
SELECT * FROM ( SELECT typ.oid, typnamespace, typname, typrelid, typnotnull, relkind, typelem AS elemoid,
CASE WHEN proc.proname='array_recv' THEN typ.typelem
WHEN typ.typtype='r' THEN rngsubtype
WHEN typ.typtype='d' THEN typ.typbasetype
END AS elemtypoid
FROM pg_type AS typ
LEFT JOIN pg_class AS cls ON (cls.oid = typ.typrelid)
LEFT JOIN pg_proc AS proc ON proc.oid = typ.typreceive
LEFT JOIN pg_range ON (pg_range.rngtypid = typ.oid)
) AS typ
LEFT JOIN pg_type AS elemtyp ON elemtyp.oid = elemtypoid
;
QUERY PLAN
--------------------------------------------------------------------------------
YB Batched Nested Loop Left Join
Join Filter: (elemtyp.oid = CASE WHEN (proc.proname = 'array_recv'::name) THEN typ.typelem WHEN (typ.typtype = 'r'::"char") THEN pg_range.rngsubtype WHEN (typ.typtype = 'd'::"char") THEN typ.typbasetype ELSE NULL::oid END)
-> Nested Loop Left Join
-> Nested Loop Left Join
-> YB Batched Nested Loop Left Join
Join Filter: (cls.oid = typ.typrelid)
-> Seq Scan on pg_type typ
-> Index Scan using pg_class_oid_index on pg_class cls
Index Cond: (oid = ANY (ARRAY[typ.typrelid, $1, $2, ..., $1023]))
-> Index Scan using pg_proc_oid_index on pg_proc proc
Index Cond: (oid = (typ.typreceive)::oid)
-> Index Scan using pg_range_rngtypid_index on pg_range
Index Cond: (rngtypid = typ.oid)
-> Index Scan using pg_type_oid_index on pg_type elemtyp
Index Cond: (oid = ANY (ARRAY[CASE WHEN (proc.proname = 'array_recv'::name) THEN typ.typelem WHEN (typ.typtype = 'r'::"char") THEN pg_range.rngsubtype WHEN (typ.typtype = 'd'::"char") THEN typ.typbasetype ELSE NULL::oid END, CASE WHEN ($1027 = 'array_recv'::name) THEN $2051 WHEN ($3075 = 'r'::"char") THEN $4099 WHEN ($3075 = 'd'::"char") THEN $5123 ELSE NULL::oid END, CASE WHEN ($1028 = 'array_recv'::name) THEN $2052 WHEN ($3076 = 'r'::"char") THEN $4100 WHEN ($3076 = 'd'::"char") THEN $5124 ELSE NULL::oid END, ..., CASE WHEN ($2049 = 'array_recv'::name) THEN $3073 WHEN ($4097 = 'r'::"char") THEN $5121 WHEN ($4097 = 'd'::"char") THEN $6145 ELSE NULL::oid END]))
(15 rows)
2024: still the same issue.
I do not understand why they are confusing everyone with videos claiming that Hasura is compatible with Yugabyte. NO, it is not. I'm running 3 masters and 3 tservers with a lot of resources dedicated to each, and a simple query to get the schema takes forever and fails due to timeout, while a single Postgres DB does that instantly with no effort.
cc @ddorian @FranckPachot
Another slow query (8s with the cost-based optimizer in 2.21.1.0), run by Directus:
SELECT c.table_name
, c.column_name
, c.column_default as default_value
, c.data_type
, c.character_maximum_length as max_length
, c.is_generated = 'ALWAYS' is_generated
, CASE WHEN c.is_identity = 'YES' THEN true ELSE false END is_identity
, CASE WHEN c.is_nullable = 'YES' THEN true ELSE false END is_nullable
FROM
information_schema.columns c
LEFT JOIN information_schema.tables t
ON c.table_name = t.table_name
WHERE
t.table_type = 'BASE TABLE'
AND c.table_schema IN ($1);
As reported in https://yugabyte-db.slack.com/archives/CG0KQF0GG/p1723969425773849?thread_ts=1723905877.013559&cid=CG0KQF0GG, these slow queries make it incompatible with Directus.
Many Nested Loops are not batched. Easy to reproduce with:
select format('
create table %I (id bigserial primary key, a int unique, b int unique, c int unique)
', 'tmp'||generate_series(1,100)
)\gexec
prepare q as
SELECT c.table_name
, c.column_name
, c.column_default as default_value
, c.data_type
, c.character_maximum_length as max_length
, c.is_generated = 'ALWAYS' is_generated
, CASE WHEN c.is_identity = 'YES' THEN true ELSE false END is_identity
, CASE WHEN c.is_nullable = 'YES' THEN true ELSE false END is_nullable
FROM
information_schema.columns c
LEFT JOIN information_schema.tables t
ON c.table_name = t.table_name
WHERE
t.table_type = 'BASE TABLE'
AND c.table_schema IN ($1);
analyze;
set yb_enable_optimizer_statistics to on;
set yb_enable_base_scans_cost_model to on;
explain (analyze, costs off) execute q('public');
Plan:
-> Nested Loop Left Join (actual time=46.757..10122.527 rows=400 loops=1)
Join Filter: (((c.relname)::information_schema.sql_identifier)::text = ((c_1.relname)::information_schema.sql_identifier)::text)
Rows Removed by Join Filter: 119600
Filter: ((((CASE WHEN (nc_1.oid = pg_my_temp_schema()) THEN 'LOCAL TEMPORARY'::text WHEN (c_1.relkind = ANY ('{r,p}'::"char"[])) THEN 'BASE TABLE'::text WHEN (c_1.relkind = 'v'::"char") THEN 'VIEW'::text WHEN (c_1.relkind = 'f'::"char") THEN 'FOREIGN'::text ELSE NULL::text END)::information_schema.character_data))::text = 'BASE TABLE'::text)
-> Nested Loop (actual time=24.623..5492.409 rows=400 loops=1)
-> Nested Loop Left Join (actual time=22.995..4392.108 rows=400 loops=1)
-> Nested Loop Left Join (actual time=21.426..3227.006 rows=400 loops=1)
Join Filter: (t.typtype = 'd'::"char")
-> Nested Loop (actual time=19.049..1833.778 rows=400 loops=1)
-> Nested Loop (actual time=16.798..461.001 rows=400 loops=1)
-> YB Batched Nested Loop Join (actual time=13.734..14.038 rows=100 loops=1)
Join Filter: (c.relnamespace = nc.oid)
...
-> Index Scan using pg_type_oid_index on pg_type t (actual time=3.394..3.394 rows=1 loops=400)
Index Cond: (oid = a.atttypid)
-> Nested Loop (actual time=3.477..3.477 rows=0 loops=400)
-> Index Scan using pg_type_oid_index on pg_type bt (actual time=3.450..3.450 rows=0 loops=400)
Index Cond: (t.typbasetype = oid)
-> Index Scan using pg_namespace_oid_index on pg_namespace nbt (never executed)
Index Cond: (oid = bt.typnamespace)
-> Index Scan using pg_attrdef_adrelid_adnum_index on pg_attrdef ad (actual time=2.880..2.880 rows=0 loops=400)
Index Cond: ((a.attrelid = adrelid) AND (a.attnum = adnum))
-> Index Scan using pg_namespace_oid_index on pg_namespace nt (actual time=2.725..2.725 rows=1 loops=400)
Index Cond: (oid = t.typnamespace)
-> Hash Join (actual time=11.164..11.507 rows=300 loops=400)
Jira Link: DB-2522
Commonly, PG-compatible tools and ORMs (e.g. the Hasura examples in the comments below) query the database system catalog to build their internal metadata and/or build their UI pages. For these tools to work seamlessly in YSQL, it is important that the performance of these queries is comparable with vanilla Postgres.
Why are some system queries slow?
Often, these queries end up being quite complex, either directly (large, complex queries generated by the tool/ORM) or indirectly (e.g. simple-looking queries to information_schema views, many of which are very complex). For example, the definition of the commonly-used information_schema.columns view is ~120 lines of SQL (see information_schema.sql#L653:L773). Therefore, the current default EXPLAIN ANALYZE plan for the "simple" query has around ~60 lines and involves 9 joins (see information_schema.column EXPLAIN output). Moreover, such views sometimes come up as one of multiple join tables/views inside an already large query generated by an ORM/tool.
Additionally, in YSQL, the catalog data may not be local, so internal data lookups required for query execution will often require network roundtrips rather than just local disk access. In particular, in multi-region clusters the latency cost of these roundtrips can be particularly large (e.g. in the tens of milliseconds range per roundtrip). Therefore, the latency impact of picking a suboptimal plan in YSQL for these complex queries can be significantly larger than in vanilla Postgres.
Current mitigation and workarounds
The longer-term plan is to improve and tune our cost-based optimizer to ensure these queries are executed with optimal plans (see section below). Until then, since these queries are often internally generated by ORMs, rewriting them to be more efficient or adding hints directly in the query text is typically not possible. Current options include:
- Where possible, query the pg_catalog tables rather than the generic information_schema views (the former option is typically much faster); see the sketch after this list.
- Otherwise, use pg_hint_table to add hints without changing the query syntax (e.g. set enable_nestloop=false can help).
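For illustration, a minimal pg_catalog-based column listing (a sketch; it covers only a small subset of what information_schema.columns returns):
SELECT n.nspname AS table_schema, c.relname AS table_name, a.attname AS column_name
FROM pg_attribute a
JOIN pg_class c ON c.oid = a.attrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind IN ('r', 'p')
  AND a.attnum > 0
  AND NOT a.attisdropped;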
Current Roadmap
- Set yb_enable_base_scans_cost_model to true. May require running ANALYZE manually to get table statistics (depending on the DB version).
Original user report
Hi, opening this issue to report and track the slowness/timeout issue when testing Hasura + YB.
I am running YB (2.5.3.1) on a 4-node k8s cluster (20 cores and 60 GB) with the following (related) values:
The connected database has only 1 table with 3 records, with columns id (int) and name (text).
When accessing the Hasura console (the UI), some queries are executed to display the table/schema info of the database. However, I consistently get failing/timeout errors at the frontend.
By analyzing the network requests sent from the frontend, I was able to locate the queries that caused the timeout. Here is one of the queries that the Hasura console executes:
First exec to pod:
Then
The latency of running normal GraphQL queries on application tables is fine.
The timeout/slowness occurs every time the Data tab is accessed on the Hasura UI, which pretty much makes the Data tab unusable.
Link to a comment with more ideas: https://github.com/hasura/graphql-engine/issues/3549#issuecomment-566514142