yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[YSQL] Slow queries to some system views #7745

Open p-null opened 3 years ago

p-null commented 3 years ago

Jira Link: DB-2522

PG-compatible tools and ORMs (e.g. the Hasura examples in the comments below) commonly query the database system catalog to build their internal metadata and/or build their UI pages. For these tools to work seamlessly with YSQL, it is important that the performance of these queries is comparable to vanilla Postgres.

Why are some system queries slow

Often, these queries end up being quite complex, either directly (large, complex queries generated by the tool/ORM) or indirectly (e.g. simple-looking queries against information_schema views, many of which are very complex).

For example, the definition of the commonly-used information_schema.columns view is ~120 lines of SQL (see information_schema.sql#L653:L773).
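To inspect that definition on a live cluster, one can ask the server directly; pg_get_viewdef is standard PostgreSQL and works the same in ysqlsh:

SELECT pg_get_viewdef('information_schema.columns'::regclass, true);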

Therefore, the current default EXPLAIN ANALYZE plan for the "simple" query

SELECT * FROM information_schema.columns;

has ~60 lines and involves 9 joins (see the information_schema.columns EXPLAIN output). Moreover, such views sometimes come up as one of multiple join tables/views inside an already large query generated by an ORM/tool.

Additionally, in YSQL the catalog data may not be local, so the internal data lookups required for query execution will often require network roundtrips rather than just local disk access. In multi-region clusters in particular, the latency cost of these roundtrips can be large (e.g. in the tens of milliseconds per roundtrip). Therefore, the latency impact of picking a suboptimal plan for these complex queries can be significantly larger in YSQL than in vanilla Postgres.

Current mitigation and workarounds

The longer-term plan is to improve and tune our cost-based optimizer to ensure these queries are executed with optimal plans (see the roadmap section below). Until then, since these queries are internally generated by ORMs, rewriting them to be more efficient or adding hints is typically not possible. The following workarounds can help:

  1. If the tool supports it, configure it to use the native pg_catalog tables rather than the generic information_schema views (the former option is typically much faster).
  2. Enable batched nested loop joins, which reduces roundtrips relative to regular nested loops. This applies to the 2.20 and 2024.1 versions; older versions do not support BNL joins, and newer versions already have it enabled by default.
  3. Use the pg_hint_plan hint table to add hints without changing the query syntax: https://docs.yugabyte.com/preview/explore/query-1-performance/pg-hint-plan/#using-the-hint-table (e.g. set enable_nestloop=false can help). Items 2 and 3 are sketched in the example below.
  4. If possible, particularly in multi-region clusters, run the queries against the metadata leader node. This avoids the impact of network latency, as the data will be local.
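For illustration, here is a minimal session-level sketch of items 2 and 3; exact values and availability depend on the YugabyteDB version, and the hint-table row is a hypothetical example rather than one taken from this issue:

-- Item 2: enable batched nested loop joins for the current session
-- (values > 1 enable batching).
SET yb_bnl_batch_size = 1000;

-- Item 3: register a hint in the pg_hint_plan hint table so the
-- ORM-generated SQL itself never has to change.
SET pg_hint_plan.enable_hint_table = on;
INSERT INTO hint_plan.hints (norm_query_string, application_name, hints)
VALUES ('SELECT * FROM information_schema.columns;', '', 'Set(enable_nestloop off)');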

Current Roadmap

| Status | Task | Comments |
| --- | --- | --- |
| ✅ | Add table statistics by default to system catalogs | Done in: [#15660] YSQL: ANALYZE catalog tables as part of initdb |
| ✅ | Add internal benchmarks for system catalog queries | Done internally (TODO: share this in the public repo: https://github.com/yugabyte/taqo) |
| 🕓 | Analyze and tune specific system views | Use pg_hint_plan and semantically-equivalent query rewrites to achieve better default performance, without affecting view semantics |
| 🕓 | Enable CBO by default | EA in 2024.1 and later, so it can be turned on manually by setting yb_enable_base_scans_cost_model to true. May require running ANALYZE manually to get table statistics (depending on the DB version) |
| ⬜️ | Add extended statistics by default for system tables | For instance, having column correlation statistics can improve some join row estimates, leading to better plans |
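As a reference for the CBO row above, a minimal sketch of turning it on manually (whether ANALYZE is needed depends on the DB version):

ANALYZE;                                   -- populate table statistics, if not already present
SET yb_enable_optimizer_statistics = on;   -- let the planner use the collected statistics
SET yb_enable_base_scans_cost_model = on;  -- enable the new cost model (EA in 2024.1)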

Original user report

Hi, opening this issue to report and track the slowness/timeout issues seen when testing Hasura + YB.

I am running YB (2.5.3.1) on a 4-node k8s cluster (20 cores and 60 GB) with the following (relevant) values:

resource:
  master:
    requests:
      cpu: 2
      memory: 2Gi
    limits:
      cpu: 2
      memory: 2Gi
  tserver:
    requests:
      cpu: 2
      memory: 4Gi
    limits:
      cpu: 2
      memory: 4Gi

replicas:
  master: 3
  tserver: 3

The connected database only has 1 table with 3 records, with columns id (int) and name (text).

When accessing the Hasura console (the UI), some queries are executed to display the table/schema info of the database. However, I consistently get failure/timeout errors in the frontend.

By analyzing the network requests sent from the frontend, I was able to locate the queries that caused the timeout. Here is one of the queries that the Hasura console executes:

First, exec into the pod:

$ kubectl exec --namespace yugabyte -it yb-tserver-0 -- /home/yugabyte/bin/ysqlsh -U internal_service -h yb-tserver-0.yb-tservers.yugabyte

Then

yugabyte=# SELECT column_name, table_name, is_generated, is_identity, identity_generation FROM information_schema.columns where table_schema = 'public';
 column_name | table_name | is_generated | is_identity | identity_generation
-------------+------------+--------------+-------------+---------------------
(0 rows)
Time: 110913.507 ms (01:50.914)

The latency of running normal GraphQL queries on application tables is fine.

The timeout/slowness occurs every time the Data tab is accessed in the Hasura UI, which pretty much makes the Data tab unusable.

Link to a comment here for more ideas: https://github.com/hasura/graphql-engine/issues/3549#issuecomment-566514142

p-null commented 3 years ago

Adding another query that times out when accessing the Hasura UI:

SELECT
COALESCE(
  json_agg(
    Row_to_json(functions)
  ),
  '[]' :: JSON
) from (
SELECT * FROM (
SELECT p.proname::text AS function_name,
pn.nspname::text AS function_schema,
pd.description,
    CASE
        WHEN p.provariadic = 0::oid THEN false
        ELSE true
    END AS has_variadic,
    CASE
        WHEN p.provolatile::text = 'i'::character(1)::text THEN 'IMMUTABLE'::text
        WHEN p.provolatile::text = 's'::character(1)::text THEN 'STABLE'::text
        WHEN p.provolatile::text = 'v'::character(1)::text THEN 'VOLATILE'::text
        ELSE NULL::text
    END AS function_type,
pg_get_functiondef(p.oid) AS function_definition,
rtn.nspname::text AS return_type_schema,
rt.typname::text AS return_type_name,
rt.typtype::text AS return_type_type,
p.proretset AS returns_set,
( SELECT COALESCE(json_agg(json_build_object('schema', q.schema, 'name', q.name, 'type', q.type)), '[]'::json) AS "coalesce"
       FROM ( SELECT pt.typname AS name,
                pns.nspname AS schema,
                pt.typtype AS type,
                pat.ordinality
               FROM unnest(COALESCE(p.proallargtypes, p.proargtypes::oid[])) WITH ORDINALITY pat(oid, ordinality)
                 LEFT JOIN pg_type pt ON pt.oid = pat.oid
                 LEFT JOIN pg_namespace pns ON pt.typnamespace = pns.oid
              ORDER BY pat.ordinality) q) AS input_arg_types,
to_json(COALESCE(p.proargnames, ARRAY[]::text[])) AS input_arg_names,
p.pronargdefaults AS default_args,
p.oid::integer AS function_oid
FROM pg_proc p
 JOIN pg_namespace pn ON pn.oid = p.pronamespace
JOIN pg_type rt ON rt.oid = p.prorettype
JOIN pg_namespace rtn ON rtn.oid = rt.typnamespace
LEFT JOIN pg_description pd ON p.oid = pd.objoid
WHERE
pn.nspname::text !~~ 'pg_%'::text
AND(pn.nspname::text <> ALL (ARRAY ['information_schema'::text]))
AND NOT(EXISTS (
  SELECT
    1 FROM pg_aggregate
  WHERE
    pg_aggregate.aggfnoid::oid = p.oid))) as info
WHERE function_schema=''

AND has_variadic = FALSE
AND returns_set = TRUE
AND return_type_type = 'c'

ORDER BY function_name ASC

) as functions;

I have isolated the slowness and confirmed it was not caused by a network issue or the Hasura app itself.

munjalpatel commented 3 years ago

We are also experimenting with Yugabyte + Hasura and facing challenges related to poor performance. Originally I thought this was a Hasura issue. However, executing the following very simple query directly in the DB takes ~30 seconds.

SELECT column_name, table_name, is_generated, is_identity, identity_generation 
    FROM information_schema.columns WHERE table_schema = 'public';

cc: @rkarthik007

ddorian commented 3 years ago

cc @m-iancu

m-iancu commented 3 years ago

It looks like the query plan is suboptimal. First, information_schema.columns is a view:

 SELECT current_database()::information_schema.sql_identifier AS table_catalog,
    nc.nspname::information_schema.sql_identifier AS table_schema,
    c.relname::information_schema.sql_identifier AS table_name,
    a.attname::information_schema.sql_identifier AS column_name,
    a.attnum::information_schema.cardinal_number AS ordinal_position,
    pg_get_expr(ad.adbin, ad.adrelid)::information_schema.character_data AS column_default,
        CASE
            WHEN a.attnotnull OR t.typtype = 'd'::"char" AND t.typnotnull THEN 'NO'::text
            ELSE 'YES'::text
        END::information_schema.yes_or_no AS is_nullable,
        CASE
            WHEN t.typtype = 'd'::"char" THEN
            CASE
                WHEN bt.typelem <> 0::oid AND bt.typlen = '-1'::integer THEN 'ARRAY'::text
                WHEN nbt.nspname = 'pg_catalog'::name THEN format_type(t.typbasetype, NULL::integer)
                ELSE 'USER-DEFINED'::text
            END
            ELSE
            CASE
                WHEN t.typelem <> 0::oid AND t.typlen = '-1'::integer THEN 'ARRAY'::text
                WHEN nt.nspname = 'pg_catalog'::name THEN format_type(a.atttypid, NULL::integer)
                ELSE 'USER-DEFINED'::text
            END
        END::information_schema.character_data AS data_type,
    information_schema._pg_char_max_length(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))::information_schema.cardinal_number AS character_maximum_length,
    information_schema._pg_char_octet_length(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))::information_schema.cardinal_number AS character_octet_length,
    information_schema._pg_numeric_precision(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))::information_schema.cardinal_number AS numeric_precision,
    information_schema._pg_numeric_precision_radix(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))::information_schema.cardinal_number AS numeric_precision_radix,
    information_schema._pg_numeric_scale(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))::information_schema.cardinal_number AS numeric_scale,
    information_schema._pg_datetime_precision(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))::information_schema.cardinal_number AS datetime_precision,
    information_schema._pg_interval_type(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))::information_schema.character_data AS interval_type,
    NULL::integer::information_schema.cardinal_number AS interval_precision,
    NULL::character varying::information_schema.sql_identifier AS character_set_catalog,
    NULL::character varying::information_schema.sql_identifier AS character_set_schema,
    NULL::character varying::information_schema.sql_identifier AS character_set_name,
        CASE
            WHEN nco.nspname IS NOT NULL THEN current_database()
            ELSE NULL::name
        END::information_schema.sql_identifier AS collation_catalog,
    nco.nspname::information_schema.sql_identifier AS collation_schema,
    co.collname::information_schema.sql_identifier AS collation_name,
        CASE
            WHEN t.typtype = 'd'::"char" THEN current_database()
            ELSE NULL::name
        END::information_schema.sql_identifier AS domain_catalog,
        CASE
            WHEN t.typtype = 'd'::"char" THEN nt.nspname
            ELSE NULL::name
        END::information_schema.sql_identifier AS domain_schema,
        CASE
            WHEN t.typtype = 'd'::"char" THEN t.typname
            ELSE NULL::name
        END::information_schema.sql_identifier AS domain_name,
    current_database()::information_schema.sql_identifier AS udt_catalog,
    COALESCE(nbt.nspname, nt.nspname)::information_schema.sql_identifier AS udt_schema,
    COALESCE(bt.typname, t.typname)::information_schema.sql_identifier AS udt_name,
    NULL::character varying::information_schema.sql_identifier AS scope_catalog,
    NULL::character varying::information_schema.sql_identifier AS scope_schema,
    NULL::character varying::information_schema.sql_identifier AS scope_name,
    NULL::integer::information_schema.cardinal_number AS maximum_cardinality,
    a.attnum::information_schema.sql_identifier AS dtd_identifier,
    'NO'::character varying::information_schema.yes_or_no AS is_self_referencing,
        CASE
            WHEN a.attidentity = ANY (ARRAY['a'::"char", 'd'::"char"]) THEN 'YES'::text
            ELSE 'NO'::text
        END::information_schema.yes_or_no AS is_identity,
        CASE a.attidentity
            WHEN 'a'::"char" THEN 'ALWAYS'::text
            WHEN 'd'::"char" THEN 'BY DEFAULT'::text
            ELSE NULL::text
        END::information_schema.character_data AS identity_generation,
    seq.seqstart::information_schema.character_data AS identity_start,
    seq.seqincrement::information_schema.character_data AS identity_increment,
    seq.seqmax::information_schema.character_data AS identity_maximum,
    seq.seqmin::information_schema.character_data AS identity_minimum,
        CASE
            WHEN seq.seqcycle THEN 'YES'::text
            ELSE 'NO'::text
        END::information_schema.yes_or_no AS identity_cycle,
    'NEVER'::character varying::information_schema.character_data AS is_generated,
    NULL::character varying::information_schema.character_data AS generation_expression,
        CASE
            WHEN (c.relkind = ANY (ARRAY['r'::"char", 'p'::"char"])) OR (c.relkind = ANY (ARRAY['v'::"char", 'f'::"char"])) AND pg_column_is_updatable(c.oid::regclass, a.attnum, false) THEN 'YES'::text
            ELSE 'NO'::text
        END::information_schema.yes_or_no AS is_updatable
   FROM pg_attribute a
     LEFT JOIN pg_attrdef ad ON a.attrelid = ad.adrelid AND a.attnum = ad.adnum
     JOIN (pg_class c
     JOIN pg_namespace nc ON c.relnamespace = nc.oid) ON a.attrelid = c.oid
     JOIN (pg_type t
     JOIN pg_namespace nt ON t.typnamespace = nt.oid) ON a.atttypid = t.oid
     LEFT JOIN (pg_type bt
     JOIN pg_namespace nbt ON bt.typnamespace = nbt.oid) ON t.typtype = 'd'::"char" AND t.typbasetype = bt.oid
     LEFT JOIN (pg_collation co
     JOIN pg_namespace nco ON co.collnamespace = nco.oid) ON a.attcollation = co.oid AND (nco.nspname <> 'pg_catalog'::name OR co.collname <> 'default'::name)
     LEFT JOIN (pg_depend dep
     JOIN pg_sequence seq ON dep.classid = 'pg_class'::regclass::oid AND dep.objid = seq.seqrelid AND dep.deptype = 'i'::"char") ON dep.refclassid = 'pg_class'::regclass::oid AND dep.refobjid = c.oid AND dep.refobjsubid = a.attnum
  WHERE NOT pg_is_other_temp_schema(nc.oid) AND a.attnum > 0 AND NOT a.attisdropped AND (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char", 'p'::"char"])) AND (pg_has_role(c.relowner, 'USAGE'::text) OR has_column_privilege(c.oid, a.attnum, 'SELECT, INSERT, UPDATE, REFERENCES'::text));

So even though the query syntax is simple, internally the query is fairly complex. The default plan I get locally is:

 Nested Loop Left Join  (cost=69.36..120.09 rows=56 width=160) (actual time=1779.718..1779.718 rows=0 loops=1)
   ->  Hash Right Join  (cost=69.36..102.37 rows=56 width=133) (actual time=1779.718..1779.718 rows=0 loops=1)
         Hash Cond: ((dep.refobjid = c.oid) AND (dep.refobjsubid = a.attnum))
         ->  Nested Loop  (cost=0.00..32.25 rows=100 width=8) (never executed)
               ->  Index Scan using pg_depend_reference_index on pg_depend dep  (cost=0.00..17.25 rows=100 width=12) (never executed)
                     Index Cond: (refclassid = '1259'::oid)
                     Filter: ((classid = '1259'::oid) AND (deptype = 'i'::"char"))
               ->  Index Scan using pg_sequence_seqrelid_index on pg_sequence seq  (cost=0.00..0.15 rows=1 width=4) (never executed)
                     Index Cond: (seqrelid = dep.objid)
         ->  Hash  (cost=68.52..68.52 rows=56 width=139) (actual time=1779.713..1779.713 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Nested Loop  (cost=0.00..68.52 rows=56 width=139) (actual time=1779.713..1779.713 rows=0 loops=1)
                     ->  Nested Loop  (cost=0.00..62.14 rows=56 width=143) (actual time=1779.712..1779.712 rows=0 loops=1)
                           ->  Nested Loop Left Join  (cost=0.00..54.63 rows=56 width=147) (actual time=32.576..1402.041 rows=1767 loops=1)
                                 Join Filter: (t.typtype = 'd'::"char")
                                 ->  Nested Loop  (cost=0.00..41.15 rows=56 width=152) (actual time=32.334..891.808 rows=1767 loops=1)
                                       ->  Nested Loop  (cost=0.00..32.75 rows=56 width=147) (actual time=31.998..489.840 rows=1767 loops=1)
                                             ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.00..15.25 rows=100 width=79) (actual time=29.254..34.870 rows=1962 loops=1)
                                                   Index Cond: (attnum > 0)
                                                   Filter: (NOT attisdropped)
                                             ->  Index Scan using pg_class_oid_index on pg_class c  (cost=0.00..0.17 rows=1 width=76) (actual time=0.214..0.214 rows=1 loops=1962)
                                                   Index Cond: (oid = a.attrelid)
                                                   Filter: ((relkind = ANY ('{r,v,f,p}'::"char"[])) AND (pg_has_role(relowner, 'USAGE'::text) OR has_column_privilege(oid, a.attnum, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
                                                   Rows Removed by Filter: 0
                                       ->  Index Scan using pg_type_oid_index on pg_type t  (cost=0.00..0.15 rows=1 width=13) (actual time=0.211..0.211 rows=1 loops=1767)
                                             Index Cond: (oid = a.atttypid)
                                 ->  Nested Loop  (cost=0.00..0.23 rows=1 width=4) (actual time=0.287..0.288 rows=0 loops=1767)
                                       ->  Index Scan using pg_type_oid_index on pg_type bt  (cost=0.00..0.11 rows=1 width=8) (actual time=0.196..0.196 rows=0 loops=1767)
                                             Index Cond: (t.typbasetype = oid)
                                       ->  Index Scan using pg_namespace_oid_index on pg_namespace nbt  (cost=0.00..0.11 rows=1 width=4) (actual time=0.194..0.194 rows=1 loops=661)
                                             Index Cond: (oid = bt.typnamespace)
                           ->  Index Scan using pg_namespace_oid_index on pg_namespace nc  (cost=0.00..0.13 rows=1 width=4) (actual time=0.204..0.204 rows=0 loops=1767)
                                 Index Cond: (oid = c.relnamespace)
                                 Filter: ((NOT pg_is_other_temp_schema(oid)) AND (((nspname)::information_schema.sql_identifier)::text = 'public'::text))
                                 Rows Removed by Filter: 1
                     ->  Index Scan using pg_namespace_oid_index on pg_namespace nt  (cost=0.00..0.11 rows=1 width=4) (never executed)
                           Index Cond: (oid = t.typnamespace)
   ->  Nested Loop  (cost=0.00..0.28 rows=1 width=4) (never executed)
         ->  Index Scan using pg_collation_oid_index on pg_collation co  (cost=0.00..0.15 rows=1 width=72) (never executed)
               Index Cond: (a.attcollation = oid)
         ->  Index Scan using pg_namespace_oid_index on pg_namespace nco  (cost=0.00..0.12 rows=1 width=68) (never executed)
               Index Cond: (oid = co.collnamespace)
               Filter: ((nspname <> 'pg_catalog'::name) OR (co.collname <> 'default'::name))
 Planning Time: 0.949 ms
 Execution Time: 1805.072 ms
(45 rows)

I didn't try to tune this (e.g. with pg_hint_plan), but I noticed there is one bad nested loop join, so I disabled nestloop joins entirely (with set enable_nestloop = false) to check the updated EXPLAIN ANALYZE output:

 Hash Left Join  (cost=948.04..1064.47 rows=56 width=160) (actual time=67.859..67.859 rows=0 loops=1)
   Hash Cond: ((c.oid = dep.refobjid) AND (a.attnum = dep.refobjsubid))
   ->  Hash Right Join  (cost=816.53..929.73 rows=56 width=135) (actual time=67.858..67.858 rows=0 loops=1)
         Hash Cond: (co.oid = a.attcollation)
         ->  Hash Join  (cost=112.50..215.14 rows=1000 width=4) (never executed)
               Hash Cond: (co.collnamespace = nco.oid)
               Join Filter: ((nco.nspname <> 'pg_catalog'::name) OR (co.collname <> 'default'::name))
               ->  Seq Scan on pg_collation co  (cost=0.00..100.00 rows=1000 width=72) (never executed)
               ->  Hash  (cost=100.00..100.00 rows=1000 width=68) (never executed)
                     ->  Seq Scan on pg_namespace nco  (cost=0.00..100.00 rows=1000 width=68) (never executed)
         ->  Hash  (cost=703.33..703.33 rows=56 width=139) (actual time=67.854..67.855 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Hash Join  (cost=582.77..703.33 rows=56 width=139) (actual time=67.854..67.854 rows=0 loops=1)
                     Hash Cond: (nc.oid = c.relnamespace)
                     ->  Seq Scan on pg_namespace nc  (cost=0.00..110.00 rows=1000 width=4) (actual time=0.997..1.011 rows=1 loops=1)
                           Filter: ((NOT pg_is_other_temp_schema(oid)) AND (((nspname)::information_schema.sql_identifier)::text = 'public'::text))
                           Rows Removed by Filter: 3
                     ->  Hash  (cost=582.07..582.07 rows=56 width=143) (actual time=66.838..66.838 rows=1767 loops=1)
                           Buckets: 2048 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 320kB
                           ->  Hash Right Join  (cost=468.73..582.07 rows=56 width=143) (actual time=65.962..66.487 rows=1767 loops=1)
                                 Hash Cond: (bt.oid = t.typbasetype)
                                 Join Filter: (t.typtype = 'd'::"char")
                                 ->  Hash Join  (cost=112.50..215.14 rows=1000 width=4) (actual time=2.549..2.762 rows=366 loops=1)
                                       Hash Cond: (bt.typnamespace = nbt.oid)
                                       ->  Seq Scan on pg_type bt  (cost=0.00..100.00 rows=1000 width=8) (actual time=2.188..2.337 rows=366 loops=1)
                                       ->  Hash  (cost=100.00..100.00 rows=1000 width=4) (actual time=0.353..0.353 rows=4 loops=1)
                                             Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                             ->  Seq Scan on pg_namespace nbt  (cost=0.00..100.00 rows=1000 width=4) (actual time=0.348..0.350 rows=4 loops=1)
                                 ->  Hash  (cost=355.53..355.53 rows=56 width=148) (actual time=63.402..63.402 rows=1767 loops=1)
                                       Buckets: 2048 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 334kB
                                       ->  Hash Join  (cost=244.97..355.53 rows=56 width=148) (actual time=62.799..63.061 rows=1767 loops=1)
                                             Hash Cond: (nt.oid = t.typnamespace)
                                             ->  Seq Scan on pg_namespace nt  (cost=0.00..100.00 rows=1000 width=4) (actual time=0.258..0.260 rows=4 loops=1)
                                             ->  Hash  (cost=244.27..244.27 rows=56 width=152) (actual time=62.536..62.536 rows=1767 loops=1)
                                                   Buckets: 2048 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 341kB
                                                   ->  Hash Join  (cost=133.71..244.27 rows=56 width=152) (actual time=61.691..62.170 rows=1767 loops=1)
                                                         Hash Cond: (t.oid = a.atttypid)
                                                         ->  Seq Scan on pg_type t  (cost=0.00..100.00 rows=1000 width=13) (actual time=4.357..4.528 rows=366 loops=1)
                                                         ->  Hash  (cost=133.01..133.01 rows=56 width=147) (actual time=57.325..57.325 rows=1767 loops=1)
                                                               Buckets: 2048 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 327kB
                                                               ->  Hash Join  (cost=117.50..133.01 rows=56 width=147) (actual time=28.067..56.909 rows=1767 loops=1)
                                                                     Hash Cond: (a.attrelid = c.oid)
                                                                     Join Filter: (pg_has_role(c.relowner, 'USAGE'::text) OR has_column_privilege(c.oid, a.attnum, 'SELECT, INSERT, UPDATE, REFERENCES'::text))
                                                                     ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.00..15.25 rows=100 width=79) (actual time=24.230..52.472 rows=1962 loops=1)
                                                                           Index Cond: (attnum > 0)
                                                                           Filter: (NOT attisdropped)
                                                                     ->  Hash  (cost=105.00..105.00 rows=1000 width=76) (actual time=3.806..3.806 rows=191 loops=1)
                                                                           Buckets: 1024  Batches: 1  Memory Usage: 29kB
                                                                           ->  Seq Scan on pg_class c  (cost=0.00..105.00 rows=1000 width=76) (actual time=3.613..3.772 rows=191 loops=1)
                                                                                 Filter: (relkind = ANY ('{r,v,f,p}'::"char"[]))
                                                                                 Rows Removed by Filter: 118
   ->  Hash  (cost=130.01..130.01 rows=100 width=8) (never executed)
         ->  Hash Join  (cost=112.50..130.01 rows=100 width=8) (never executed)
               Hash Cond: (dep.objid = seq.seqrelid)
               ->  Index Scan using pg_depend_reference_index on pg_depend dep  (cost=0.00..17.25 rows=100 width=12) (never executed)
                     Index Cond: (refclassid = '1259'::oid)
                     Filter: ((classid = '1259'::oid) AND (deptype = 'i'::"char"))
               ->  Hash  (cost=100.00..100.00 rows=1000 width=4) (never executed)
                     ->  Seq Scan on pg_sequence seq  (cost=0.00..100.00 rows=1000 width=4) (never executed)
 Planning Time: 0.814 ms
 Execution Time: 68.752 ms

So, locally, execution time goes down from 1805.072 ms to 68.752 ms. However, we might be able to do even better: the latter is just the default plan when nested loop joins are disabled (rather than a finely tuned plan). We should investigate further what is needed to pick the correct plan.

FranckPachot commented 2 years ago

I suggest "patching" the catalog view definition to materialize, as CTEs, the tables where we don't want index access. For example,

explain analyze SELECT column_name, table_name, is_generated, is_identity, identity_generation FROM information_schema.columns where table_schema = 'public';

takes 15 milliseconds instead of 3 seconds when information_schema.columns is defined as:

with pg_class as (select oid,* from pg_class)
   , pg_namespace as(select oid,* from pg_namespace)
SELECT pg_class.oid, 
...

I've "patched" the view with this:

-- BE CAREFUL THIS REWRITES THE CATALOG VIEW AND IS JUST EXPERIMENTAL
do $$
 declare i record; 
 begin for i in (
  select format('
create or replace view %I.%I as 
with pg_class as (select oid,* from pg_class)
   , pg_namespace as(select oid,* from pg_namespace)
%s'
   ,schemaname,viewname,definition
  ) ddl from pg_views 
  where schemaname='information_schema' and viewname='columns' 
  and definition not like '%WITH pg_class AS%'
 ) loop execute i.ddl; end loop; 
end; $$
;
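A quick way to verify the patch took effect is the same pg_views check the loop above uses:

SELECT viewname
FROM pg_views
WHERE schemaname = 'information_schema'
  AND viewname = 'columns'
  AND definition LIKE '%WITH pg_class AS%';
-- returns one row once the view has been rewritten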

@m-iancu What do you think?

maaft commented 2 years ago

Hmm, that's bad. Yugabyte is not usable for us then. Why is this only medium priority? I'd suggest bumping that up, or CockroachDB will frontrun you with their incoming Hasura integration.

munjalpatel commented 1 year ago

@m-iancu are there any plans for fixing this? This is blocking our move to Yugabyte.

FranckPachot commented 1 year ago

@munjalpatel, @maaft which version are you using? These queries should be faster with the new optimization for nested loop joins, which can be enabled with set yb_bnl_batch_size=100;. In previous versions, the workaround is to disable nested loops for those queries (set enable_nestloop=off;); this is faster but may have side effects on other queries. Batched nested loops should be OK for all queries, but the feature is in preview only right now.

munjalpatel commented 1 year ago

@FranckPachot v2.15.3 -- created a test cluster on Yugabyte Cloud today.

ddorian commented 1 year ago

@munjalpatel

can you explain your use case in detail?

munjalpatel commented 1 year ago

@ddorian the use case is using Yugabyte with Hasura. In the Hasura console, almost every query to Yugabyte takes ~10 seconds.

mislam-yb commented 1 year ago

@p-null we'd like to analyze the query plan Yugabyte uses here. Could you perform a DB dump (it could be some minimal DB that reproduces the issue) and share it? Here are the docs for running a DB dump: https://docs.yugabyte.com/preview/admin/ysql-dump/. Let me know if you need any help.

morigs commented 1 year ago

We experience similar issues with gorm AutoMigrate. Queries to information_schema take tens of seconds on an empty one-node database.

ddorian commented 1 year ago

@morigs

can you upload the schema + the queries that you're making?

morigs commented 1 year ago

@ddorian I can send the schema by email if it matters, but IMO it doesn't. We have a pretty simple schema: just 66 small tables and a couple of indexes. I attach two queries below. The first one is generated by gorm; in our code it took 15s, and in the IDE it took around 10s. The second query is a plain select-all from the columns view. The execution plans are in the spoilers (the durations are smaller than mentioned because I executed these queries multiple times before running EXPLAIN ANALYZE).

I know nothing about the internals; however, the execution plan for a simple select * looks far too complex to me 🥲 Shouldn't this be a materialized view or something?


First Query:

```sql
SELECT c.column_name,
       c.is_nullable = 'YES',
       c.udt_name,
       c.character_maximum_length,
       c.numeric_precision,
       c.numeric_precision_radix,
       c.numeric_scale,
       c.datetime_precision,
       8 * typlen,
       c.column_default,
       pd.description,
       c.identity_increment
FROM information_schema.columns AS c
JOIN pg_type AS pgt ON c.udt_name = pgt.typname
LEFT JOIN pg_catalog.pg_description AS pd
       ON pd.objsubid = c.ordinal_position
      AND pd.objoid = (SELECT oid FROM pg_catalog.pg_class
                        WHERE relname = c.table_name
                          AND relnamespace = (SELECT oid FROM pg_catalog.pg_namespace
                                               WHERE nspname = c.table_schema))
WHERE table_catalog = 'yugabyte'
  AND table_schema = 'clusters'
  AND table_name = 'clusters'
```
First Query Plan:

```
 Result  (cost=146.26..2051.58 rows=280 width=185) (actual time=5177.066..5936.237 rows=10 loops=1)
   One-Time Filter: (((current_database())::information_schema.sql_identifier)::text = 'yugabyte'::text)
   ->  Hash Join  (cost=146.26..988.28 rows=280 width=793) (actual time=5163.893..5912.644 rows=10 loops=1)
         Hash Cond: (((COALESCE(bt.typname, t.typname))::information_schema.sql_identifier)::name = pgt.typname)
         ->  Nested Loop Left Join  (cost=33.75..844.83 rows=56 width=791) (actual time=5108.748..5857.148 rows=10 loops=1)
               ->  Hash Left Join  (cost=33.75..828.64 rows=56 width=795) (actual time=5104.737..5814.410 rows=10 loops=1)
                     Hash Cond: ((c.oid = dep.refobjid) AND (a.attnum = dep.refobjsubid))
                     ->  Nested Loop Left Join  (cost=0.00..793.20 rows=56 width=793) (actual time=4827.277..5536.765 rows=10 loops=1)
                           Join Filter: (pd.objsubid = ((a.attnum)::information_schema.cardinal_number)::integer)
                           ->  Nested Loop  (cost=0.00..78.98 rows=56 width=889) (actual time=4820.218..5457.055 rows=10 loops=1)
                                 ->  Nested Loop  (cost=0.00..72.60 rows=56 width=893) (actual time=4818.699..5436.295 rows=10 loops=1)
                                       ->  Nested Loop Left Join  (cost=0.00..65.09 rows=56 width=833) (actual time=4816.771..5406.399 rows=14 loops=1)
                                             ->  Nested Loop Left Join  (cost=0.00..56.13 rows=56 width=801) (actual time=4814.542..5376.850 rows=14 loops=1)
                                                   Join Filter: (t.typtype = 'd'::"char")
                                                   ->  Nested Loop  (cost=0.00..42.65 rows=56 width=741) (actual time=4812.763..5347.585 rows=14 loops=1)
                                                         ->  Nested Loop  (cost=0.00..34.25 rows=56 width=412) (actual time=4809.751..5295.125 rows=14 loops=1)
                                                               ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.00..15.25 rows=100 width=340) (actual time=398.426..487.754 rows=2348 loops=1)
                                                                     Index Cond: (attnum > 0)
                                                                     Filter: (NOT attisdropped)
                                                               ->  Index Scan using pg_class_oid_index on pg_class c  (cost=0.00..0.18 rows=1 width=76) (actual time=1.875..1.875 rows=0 loops=2348)
                                                                     Index Cond: (oid = a.attrelid)
                                                                     Filter: ((relkind = ANY ('{r,v,f,p}'::"char"[])) AND (pg_has_role(relowner, 'USAGE'::text) OR has_column_privilege(oid, a.attnum, 'SELECT, INSERT, UPDATE, REFERENCES'::text)) AND (((relname)::information_schema.sql_identifier)::text = 'clusters'::text))
                                                                     Rows Removed by Filter: 1
                                                         ->  Index Scan using pg_type_oid_index on pg_type t  (cost=0.00..0.15 rows=1 width=337) (actual time=3.365..3.365 rows=1 loops=14)
                                                               Index Cond: (oid = a.atttypid)
                                                   ->  Nested Loop  (cost=0.00..0.23 rows=1 width=68) (actual time=2.057..2.057 rows=0 loops=14)
                                                         ->  Index Scan using pg_type_oid_index on pg_type bt  (cost=0.00..0.11 rows=1 width=72) (actual time=1.878..1.878 rows=0 loops=14)
                                                               Index Cond: (t.typbasetype = oid)
                                                         ->  Index Scan using pg_namespace_oid_index on pg_namespace nbt  (cost=0.00..0.11 rows=1 width=4) (never executed)
                                                               Index Cond: (oid = bt.typnamespace)
                                             ->  Index Scan using pg_attrdef_adrelid_adnum_index on pg_attrdef ad  (cost=0.00..0.16 rows=1 width=38) (actual time=1.872..1.872 rows=0 loops=14)
                                                   Index Cond: ((a.attrelid = adrelid) AND (a.attnum = adnum))
                                       ->  Index Scan using pg_namespace_oid_index on pg_namespace nc  (cost=0.00..0.13 rows=1 width=68) (actual time=1.986..1.986 rows=1 loops=14)
                                             Index Cond: (oid = c.relnamespace)
                                             Filter: ((NOT pg_is_other_temp_schema(oid)) AND (((nspname)::information_schema.sql_identifier)::text = 'clusters'::text))
                                             Rows Removed by Filter: 0
                                 ->  Index Scan using pg_namespace_oid_index on pg_namespace nt  (cost=0.00..0.11 rows=1 width=4) (actual time=1.921..1.921 rows=1 loops=10)
                                       Index Cond: (oid = t.typnamespace)
                           ->  Index Scan using pg_description_o_c_o_index on pg_description pd  (cost=0.00..11.00 rows=100 width=40) (actual time=1.866..1.866 rows=0 loops=10)
                                 Index Cond: (objoid = (SubPlan 2))
                                 SubPlan 2
                                   ->  Index Scan using pg_class_relname_nsp_index on pg_class  (cost=4.13..8.26 rows=1 width=4) (actual time=2.996..3.011 rows=1 loops=10)
                                         Index Cond: ((relname = ((c.relname)::information_schema.sql_identifier)::name) AND (relnamespace = $1))
                                         InitPlan 1 (returns $1)
                                           ->  Index Scan using pg_namespace_nspname_index on pg_namespace  (cost=0.00..4.13 rows=1 width=4) (actual time=2.446..2.461 rows=1 loops=10)
                                                 Index Cond: (nspname = ((nc.nspname)::information_schema.sql_identifier)::name)
                     ->  Hash  (cost=32.25..32.25 rows=100 width=16) (actual time=276.695..276.697 rows=0 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 8kB
                           ->  Nested Loop  (cost=0.00..32.25 rows=100 width=16) (actual time=276.678..276.678 rows=0 loops=1)
                                 ->  Index Scan using pg_depend_reference_index on pg_depend dep  (cost=0.00..17.25 rows=100 width=12) (actual time=276.655..276.655 rows=0 loops=1)
                                       Index Cond: (refclassid = '1259'::oid)
                                       Filter: ((classid = '1259'::oid) AND (deptype = 'i'::"char"))
                                       Rows Removed by Filter: 1147
                                 ->  Index Scan using pg_sequence_seqrelid_index on pg_sequence seq  (cost=0.00..0.15 rows=1 width=12) (never executed)
                                       Index Cond: (seqrelid = dep.objid)
               ->  Nested Loop  (cost=0.00..0.28 rows=1 width=4) (actual time=4.249..4.249 rows=0 loops=10)
                     ->  Index Scan using pg_collation_oid_index on pg_collation co  (cost=0.00..0.15 rows=1 width=72) (actual time=2.971..2.974 rows=0 loops=10)
                           Index Cond: (a.attcollation = oid)
                     ->  Index Scan using pg_namespace_oid_index on pg_namespace nco  (cost=0.00..0.12 rows=1 width=68) (actual time=2.656..2.657 rows=0 loops=4)
                           Index Cond: (oid = co.collnamespace)
                           Filter: ((nspname <> 'pg_catalog'::name) OR (co.collname <> 'default'::name))
                           Rows Removed by Filter: 1
         ->  Hash  (cost=100.00..100.00 rows=1000 width=66) (actual time=54.841..54.841 rows=524 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 59kB
               ->  Seq Scan on pg_type pgt  (cost=0.00..100.00 rows=1000 width=66) (actual time=51.925..53.711 rows=524 loops=1)
 Planning Time: 55.982 ms
 Execution Time: 5947.948 ms
```

Second Query:

select * from information_schema.columns limit 500;
Second Query Plan:

```
 Limit  (cost=33.75..436.96 rows=56 width=1156) (actual time=733.335..8745.432 rows=500 loops=1)
   ->  Nested Loop Left Join  (cost=33.75..436.96 rows=56 width=1156) (actual time=733.311..8743.031 rows=500 loops=1)
         ->  Hash Left Join  (cost=33.75..112.08 rows=56 width=1072) (actual time=719.655..7352.614 rows=500 loops=1)
               Hash Cond: ((c.oid = dep.refobjid) AND (a.attnum = dep.refobjsubid))
               ->  Nested Loop  (cost=0.00..76.64 rows=56 width=1039) (actual time=427.023..7052.697 rows=500 loops=1)
                     ->  Nested Loop  (cost=0.00..70.26 rows=56 width=979) (actual time=425.412..6107.078 rows=500 loops=1)
                           ->  Nested Loop Left Join  (cost=0.00..63.59 rows=56 width=919) (actual time=423.509..5113.732 rows=500 loops=1)
                                 ->  Nested Loop Left Join  (cost=0.00..54.63 rows=56 width=887) (actual time=421.539..4149.515 rows=500 loops=1)
                                       Join Filter: (t.typtype = 'd'::"char")
                                       ->  Nested Loop  (cost=0.00..41.15 rows=56 width=753) (actual time=420.218..3257.687 rows=500 loops=1)
                                             ->  Nested Loop  (cost=0.00..32.75 rows=56 width=414) (actual time=416.911..1880.248 rows=500 loops=1)
                                                   ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.00..15.25 rows=100 width=341) (actual time=387.229..426.516 rows=681 loops=1)
                                                         Index Cond: (attnum > 0)
                                                         Filter: (NOT attisdropped)
                                                   ->  Index Scan using pg_class_oid_index on pg_class c  (cost=0.00..0.17 rows=1 width=77) (actual time=1.943..1.943 rows=1 loops=681)
                                                         Index Cond: (oid = a.attrelid)
                                                         Filter: ((relkind = ANY ('{r,v,f,p}'::"char"[])) AND (pg_has_role(relowner, 'USAGE'::text) OR has_column_privilege(oid, a.attnum, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
                                                         Rows Removed by Filter: 0
                                             ->  Index Scan using pg_type_oid_index on pg_type t  (cost=0.00..0.15 rows=1 width=343) (actual time=2.477..2.477 rows=1 loops=500)
                                                   Index Cond: (oid = a.atttypid)
                                       ->  Nested Loop  (cost=0.00..0.23 rows=1 width=138) (actual time=1.765..1.765 rows=0 loops=500)
                                             ->  Index Scan using pg_type_oid_index on pg_type bt  (cost=0.00..0.11 rows=1 width=78) (actual time=1.561..1.561 rows=0 loops=500)
                                                   Index Cond: (t.typbasetype = oid)
                                             ->  Index Scan using pg_namespace_oid_index on pg_namespace nbt  (cost=0.00..0.11 rows=1 width=68) (never executed)
                                                   Index Cond: (oid = bt.typnamespace)
                                 ->  Index Scan using pg_attrdef_adrelid_adnum_index on pg_attrdef ad  (cost=0.00..0.16 rows=1 width=38) (actual time=1.738..1.738 rows=0 loops=500)
                                       Index Cond: ((a.attrelid = adrelid) AND (a.attnum = adnum))
                           ->  Index Scan using pg_namespace_oid_index on pg_namespace nc  (cost=0.00..0.12 rows=1 width=68) (actual time=1.847..1.847 rows=1 loops=500)
                                 Index Cond: (oid = c.relnamespace)
                                 Filter: (NOT pg_is_other_temp_schema(oid))
                     ->  Index Scan using pg_namespace_oid_index on pg_namespace nt  (cost=0.00..0.11 rows=1 width=68) (actual time=1.757..1.757 rows=1 loops=500)
                           Index Cond: (oid = t.typnamespace)
               ->  Hash  (cost=32.25..32.25 rows=100 width=41) (actual time=292.461..292.463 rows=0 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 8kB
                     ->  Nested Loop  (cost=0.00..32.25 rows=100 width=41) (actual time=292.454..292.458 rows=0 loops=1)
                           ->  Index Scan using pg_depend_reference_index on pg_depend dep  (cost=0.00..17.25 rows=100 width=12) (actual time=292.449..292.454 rows=0 loops=1)
                                 Index Cond: (refclassid = '1259'::oid)
                                 Filter: ((classid = '1259'::oid) AND (deptype = 'i'::"char"))
                                 Rows Removed by Filter: 1147
                           ->  Index Scan using pg_sequence_seqrelid_index on pg_sequence seq  (cost=0.00..0.15 rows=1 width=37) (never executed)
                                 Index Cond: (seqrelid = dep.objid)
         ->  Nested Loop  (cost=0.00..0.28 rows=1 width=132) (actual time=1.888..1.888 rows=0 loops=500)
               ->  Index Scan using pg_collation_oid_index on pg_collation co  (cost=0.00..0.15 rows=1 width=72) (actual time=1.559..1.560 rows=0 loops=500)
                     Index Cond: (a.attcollation = oid)
               ->  Index Scan using pg_namespace_oid_index on pg_namespace nco  (cost=0.00..0.12 rows=1 width=68) (actual time=1.714..1.714 rows=0 loops=51)
                     Index Cond: (oid = co.collnamespace)
                     Filter: ((nspname <> 'pg_catalog'::name) OR (co.collname <> 'default'::name))
                     Rows Removed by Filter: 1
 Planning Time: 58.967 ms
 Execution Time: 8995.813 ms
```
morigs commented 1 year ago

Sorry for disturbing, but I'm interested in what the plans are for this. From what I see, this is a common issue that blocks people from migrating to Yugabyte, but 2 years have passed and it is still not fixed.

Is there any ETA? Can the community somehow help with this?

mislam-yb commented 1 year ago

Hey, thanks for checking in. We did a thorough investigation of system catalog queries recently, and the upshot is that there are currently some issues with the CBO that may be preventing us from optimizing these queries as much as we could (see this, this, and this). For example, just enabling the CBO for select * from information_schema.columns limit 500; makes the query time jump to 300s. Eventually, CBO and BNL will go a long way toward improving query performance.

For now, setting the BNL batch size to 1000 (without the CBO) does reduce the query time by ~40% for me. Not sure if you tried that already, but it could be useful for you.

FranckPachot commented 1 year ago

@morigs thanks for reporting. There are general optimizations ongoing for complex queries, and many queries on the catalog benefit from Batched Nested Loop (set yb_bnl_batch_size=1000), but here there is still one Nested Loop that is not batched. /cc @tanujnay112

                                       ->  YB Batched Nested Loop Join (actual time=551.978..552.602 rows=1000 loops=1)
                                             Join Filter: (a.atttypid = t.oid)
                   👉                       ->  Nested Loop (actual time=11.052..546.034 rows=1000 loops=1)
                                                    ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a (actual time=7.069..25.451 rows=1198 loops=1)
                                                         Index Cond: (attnum > 0)
                                                         Remote Filter: (NOT attisdropped)
                                                         Storage Index Read Requests: 2
                                                         Storage Index Execution Time: 24.000 ms
                                                    ->  Index Scan using pg_class_oid_index on pg_class c (actual time=0.419..0.419 rows=1 loops=1198)
                                                         Index Cond: (oid = a.attrelid)
                                                         Filter: ((relkind = ANY ('{r,v,f,p}'::"char"[])) AND (pg_has_role(relowner, 'USAGE'::text) OR has_column_privilege(oid, a.attnum, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
                                                         Rows Removed by Filter: 0
                                                         Storage Index Read Requests: 1
                                                         Storage Index Execution Time: 0.394 ms
                                             ->  Index Scan using pg_type_oid_index on pg_type t (actual time=1.587..1.654 rows=32 loops=1)
                                                   Index Cond: (oid = ANY (ARRAY[a.atttypid, $3, $4, ..., $1001]))
                                                   Storage Index Read Requests: 1
                                                   Storage Index Execution Time: 0.000 ms
morigs commented 1 year ago

Thanks for responding. Happy to hear that there is ongoing work on this and that it is not lost :)

If I understand you correctly, setting yb_bnl_batch_size=1000 should be a workaround. I'm not sure about the previously mentioned use cases (e.g. Hasura), but we use gorm.AutoMigrate. It takes connections from Go's sql.DB pool and then discovers the current schema. I don't know of a way to execute SET on every connection in Go; I've checked the docs, and they don't mention this.

So, is it possible to set this another way (perhaps using a configuration file)? Is it safe to set it globally?

ddorian commented 1 year ago

@morigs

I'll ask internally if it can be set as a gflag. It can be set as a runtime flag, which was just committed: https://github.com/yugabyte/yugabyte-db/commit/a5ee779b2269bbacdcbb53552d9b1855b732cfe7

It takes connections from the Go's sql.DB pool and then discovers the current schema. I don't know a way to execute SET on every connection in go. I've checked the docs, it doesn't mention this.

You should be able to "hook" into the "getting a connection" cycle and execute a query before anything else happens, either in the ORM, in the connection pool, or somewhere in between.

FranckPachot commented 1 year ago

yb_bnl_batch_size is a YSQL (the Postgres backend) parameter. It can be set at many levels: cluster (--ysql_pg_conf_csv=...), database (alter database ... set), user (alter user ... set), session (set ...), transaction (set local ...), or query (/*+ Set ( random_page_cost 1e42 ) */). But the problem is that there's still one non-batched nested loop.
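For example, the same parameter applied at each of those levels (a sketch; the database and user names are placeholders):

-- cluster level: pass --ysql_pg_conf_csv="yb_bnl_batch_size=1000" to the tservers
ALTER DATABASE yugabyte SET yb_bnl_batch_size = 1000;  -- database level
ALTER USER app_user SET yb_bnl_batch_size = 1000;      -- user level (placeholder name)
SET yb_bnl_batch_size = 1000;                          -- session level
SET LOCAL yb_bnl_batch_size = 1000;                    -- transaction level
-- query level, via a pg_hint_plan comment in the statement itself:
-- /*+ Set(yb_bnl_batch_size 1000) */ SELECT ...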

FranckPachot commented 9 months ago

Another query that cannot benefit from Batched Nested Loop (run by the Npgsql driver when connecting):

yugabyte=# explain (costs off)
    SELECT * FROM (
        SELECT typ.oid, typnamespace, typname, typrelid, typnotnull, relkind, typelem AS elemoid,
            CASE
                WHEN proc.proname='array_recv' THEN typ.typelem
                WHEN typ.typtype='r' THEN rngsubtype
                WHEN typ.typtype='d' THEN typ.typbasetype
            END AS elemtypoid
        FROM pg_type AS typ
        LEFT JOIN pg_class AS cls ON (cls.oid = typ.typrelid)
        LEFT JOIN pg_proc AS proc ON proc.oid = typ.typreceive
        LEFT JOIN pg_range ON (pg_range.rngtypid = typ.oid)
    ) AS typ
    LEFT JOIN pg_type AS elemtyp ON elemtyp.oid = elemtypoid
;

                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 YB Batched Nested Loop Left Join
   Join Filter: (elemtyp.oid = CASE WHEN (proc.proname = 'array_recv'::name) THEN typ.typelem WHEN (typ.typtype = 'r'::"char") THEN pg_range.rngsubtype WHEN (typ.typtype = 'd'::"char") THEN typ.typbasetype ELSE NULL::oid END)
   ->  Nested Loop Left Join
         ->  Nested Loop Left Join
               ->  YB Batched Nested Loop Left Join
                     Join Filter: (cls.oid = typ.typrelid)
                     ->  Seq Scan on pg_type typ
                     ->  Index Scan using pg_class_oid_index on pg_class cls
                           Index Cond: (oid = ANY (ARRAY[typ.typrelid, $1, $2, ..., $1023]))
               ->  Index Scan using pg_proc_oid_index on pg_proc proc
                     Index Cond: (oid = (typ.typreceive)::oid)
         ->  Index Scan using pg_range_rngtypid_index on pg_range
               Index Cond: (rngtypid = typ.oid)
   ->  Index Scan using pg_type_oid_index on pg_type elemtyp
         Index Cond: (oid = ANY (ARRAY[CASE WHEN (proc.proname = 'array_recv'::name) THEN typ.typelem WHEN (typ.typtype = 'r'::"char") THEN pg_range.rngsubtype WHEN (typ.typtype = 'd'::"char") THEN typ.typbasetype ELSE NULL::oid END, CASE WHEN ($1027 = 'array_recv'::name) THEN $2051 WHEN ($3075 = 'r'::"char") THEN $4099 WHEN ($3075 = 'd'::"char") THEN $5123 ELSE NULL::oid END, CASE WHEN ($1028 = 'array_recv'::name) THEN $2052 WHEN ($3076 = 'r'::"char") THEN $4100 WHEN ($3076 = 'd'::"char") THEN $5124 ELSE NULL::oid END, ..., CASE WHEN ($2049 = 'array_recv'::name) THEN $3073 WHEN ($4097 = 'r'::"char") THEN $5121 WHEN ($4097 = 'd'::"char") THEN $6145 ELSE NULL::oid END]))
(15 rows)
ammirator-administrator commented 6 months ago

2024, still the same issue.

I do not understand why they are confusing everyone with videos claiming that Hasura is compatible with Yugabyte. No, it is not. I'm running 3 masters and 3 tservers with a lot of resources dedicated to each, and a simple query to get the schema takes forever and fails due to timeout, while a single Postgres DB does the same instantly with no effort.

ammirator-administrator commented 6 months ago

cc @ddorian @FranckPachot

FranckPachot commented 2 months ago

Another slow query (8s with the cost-based optimizer in 2.21.1.0), run by Directus:

SELECT c.table_name
          , c.column_name
          , c.column_default as default_value
          , c.data_type
          , c.character_maximum_length as max_length
          , c.is_generated = 'ALWAYS' is_generated
          , CASE WHEN c.is_identity = 'YES' THEN true ELSE false END is_identity
          , CASE WHEN c.is_nullable = 'YES' THEN true ELSE false END is_nullable

        FROM
          information_schema.columns c
        LEFT JOIN information_schema.tables t
          ON c.table_name = t.table_name
        WHERE
          t.table_type = 'BASE TABLE'
          AND c.table_schema IN ($1);

As reported in https://yugabyte-db.slack.com/archives/CG0KQF0GG/p1723969425773849?thread_ts=1723905877.013559&cid=CG0KQF0GG, these queries make it incompatible with Directus.

Many Nested Loops are not batched. This is easy to reproduce with:

select format('
create table %I (id bigserial primary key, a int unique, b int unique, c int unique)
', 'tmp'||generate_series(1,100)
)\gexec

prepare q as
SELECT c.table_name
          , c.column_name
          , c.column_default as default_value
          , c.data_type
          , c.character_maximum_length as max_length
          , c.is_generated = 'ALWAYS' is_generated
          , CASE WHEN c.is_identity = 'YES' THEN true ELSE false END is_identity
          , CASE WHEN c.is_nullable = 'YES' THEN true ELSE false END is_nullable
        FROM
          information_schema.columns c
        LEFT JOIN information_schema.tables t
          ON c.table_name = t.table_name
        WHERE
          t.table_type = 'BASE TABLE'
          AND c.table_schema IN ($1);

analyze;
set yb_enable_optimizer_statistics to on;
set yb_enable_base_scans_cost_model to on;
explain (analyze, costs off) execute q('public');

Plan:

         ->  Nested Loop Left Join (actual time=46.757..10122.527 rows=400 loops=1)
               Join Filter: (((c.relname)::information_schema.sql_identifier)::text = ((c_1.relname)::information_schema.sql_identifier)::text)
               Rows Removed by Join Filter: 119600
               Filter: ((((CASE WHEN (nc_1.oid = pg_my_temp_schema()) THEN 'LOCAL TEMPORARY'::text WHEN (c_1.relkind = ANY ('{r,p}'::"char"[])) THEN 'BASE TABLE'::text WHEN (c_1.relkind = 'v'::"char") THEN 'VIEW'::text WHEN (c_1.relkind = 'f'::"char") THEN 'FOREIGN'::text ELSE NULL::text END)::information_schema.character_data))::text = 'BASE TABLE'::text)
               ->  Nested Loop (actual time=24.623..5492.409 rows=400 loops=1)
                     ->  Nested Loop Left Join (actual time=22.995..4392.108 rows=400 loops=1)
                           ->  Nested Loop Left Join (actual time=21.426..3227.006 rows=400 loops=1)
                                 Join Filter: (t.typtype = 'd'::"char")
                                 ->  Nested Loop (actual time=19.049..1833.778 rows=400 loops=1)
                                       ->  Nested Loop (actual time=16.798..461.001 rows=400 loops=1)
                                             ->  YB Batched Nested Loop Join (actual time=13.734..14.038 rows=100 loops=1)
                                                   Join Filter: (c.relnamespace = nc.oid)
...
                                       ->  Index Scan using pg_type_oid_index on pg_type t (actual time=3.394..3.394 rows=1 loops=400)
                                             Index Cond: (oid = a.atttypid)
                                 ->  Nested Loop (actual time=3.477..3.477 rows=0 loops=400)
                                       ->  Index Scan using pg_type_oid_index on pg_type bt (actual time=3.450..3.450 rows=0 loops=400)
                                             Index Cond: (t.typbasetype = oid)
                                       ->  Index Scan using pg_namespace_oid_index on pg_namespace nbt (never executed)
                                             Index Cond: (oid = bt.typnamespace)
                           ->  Index Scan using pg_attrdef_adrelid_adnum_index on pg_attrdef ad (actual time=2.880..2.880 rows=0 loops=400)
                                 Index Cond: ((a.attrelid = adrelid) AND (a.attnum = adnum))
                     ->  Index Scan using pg_namespace_oid_index on pg_namespace nt (actual time=2.725..2.725 rows=1 loops=400)
                           Index Cond: (oid = t.typnamespace)
               ->  Hash Join (actual time=11.164..11.507 rows=300 loops=400)