powa-team / pg_qualstats

A PostgreSQL extension for collecting statistics about predicates, helping find what indices are missing
Other
272 stars 26 forks source link

pg_qualstats_index_advisor functionality #53

Closed romanstingler closed 1 year ago

romanstingler commented 1 year ago

I installed the latest revision 66b3037 on my local PG 14.5 machine. I found that you added the "docs" for the advisor here https://github.com/powa-team/pg_qualstats/issues/31

Your code example:

SELECT v
  FROM json_array_elements(
    pg_qualstats_index_advisor(min_filter => 50)->'indexes') v
  ORDER BY v::text COLLATE "C";

fails with SQL Error [22023]: ERROR: cannot call json_array_elements on a scalar

but all I can figure out is:

SELECT pg_qualstats_index_advisor();

{
  "indexes": null,
  "unoptimised": [
    {
      "qual": "assemblies.\"orderReferenceNumber\" ~~* ?",
      "queryids": [
        -6084300706674462045,
        -5817621241163067326
      ]
    },
    {
      "qual": "orders.\"orderReferenceNumber\" ~~* ?",
      "queryids": [
        5707918505950049101,
        9082355525220716366
      ]
    }
  ]
}

taking the first query : select pg_qualstats_example_query(-6084300706674462045);

select
    *
from
    "assemblies"
where
    ("assemblies"."orderInternalId"::text ilike $1
        or "assemblies"."orderReferenceNumber"::text ilike $2)
order by
    "assemblies"."id" desc,
    "orderType" asc
limit 10 offset 0

I have a BTREE index for both fields (orderInternalId,orderReferenceNumber).

If I remove one of the indexes and rerun the query again and check again

SELECT pg_qualstats_index_advisor(); I get the same result as before, just with 1 more queryId.

I have a very busy schedule and just skimmed the code it seems that v_ddl is the part from your code example output and this is not returned. As soon as I have some time, I will have a closer look at the function.

rjuju commented 1 year ago

Hi,

The error is because the "indexes" field is "null" and not an empty array or an array containing null. I'm not sure how it can happen as the underlying variable is initialized with an empty array and skip null or empty "v_ddl". Even if the generated index DDL was null, it should still yield an array, like:

rjuju=# select json_build_object('a', array_append('{}'::text[], null));
 json_build_object
-------------------
 {"a" : [null]}
(1 row)

For the unoptimized quals, that's expected. The ILIKE (~~*) operator can't be automatically handled, as the index you shoud create depends on what values are used, and where are the placeholders (%) if any.

romanstingler commented 1 year ago

thx,

i tried to change the line to

v_indexes json[]= '{}';

Works, but I have to check why my index is empty.

When I reset the qualstats: select pg_qualstats_reset(); and then select pg_qualstats_index_advisor();

{
  "indexes": [
    []
  ],
  "unoptimised": null
}

Then I remove the index on the created_at

explain
SELECT *
FROM "work_orders"
WHERE "work_orders"."created_at" >= '2002-08-23 00:00:00'
  and "work_orders"."created_at" <= '2022-08-24 23:59:59';

the query plan gives me

Seq Scan on work_orders  (cost=0.00..1.04 rows=3 width=160)
  Filter: ((created_at >= '2002-08-23 00:00:00'::timestamp without time zone) AND (created_at <= '2022-08-24 23:59:59'::timestamp without time zone))

When I execute the query

I get 2 entries in the output of select pg_qualstats();SELECT * FROM push_job_logs WHERE "push_job_logs"."updated_at" >= '2022-07-19 07:50:30.000' and "push_job_logs"."updated_at" <= '2022-07-19 07:59:30.000'; (10,16384,3050129,17,2065,,,1012069729,2475159725,763865873,2803659205,1,3,0,0,0,0,0,0,0,0,0,64,-715090605387815437,"'2002-08-23 00:00:00'::timestamp without time zone",f)

but the advisor gives me (empty arrays with my code change) :

{
  "indexes": [],
  "unoptimised": []
}

I tried another query with more than 1000 entries

SELECT *
FROM push_job_logs 
WHERE "push_job_logs"."updated_at" >= '2022-07-19 07:50:30.000'
  and "push_job_logs"."updated_at" <= '2022-07-19 07:59:30.000';
Seq Scan on push_job_logs  (cost=0.00..316.25 rows=1 width=1368)
  Filter: ((updated_at >= '2022-07-19 07:50:30'::timestamp without time zone) AND (updated_at <= '2022-07-19 07:59:30'::timestamp without time zone))

Where an index would be used

but still

  {
  "indexes": [],
  "unoptimised": []
}

          SELECT dbid, amname, qualid, qualnodeid,
            (coalesce(lrelid, rrelid), coalesce(lattnum, rattnum),
            opno, eval_type)::public.qual AS qual, queryid,
            round(avg(execution_count)) AS execution_count,
            sum(occurences) AS occurences,
            round(sum(nbfiltered)::numeric / sum(occurences)) AS avg_filter,
            CASE WHEN sum(execution_count) = 0
              THEN 0
              ELSE round(sum(nbfiltered::numeric) / sum(execution_count) * 100)
            END AS avg_selectivity
          FROM public.pg_qualstats() q
          JOIN pg_catalog.pg_database d ON q.dbid = d.oid
          JOIN pg_catalog.pg_operator op ON op.oid = q.opno
          JOIN pg_catalog.pg_amop amop ON amop.amopopr = op.oid
          JOIN pg_catalog.pg_am am ON am.oid = amop.amopmethod
          WHERE d.datname = current_database()
          AND eval_type = 'f'
          AND amname != ALL ('{}')
          AND coalesce(lrelid, rrelid) != 0
          AND qualnodeid != ALL('{}')
          GROUP BY dbid, amname, qualid, qualnodeid, lrelid, rrelid,
            lattnum, rattnum, opno, eval_type, queryid;
dbid |amname|qualid    |qualnodeid|qual               |queryid            |execution_count|occurences|avg_filter|avg_selectivity|
-----+------+----------+----------+-------------------+-------------------+---------------+----------+----------+---------------+
16384|btree |3220348345|3647030479|(3050070,10,2065,f)|4635862364285714673|           2626|        10|       485|             92|
16384|brin  |3220348345|2377652190|(3050070,10,2063,f)|4635862364285714673|           2626|        20|       485|             92|
16384|btree |3220348345|2377652190|(3050070,10,2063,f)|4635862364285714673|           2626|        10|       485|             92|
16384|brin  |3220348345|3647030479|(3050070,10,2065,f)|4635862364285714673|           2626|        20|       485|             92|
rjuju commented 1 year ago

i tried to change the line to v_indexes json[]= '{}';

It shouldn't be necessary. I tried locally on the same version (2.0.4), and it's working as expected:

CREATE TABLE t1(id integer, val text);
INSERT INTO t1 SELECT i, 'line ' || i FROM generate_series(1, 1000000) i;
SELECT * FROM t1 WHERE val ILIKE 'meh';
SELECT * FROM pg_qualstats_index_advisor(min_filter => 50);
             pg_qualstats_index_advisor
----------------------------------------------------
 {"indexes" : [], "unoptimised" : ["t1.val ~~* ?"]}
(1 row)

Simulating your use case:

CREATE TABLE push_job_logs(id integer, updated_at timestamp);
INSERT INTO push_job_logs select i, '2022-01-01'::timestamp + interval '1 day' * i from generate_series(1, 100000) i;
SELECT * FROM push_job_logs where updated_at >= '2022-02-01' AND updated_at <= '2022-02-02';
 id |     updated_at
----+---------------------
 31 | 2022-02-01 00:00:00
 32 | 2022-02-02 00:00:00
(2 rows)

SELECT * FROM pg_qualstats_index_advisor(min_filter => 50);
                                                                            pg_qualstats_index_advisor
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 {"indexes" : ["CREATE INDEX ON public.push_job_logs USING btree (updated_at)","CREATE INDEX ON public.push_job_logs USING btree (updated_at)"], "unoptimised" : ["t1.val ~~* ?"]}
(1 row)

(clearly the redundant indexes should be removed, but still it's giving sensible results)

There's definitely something unexpected on your environment, but I don't know what.

Can you try to call pg_qualstats_reset(), set pg_qualstats.sample_rate to 1 and try again the push_job_logs test case? Maybe some of the stored quals are responsible for your problems.

romanstingler commented 1 year ago

Could it be that the issue is the configuration

I changed from the defaults mentioned in the docs

pg_qualstats.enabled = true
pg_qualstats.track_constants = true
pg_qualstats.max = 1000
pg_qualstats.resolve_oids = false
pg_qualstats.track_pg_catalog = false
pg_qualstats.sample_rate = 1

to

pg_qualstats.resolve_oids = true
pg_qualstats.track_pg_catalog = true

maybe pg_qualstats.track_pg_catalog is mandatory. Now when I delete the index I get

{
  "ddl": "CREATE INDEX ON public.push_job_logs USING btree (updated_at)",
  "queryids": [
    4635862364285714673
  ]
}

For me, it still fails if I reset the stats and execute your code example I will keep this as this works perfectly fine

    v_processed bigint[] = '{}';
    v_indexes json[] = '{}';
    v_unoptimised json[] = '{}';

Thx for the fast help


not the best solution but for now I will use this

select
    distinct to_jsonb(v)->'ddl' as v
from
    json_array_elements(
    pg_qualstats_index_advisor()->'indexes') v
where
    v->>'ddl' not like '%pg_catalog%';
rjuju commented 1 year ago

pg_qualstats.resolve_oids = true pg_qualstats.track_pg_catalog = true

those shouldn't have an impact.

Now when I delete the index I get

{ "ddl": "CREATE INDEX ON public.push_job_logs USING btree (updated_at)", "queryids": [ 4635862364285714673 ] }

This output comes from https://github.com/powa-team/pg_qualstats/commit/d203b05c4e53012d599191e702d7c8d6fe3c8222, so this is not the version 2.0.4 but the currently dev version 2.1.0.

It's unclear to me why you need to change the type of the arrays. Did you create any extra extension, or custom cast or something? Can you produce a script that creates a new database, create the pg_qualstats extension, some tables, add some data and reproduces the problem that I could try locally?

romanstingler commented 1 year ago

The issue is if you don't pass a default value to the declaration it is initialized with NULL.

https://dbfiddle.uk/?rdbms=postgres_14&fiddle=b3f6ef8f178b471b5c9a78e0dd47c7c0

https://www.postgresql.org/docs/current/plpgsql-declarations.html The DEFAULT clause, if given, specifies the initial value assigned to the variable when the block is entered. If the DEFAULT clause is not given then the variable is initialized to the SQL null value.

at Line 177 you add values to the default null value of the declaration, so they behave as they would have been declared with default values at the beginning.

        v_ddl := '';
        v_quals_todo := '{}';
        v_quals_done := '{}';
        v_quals_col_done := '{}';
rjuju commented 1 year ago

Yes I agree that the default is NUll, but I do initialize the used v_indexes and v_unoptimized at https://github.com/powa-team/pg_qualstats/blob/master/pg_qualstats--2.0.4.sql#L447-L448:

DECLARE
    [...]
    v_indexes text[] = '{}';
    v_unoptimised text[] = '{}';

So later, you get:

        -- if underlying table has been dropped, skip this (broken) index
        CONTINUE WHEN coalesce(v_ddl, '') = '';
[...]

        -- and append it to the list of generated indexes
        v_indexes := array_append(v_indexes, v_ddl);

So I still don't see how exactly v_indexes can be NULL.

For the variables you're showing here, they're reused in a loop, so I initialize them at the beginning of each loop.

Am I missing something, or which variable is missing an explicit initialization?

rjuju commented 1 year ago

As far as I can see this is the case since https://github.com/powa-team/pg_qualstats/commit/0732013223040898f326bdf754b25771c015a510, so 2 years and a half ago. Are you sure that you're using the released 2.0.4 (or 2.1.0dev) and not an old 2.0.0dev version?

romanstingler commented 1 year ago

sorry for distracting with the 2.0.4 version number

but I use the code from the latest commit.

in https://github.com/powa-team/pg_qualstats/blob/master/pg_qualstats--2.1.0.sql

in line 451

    v_indexes json[];
    v_unoptimised json[];

PS: as well as in L12 https://github.com/powa-team/pg_qualstats/blob/master/pg_qualstats--2.0.4--2.1.0.sql

rjuju commented 1 year ago

Ahhh, that explains everything! Thanks a lot for the report. I apparently broke that in https://github.com/powa-team/pg_qualstats/commit/72e84f1f451b7cb2818c64177970babc6918d5be. I will fix that shortly, and add some regression tests to make sure it doesn't get broken again.

romanstingler commented 1 year ago

THX for your patience :)

rjuju commented 1 year ago

This should be fixed as of https://github.com/powa-team/pg_qualstats/commit/c0b6ca06db307ed9ddd53b3facb0bc0c22b655b3.

Thanks again!

romanstingler commented 1 year ago

Works perfectly now.

rjuju commented 1 year ago

Great news! I'm closing the issue, feel free to reopen it or create a new one if needed.