powa-team / pg_qualstats

A PostgreSQL extension for collecting statistics about predicates, helping find what indices are missing
Other
272 stars 26 forks source link

pg_qualstats_index_advisor index within JSON #54

Open romanstingler opened 1 year ago

romanstingler commented 1 year ago

I butchered together some SQL statements :)

CREATE TABLE json_test(body jsonb);

with data as (
select
    i,
    substr('8000012000345987243985743285723945798274359824375287435098432759284375843759824375423095208435019843509283475082437592843859432798572430985720984375098432759384750432986772437540958724385743985721098437520984375',
    i,
    1) as tel
from
    generate_series(1, length('8000012000345987243985743285723945798274359824375287435098432759284375843759824375423095208435019843509283475082437592843859432798572430985720984375098432759384750432986772437540958724385743985721098437520984375')) as g(i)
)
insert
    into
    json_test(body)
select
    ('{"specCode": {"name": "Telephone Number", "text": "TEL_NUM"}, "specValue": {"code": null, "text":"' || tel || '"}}')::json
from
    data
returning *;
--explain analyze
select
    body
from
    json_test
where
    body->'specValue'->>'text' = '1';

create INDEX ON "json_test" using btree ((body -> 'specValue' ->>'text'));

This should create a table with a JSONB column. Generate some random data and query a field within the JSON data (explain commented out). At the end, there is the code to create a BTREE index on the field in the where clause (condition in where is the same as the index). With the index, you can use the explain statement from above and see that the index should be used.

Such queries are not covered by the index advisor.

rjuju commented 1 year ago

Hi,

There are 2 limitations here explaining the behavior. First, pg_qualstats itself for now only tracks "simple" quals, e.g. things of the form column operator column or column operator constant. So with your example there's unfortunately nothing stored by pg_qualstats that the advisor could try to handle.

Then, even if more complex quals were handled by pg_qualstats, the advisor for now only tries to suggest indexes on one or multiple columns, but nothing more complex. So functional indexes wouldn't be handled anyway, same for partial indexes.

Clearly things could be improved, and I will try to look at it. But right now I'm not really sure of how exactly we could track complex expression. For instance if there's a function call using multiple columns, trying to attach the expr to all columns wouldn't be easy.