powa-team / pg_qualstats

A PostgreSQL extension for collecting statistics about predicates, helping find what indices are missing

Add Extended statistics advisor (PoC) #33

Open yamatattsu opened 3 years ago

yamatattsu commented 3 years ago

Hi Julien!

I created a PoC patch for the Extended statistics advisor, as I mentioned in my PGCon 2020 work. :-D The test case and results are below:

CREATE TABLE t (a INT, b INT);
INSERT INTO t SELECT i % 100, i % 100 FROM generate_series(1, 10000) s(i);
ANALYZE t;

EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1;

                                 QUERY PLAN
-----------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..195.00 rows=1 width=8) (actual rows=100 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 9900
 Planning Time: 0.122 ms
 Execution Time: 5.226 ms
(5 rows)

SELECT v
FROM json_array_elements(
     pg_qualstats_index_advisor(min_filter => 50)->'extstats') v
ORDER BY v::text COLLATE "C";

                          v
-----------------------------------------------------
 "CREATE STATISTICS t_b_a_ext ON b, a FROM public.t"
(1 row)

CREATE STATISTICS t_b_a_ext ON b, a FROM public.t;
CREATE STATISTICS

ANALYZE t;
EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1;

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..195.00 rows=100 width=8) (actual rows=100 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 9900
 Planning Time: 0.182 ms
 Execution Time: 4.430 ms
(5 rows)
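
For reference, the contents of the new statistics object can be inspected through the pg_stats_ext view (available on PostgreSQL 12 or later), for example:

-- Optional check: show what ANALYZE collected for t_b_a_ext
SELECT statistics_name, attnames, kinds, dependencies
FROM pg_stats_ext
WHERE statistics_name = 't_b_a_ext';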

Thanks, Tatsuro Yamada

rjuju commented 3 years ago

Thanks a lot for working on that @yamatattsu !

First, it should be done as part of a new version, probably 2.1.0, as it's adding a new feature rather than a bugfix. I'll go ahead and commit the preliminary work for that, so you don't have to deal with it in this patchset.

Then, regarding the feature itself: I think it'd be better to create a new function, as the two advisors will probably end up with a lot of differences. The most obvious one is the thresholds and filters: if you reuse the same function, you'll discard quals that are used as part of an index scan, which is probably a bad idea here. Also, we should have some threshold based on the selectivity estimation error.
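
Just to make that concrete, I'm thinking of something along these lines (the function name and parameters below are purely hypothetical, nothing like that exists yet):

-- Hypothetical new function, separate from pg_qualstats_index_advisor(),
-- with thresholds based on the estimation error rather than on filtered rows.
SELECT pg_qualstats_extstats_advisor(
    min_err_estimate_ratio => 10,    -- estimate off by a 10x factor or more
    min_err_estimate_num   => 1000   -- or off by at least 1000 rows
);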

If needed, we could of course pull out parts of the existing index advisor into helper functions / views so we don't have to copy/paste too much.

The way I imagined that feature is to work on all compound predicates that have at least 2 simple predicates and a large enough selectivity estimation error (ratio and/or raw number). Then, maybe optionally, try to validate that those errors are really due to correlated columns. This could be done by checking that those simple predicates, when used alone, have good selectivity estimates. Of course, there will always be cases where compound predicates use simple predicates that are never used alone, so we can't make a hard requirement on those being available for that validation.
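
As a very rough sketch of that heuristic (placeholder thresholds, and assuming we rely on the error-estimate columns already exposed by the pg_qualstats view), something like:

-- Find ANDed predicates (same qualid) touching at least two distinct columns
-- of the same relation, whose selectivity estimates are badly off.
-- Unlike the index advisor, quals already used in index scans are kept.
-- The 10x ratio / 1000 rows thresholds are placeholders.
SELECT qualid,
       lrelid::regclass AS relation,
       array_agg(DISTINCT lattnum) AS columns,
       max(mean_err_estimate_ratio) AS worst_err_ratio,
       max(mean_err_estimate_num)   AS worst_err_num
FROM pg_qualstats
WHERE qualid IS NOT NULL
GROUP BY qualid, lrelid
HAVING count(DISTINCT lattnum) >= 2
   AND (max(mean_err_estimate_ratio) >= 10
        OR max(mean_err_estimate_num) >= 1000);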