[YSQL] Index costing does not take the index's fulfilled columns into account.

yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.

Other

8.99k stars 1.07k forks source link

Jira Link: DB-6753

Description

Consider the following setup

CREATE TABLE sample(b int, a int, c int);
CREATE INDEX bad_index on sample (a asc, b asc, c asc);
CREATE INDEX good_index on sample (b asc);
INSERT INTO sample select i/100, (i/10) % 10, i%10 from generate_series(1,99999) i;
ANALYZE sample;
/*+IndexScan(sample good_index)*/explain analyze SELECT b FROM sample where b = 5;
/*+IndexScan(sample bad_index)*/explain analyze SELECT b FROM sample where b = 5;

We get the following output

                                                        QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using good_index on sample  (cost=4.00..19.23 rows=100 width=4) (actual time=1.736..1.777 rows=100 loops=1)
   Index Cond: (b = 5)
   Heap Fetches: 0
 Planning Time: 0.089 ms
 Execution Time: 1.827 ms
 Peak Memory Usage: 0 kB
(6 rows)

and

                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using bad_index on sample  (cost=4.00..20.23 rows=100 width=4) (actual time=2.804..2.855 rows=100 loops=1)
   Index Cond: (b = 5)
 Planning Time: 0.136 ms
 Execution Time: 2.920 ms
 Peak Memory Usage: 8 kB
(5 rows

Note that using the good_index which has a prefix of its key columns constrained operates ~3.7x faster than bad_index but the cost estimates are pretty much the same. In PG, the btree costing function is aware taht constraints on the prefix of an index's columns limits the scan space while our YB index costing logic does not appear to take this reality into account.

Warning: Please confirm that this issue does not contain any sensitive information

[X] I confirm this issue does not contain any sensitive information.

yugabyte=# explain analyze SELECT b FROM sample where b = 5; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Index Only Scan using good_index on sample (cost=7.20..81.45 rows=100 width=4) (actual time=1.975..2.039 rows=100 loops=1) Index Cond: (b = 5) Heap Fetches: 0 Planning Time: 0.114 ms Execution Time: 2.170 ms Peak Memory Usage: 24 kB (6 rows) yugabyte=# /*+IndexScan(sample bad_index)*/explain analyze SELECT b FROM sample where b = 5; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Index Scan using bad_index on sample (cost=14.40..976.41 rows=100 width=4) (actual time=8.668..8.788 rows=100 loops=1) Index Cond: (b = 5) Planning Time: 0.186 ms Execution Time: 8.994 ms Peak Memory Usage: 24 kB (5 rows)

yugabyte / yugabyte-db

[YSQL] Index costing does not take the index's fulfilled columns into account. #17630

Description

Warning: Please confirm that this issue does not contain any sensitive information