yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.99k stars 1.07k forks source link

[YSQL] Index costing does not take the index's fulfilled columns into account. #17630

Open tanujnay112 opened 1 year ago

tanujnay112 commented 1 year ago

Jira Link: DB-6753

Description

Consider the following setup

CREATE TABLE sample(b int, a int, c int);
CREATE INDEX bad_index on sample (a asc, b asc, c asc);
CREATE INDEX good_index on sample (b asc);
INSERT INTO sample select i/100, (i/10) % 10, i%10 from generate_series(1,99999) i;
ANALYZE sample;
/*+IndexScan(sample good_index)*/explain analyze SELECT b FROM sample where b = 5;
/*+IndexScan(sample bad_index)*/explain analyze SELECT b FROM sample where b = 5;

We get the following output

                                                        QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using good_index on sample  (cost=4.00..19.23 rows=100 width=4) (actual time=1.736..1.777 rows=100 loops=1)
   Index Cond: (b = 5)
   Heap Fetches: 0
 Planning Time: 0.089 ms
 Execution Time: 1.827 ms
 Peak Memory Usage: 0 kB
(6 rows)

and

                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using bad_index on sample  (cost=4.00..20.23 rows=100 width=4) (actual time=2.804..2.855 rows=100 loops=1)
   Index Cond: (b = 5)
 Planning Time: 0.136 ms
 Execution Time: 2.920 ms
 Peak Memory Usage: 8 kB
(5 rows

Note that using the good_index which has a prefix of its key columns constrained operates ~3.7x faster than bad_index but the cost estimates are pretty much the same. In PG, the btree costing function is aware taht constraints on the prefix of an index's columns limits the scan space while our YB index costing logic does not appear to take this reality into account.

Warning: Please confirm that this issue does not contain any sensitive information

gauravk-in commented 1 year ago

With the new cost model (#18484) we get the following plans.

yugabyte=# explain analyze SELECT b FROM sample where b = 5;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using good_index on sample  (cost=7.20..81.45 rows=100 width=4) (actual time=1.975..2.039 rows=100 loops=1)
   Index Cond: (b = 5)
   Heap Fetches: 0
 Planning Time: 0.114 ms
 Execution Time: 2.170 ms
 Peak Memory Usage: 24 kB
(6 rows)

yugabyte=# /*+IndexScan(sample bad_index)*/explain analyze SELECT b FROM sample where b = 5;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Index Scan using bad_index on sample  (cost=14.40..976.41 rows=100 width=4) (actual time=8.668..8.788 rows=100 loops=1)
   Index Cond: (b = 5)
 Planning Time: 0.186 ms
 Execution Time: 8.994 ms
 Peak Memory Usage: 24 kB
(5 rows)

The cost for the Index scan on bad_index is nearly 12 times, while execution time is only 4.4 times worse. The estimate for seeks seems accurate in the second case, while nexts are underestimated by the cost model. This may be improved upon with an ongoing tuning exercise to adjust the costs parameters used inside the cost model.