yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.66k stars 1.04k forks source link

[YSQL][Optimizer] Optimizer may select non optimal index scan if WHERE clause used in query #13334

Open qvad opened 1 year ago

qvad commented 1 year ago

Jira Link: DB-2968

Description

Not reproduced w/o WHERE clause

Reproducer:

CREATE TABLE t1000000 as select a, md5(random()::text) from generate_Series(1,1000000) a;
CREATE INDEX t1000000_idx ON t1000000(a);
ANALYZE t1000000;

SELECT *
FROM   t1000000
WHERE  t1000000.a < 100

Default plan:

Index Scan using t1000000_idx on t1000000  (cost=0.00..16.25 rows=100 width=36) (actual time=44.661..7759.116 rows=99 loops=1)
  Index Cond: (a < 100)
Planning Time: 3.500 ms
Execution Time: 7778.766 ms
Peak Memory Usage: 8184 kB

Plan with seq scan:

Seq Scan on t1000000  (cost=0.00..12500.00 rows=100 width=36) (actual time=47.439..2984.911 rows=99 loops=1)
  Filter: (a < 100)
  Rows Removed by Filter: 999901
Planning Time: 0.052 ms
Execution Time: 2991.490 ms
Peak Memory Usage: 270328 kB
ramsrivatsa commented 1 year ago
yugabyte=# CREATE INDEX t1000000_idx2 ON t1000000(a ASC);
CREATE INDEX
yugabyte=# EXPLAIN ANALYZE SELECT * FROM   t1000000 WHERE  t1000000.a < 100;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1000000_idx2 on t1000000  (cost=0.00..1232.71 rows=10030 width=36) (actual time=8.710..8.756 rows=99 loops=1)
   Index Cond: (a < 100)
 Planning Time: 3.418 ms
 Execution Time: 8.802 ms
 Peak Memory Usage: 0 kB
(5 rows)

yugabyte=# \d t1000000
              Table "public.t1000000"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
 md5    | text    |           |          |
Indexes:
    "t1000000_idx" lsm (a HASH)
    "t1000000_idx2" lsm (a ASC)

you need a range key index if you want to do inequalities.. hash key indexes revert to seq scan.