Schema:
CREATE TABLE t_large (a int, b int, c text);
INSERT INTO t_large SELECT i, j, repeat('a', 10000) FROM generate_series(1, 10) i, generate_series(1, 10) j;
CREATE INDEX partial_idx_large ON t_large (b ASC) WHERE a = 1;
CREATE INDEX full_idx_large ON t_large (b ASC);
ANALYZE t_large;
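For context, the cross join above produces 100 rows, each carrying a ~10 kB text column. The raw row counts behind the selectivity figures discussed below can be checked directly (an illustrative query, not part of the original repro):
-- Expected: b < 5 matches 40 of 100 rows (40%); a = 1 AND b < 5 matches 4 rows (4%).
SELECT count(*)                                 AS total_rows,
       count(*) FILTER (WHERE b < 5)            AS rows_b_lt_5,
       count(*) FILTER (WHERE a = 1 AND b < 5)  AS rows_a_eq_1_and_b_lt_5
FROM t_large;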
The amount of data transferred is estimated as indexSelectivity * docDBRowSize. However, when a partial index is used, the partial-index condition can be dropped from the row selectivity calculation, so the index selectivity is miscalculated.
For example, for the conditions (a = 1 AND b < 5), with partial_idx_large we erroneously consider only the condition b < 5 and estimate 40% selectivity, so we expect to fetch 40 rows of roughly 10 kB each.
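As a rough sketch of what this means for the estimated transfer size, assuming the estimate scales as selectivity × row count × row width and using the plan's Estimated Docdb Result Width of 10023 bytes (the exact CBO formula may differ):
-- 40% selectivity implies ~401 kB expected from DocDB; 4% implies ~40 kB.
SELECT 0.40 * 100 * 10023 AS est_bytes_partial_idx_estimate,  -- ~401 kB
       0.04 * 100 * 10023 AS est_bytes_correct_estimate;      -- ~40 kB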
/*+ IndexScan(t_large partial_idx_large)*/ EXPLAIN (ANALYZE, DIST, SUMMARY OFF, DEBUG) SELECT * FROM t_large WHERE a = 1 AND b < 5;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using partial_idx_large on t_large (cost=734.04..32089.36 rows=4 width=10012) (actual time=1.017..1.045 rows=4 loops=1)
   Index Cond: (b < 5)
   Storage Table Read Requests: 1
   Storage Table Read Execution Time: 0.390 ms
   Storage Table Rows Scanned: 4
   Storage Index Read Requests: 1
   Storage Index Read Execution Time: 0.480 ms
   Storage Index Rows Scanned: 4
   Metric rocksdb_number_db_seek: 5.000
   Metric rocksdb_number_db_next: 10.000
   Metric rocksdb_number_db_seek_found: 5.000
   Metric rocksdb_number_db_next_found: 10.000
   Metric rocksdb_iter_bytes_read: 101050.000
   Metric docdb_keys_found: 8.000
   Metric ql_read_latency: sum: 167.000, count: 3.000
   Estimated Seeks: 5
   Estimated Nexts: 8
   Estimated Docdb Result Width: 10023
(18 rows)
If we don't use the partial index, both a = 1 and b < 5 are considered, the selectivity is estimated as 4%, and we expect to fetch only 4 rows of roughly 10 kB each.
/*+ IndexScan(t_large full_idx_large)*/ EXPLAIN (ANALYZE, DIST, SUMMARY OFF, DEBUG) SELECT * FROM t_large WHERE a = 1 AND b < 5;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using full_idx_large on t_large (cost=860.38..4950.34 rows=4 width=10012) (actual time=2.744..2.793 rows=4 loops=1)
   Index Cond: (b < 5)
   Storage Filter: (a = 1)
   Storage Table Read Requests: 1
   Storage Table Read Execution Time: 0.632 ms
   Storage Table Rows Scanned: 40
   Storage Index Read Requests: 1
   Storage Index Read Execution Time: 1.806 ms
   Storage Index Rows Scanned: 40
   Metric rocksdb_number_db_seek: 36.000
   Metric rocksdb_number_db_next: 131.000
   Metric rocksdb_number_db_seek_found: 36.000
   Metric rocksdb_number_db_next_found: 131.000
   Metric rocksdb_iter_bytes_read: 1272255.000
   Metric docdb_keys_found: 80.000
   Metric ql_read_latency: sum: 471.000, count: 3.000
   Estimated Seeks: 23
   Estimated Nexts: 62
   Estimated Docdb Result Width: 10023
(19 rows)
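Because the partial-index path is costed at ~32089 versus ~4950 for the full index, an unhinted query would presumably pick full_idx_large even though partial_idx_large scans only 4 index rows instead of 40. One way to check which plan the optimizer chooses (not part of the original report):
-- With no hint, the planner compares the two costs above and is expected to pick the
-- cheaper (here, mis-costed in favor of full_idx_large) plan.
EXPLAIN SELECT * FROM t_large WHERE a = 1 AND b < 5;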
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
[X] I confirm this issue does not contain any sensitive information.
Jira Link: DB-14047