[YSQL] [Bitmap scans] For few microbenchmarks, PG has picked bitmap scan but YB did not.

yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.

Other

8.95k stars 1.07k forks source link

Description

List of microbenchmarks for which PG has picked bitmap scan but YB did not.

Microbenchmarks	PG Latency(ms)	YB Latency(ms)	latency Ratio
INDG12_3_index_scan_range_skey_desc_point_select	15.83	26.66	1.68
INDG12_1_index_scan_hash_skey_point_select	14.81	26.22	1.77
INDG12_2_index_scan_range_skey_point_select	15.12	26.45	1.75
JOING10_3_join_on_2tbls_withfilter_onpkey_grpby_2col	0.3	1.43	4.77
JOING9_2_join_on_3tbls_withfilter_onskey	0.39	2.03	5.21

Query and Plan details can be found here

YB build: 2.23.1.0-b174 (12Sept build)

List of gflags used in above test:

yb_enable_read_committed_isolation: true
ysql_enable_read_request_caching: true
ysql_pg_conf_csv: yb_enable_base_scans_cost_model=true,yb_bnl_batch_size=1024,yb_enable_optimizer_statistics=true,yb_use_hash_splitting_by_default=false,enable_bitmapscan=true,yb_enable_bitmapscan=true,yb_parallel_range_rows=10000,yb_enable_optimizer_statistics=true
allowed_preview_flags_csv: use_fast_backward_scan
use_fast_backward_scan: true
ysql_yb_enable_expression_pushdown: true

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

[X] I confirm this issue does not contain any sensitive information.

Each of the bitmap scan examples in PG are a bitmap scan with a single bitmap index scan.

This pattern occurs because of a difference in the storage layer between YB and PG, mentioned in the bitmap scan blog post:

If fewer tuples are required, fewer pages will be fetched, and so access costs are similar to random access costs. As the number of pages fetched increases, access costs approach sequential costs.

This explains an interesting pattern on Postgres, where a query with only one condition might use a Bitmap Index Scan because the almost sequential access patterns of bitmap scans are beneficial. If the query selects fewer rows, it’s more likely to choose an Index Scan. If it selects more rows, it’s more likely to choose a Sequential Scan. Somewhere in between, it will use a bitmap scan. This is due to Postgres’s disk access cost modeling.

YB still has random access from the main table, regardless of how many rows are selected, so this benefit never kicks in. On YB, a Bitmap Scan with a single BItmap Index Scan is always strictly worse than an Index Scan (because of the overhead of constructing bitmaps).

yugabyte / yugabyte-db