yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.95k stars 1.07k forks source link

[YSQL] [Bitmap scans] For few microbenchmarks, PG has picked bitmap scan but YB did not. #24392

Closed shantanugupta-yb closed 1 week ago

shantanugupta-yb commented 1 week ago

Jira Link: DB-13302

Description

List of microbenchmarks for which PG has picked bitmap scan but YB did not.

Microbenchmarks PG Latency(ms) YB Latency(ms) latency Ratio
INDG12_3_index_scan_range_skey_desc_point_select 15.83 26.66 1.68
INDG12_1_index_scan_hash_skey_point_select 14.81 26.22 1.77
INDG12_2_index_scan_range_skey_point_select 15.12 26.45 1.75
JOING10_3_join_on_2tbls_withfilter_onpkey_grpby_2col 0.3 1.43 4.77
JOING9_2_join_on_3tbls_withfilter_onskey 0.39 2.03 5.21

Query and Plan details can be found here

YB build: 2.23.1.0-b174 (12Sept build)

List of gflags used in above test:

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

timothy-e commented 1 week ago

Each of the bitmap scan examples in PG are a bitmap scan with a single bitmap index scan.

This pattern occurs because of a difference in the storage layer between YB and PG, mentioned in the bitmap scan blog post:

If fewer tuples are required, fewer pages will be fetched, and so access costs are similar to random access costs. As the number of pages fetched increases, access costs approach sequential costs.

This explains an interesting pattern on Postgres, where a query with only one condition might use a Bitmap Index Scan because the almost sequential access patterns of bitmap scans are beneficial. If the query selects fewer rows, it’s more likely to choose an Index Scan. If it selects more rows, it’s more likely to choose a Sequential Scan. Somewhere in between, it will use a bitmap scan. This is due to Postgres’s disk access cost modeling.

YB still has random access from the main table, regardless of how many rows are selected, so this benefit never kicks in. On YB, a Bitmap Scan with a single BItmap Index Scan is always strictly worse than an Index Scan (because of the overhead of constructing bitmaps).