yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[DocDB] Utilize Bloom filters for multi-row operations #20405

Open mbautin opened 10 months ago

mbautin commented 10 months ago

Jira Link: DB-9397

Description

In the context of discussing a perf regression related to Bloom filters not being checked, @ttyusupov mentioned:

As a result, there were two separate breaking changes. The first caused index table Bloom filters not to be checked: https://github.com/yugabyte/yugabyte-db/commit/beeebbe873bef97561427bc13f4906d607340ba9 The second caused main table Bloom filters not to be checked as well: https://github.com/yugabyte/yugabyte-db/commit/7b582dd6edf1d764292ca69d8bcb323f5e8c5db4

Related issue: #20398

Test case from @ttyusupov and @kmuthukk:

\timing on
drop table if exists test_table;
create extension if not exists pgcrypto;
create table test_table(k text, v text, PRIMARY KEY(k ASC));
create index index_v_test_table on test_table(v);

-- Load 1M rows as written (100 batches of 10,000); raise the loop bound to 1000 for 10M rows
do $$
begin
  for counter in 1..100 loop
    raise notice 'counter: %', (counter * 10000);
    insert into test_table (select gen_random_uuid(), gen_random_bytes(50)::text
                        from generate_series(1, 10000) i);
    commit;
  end loop;
end $$;

and then check the Bloom filter and block cache counters for the table:

curl -s http://localhost:9000/prometheus-metrics | grep -E "test_table" | grep -wE "rocksdb_bloom_filter_useful|rocksdb_bloom_filter_checked|rocksdb_block_cache_data_hit|rocksdb_block_cache_data_miss"  | sed 's/expo.*9000..//' | sed 's/[0-9]*$//' | tr " " "\n"
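For comparing runs, the same counters can be summed per table in a short script rather than read off the raw output. The following is a minimal sketch, not part of the original test case: it assumes each metric line has the shape `name{labels} value [timestamp]` and carries a `table_name` label, and the function name `summarize` is hypothetical. Like the `grep` above, it matches any table whose name contains `test_table`, so the index `index_v_test_table` is reported separately from the main table.

```python
import re
from collections import defaultdict

# The four counters that the curl pipeline above filters for.
METRICS = (
    "rocksdb_bloom_filter_useful",
    "rocksdb_bloom_filter_checked",
    "rocksdb_block_cache_data_hit",
    "rocksdb_block_cache_data_miss",
)

# Assumed line shape: name{label="v",...} value [timestamp]
LINE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)')
# Assumed label carrying the table name.
TABLE_RE = re.compile(r'table_name="([^"]*)"')


def summarize(text, needle="test_table"):
    """Sum each counter per table over all tablets.

    Returns a dict keyed by (table_name, metric_name). Only tables whose
    name contains `needle` are included, mirroring `grep -E "test_table"`.
    """
    totals = defaultdict(int)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # comment lines and metrics without labels
        name, labels, value = match.groups()
        table = TABLE_RE.search(labels)
        if name not in METRICS or table is None:
            continue
        if needle not in table.group(1):
            continue
        totals[(table.group(1), name)] += int(float(value))
    return dict(totals)
```

To use it against a live node, the text could be fetched with `urllib.request.urlopen("http://localhost:9000/prometheus-metrics").read().decode()` and passed to `summarize`. If `rocksdb_bloom_filter_checked` stays at zero for a table while the queries run, that matches the regression described above.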

Issue Type

kind/bug


mbautin commented 9 months ago

Here is a graph of the performance of the above test as a function of the number of Bloom filter keys per group: [graph image not preserved]

rthallamko3 commented 6 days ago

@mbautin, can we close this? Were you planning any additional work on this?