Paired Block Bloom Filter Algorithm

speedb-io / speedb

A RocksDB compliant high performance scalable embedded key-value store

https://www.speedb.io/

Apache License 2.0

912 stars 71 forks source link

Paired Block Bloom Filter Algorithm #29

Closed udi-speedb closed 2 years ago

udi-speedb commented 2 years ago

Why :

Reduce false positives rate while using the same amount of memory.

What:

Develop a filter which is fast and low on CPU consumption on the one hand, but with a better memory footprint- FPR trade-off on the other hand.

Technical detail:

In the traditional bloom filter there is a tradeoff between memory usage and performance. Rocksdb blocked bloom filter takes less time but consumes extra memory.

Ribbon filter, on the other hand, takes ~30% less memory but is much slower than the bloom filter (factor of 4).

The idea is to improve bloom filter in both memory consumption and keep it high performant.

Who:

The proposed filter should be most beneficial when there is a need for a very small FPR. Typically this happens when the penalty of a false positive is very big compared to the filter test time (database on the disk), and when true positives are rare.

Integrate a new type of filter policy: Paired Block Bloom Filter

erez-speedb commented 2 years ago

Specific tests:

Standard tests with 24BPK on both main and the branch.
Filters fast and ribbon.
Paired compare to the best from before.
Compare 8 BPK to 24 on main.

Action for now, create the baseline on main for 1,2,4

erez-speedb commented 2 years ago

Additional tests bloom + pair Worse case scenario, DB in cache, all keys exists. Test: Fillup sequential, random reads. Best case, DB not in cache, get before write Test TBD Fillup n keys, random reads 10000X keys. -- small obj. Overwrite without filllup?

udi-speedb commented 2 years ago

The flag that sets the filter type in db_bench is filter_uri. Paired bloom (new): -filter_uri spdb.PairedBloomFilter:BPK (e.g., -filter_uri spdb.PairedBloomFilter:23.4) Fast Local Bloom: -filter_uri rocksdb.internal.FastLocalBloomFilter:BPK Ribbon: -filter_uri rocksdb.internal.Standard128RibbonFilter:BPK

udi-speedb commented 2 years ago

@erez-speedb I have pushed the branch rebased on latest main. Please go ahead with the basic performance tests we have agreed upon. Thanks

erez-speedb commented 2 years ago

./db_bench --compression_type=None -db=/data/ -num=80000000 -value_size=1000 -key_size=16 --delayed_write_rate=536870912 -report_interval_seconds=1 -max_write_buffe r_number=0 -histogram -duration=900 --use_existing_db -threads=50 -seek_nexts=100 -report_file=seekrandomwriterandom.csv -benchmark_read_rate_limit=0 -benchmark_write_rate_limit=0 --benchmarks=seekrandomwriterandom -filter_uri=spdb.PairedBloomFilter::23.4 -readwritepercent=95

failure creating filter policy[spdb.PairedBloomFilter::23.4]: Not implemented: Could not load FilterPolicy: spdb.PairedBloomFilter::23.4

udi-speedb commented 2 years ago

@erez-speedb - Sorry, my mistake in the example. There should be a single ':' not '::' -filter_uri=spdb.PairedBloomFilter:23.4

erez-speedb commented 2 years ago

Rerunning tests

udi-speedb commented 2 years ago

One thing that needs attention: The performance of the filter is heavily affected by the availability of AVX2 support in the processor.

We must make sure to compare rocksdb / us on platforms that have the same AVX2 support.
We may wish to compare rocksdb / us with AVX2 and without AVX2.
The beneficial use case is when using AVX2 so this is a requirement for the best case scenario.

isaac-io commented 2 years ago

Blocked by #101.

isaac-io commented 2 years ago

Didn't show an improvement with #101, so we need to define a good test to show the value of the feature.

erez-speedb commented 2 years ago

Running the test on a single HDD (simulating disk as bottleneck) and with DB size larger than RAM Showed clear benefit

With no additional memory usage, compare to the default bloom with the same BPK

isaac-io commented 2 years ago

Depends on #71 and on #123.

Yuval-Ariel commented 2 years ago

QA passed on 4cf14cb2ae85e4c6ad906e26c4aa2269578f3716

erez-speedb commented 2 years ago

Pass performance tests.