Closed udi-speedb closed 2 years ago
Specific tests:
Action for now: create the baseline on main for tests 1, 2, and 4.
Additional tests, bloom + paired:
Worst-case scenario: DB in cache, all keys exist. Test: sequential fill-up, then random reads.
Best-case scenario: DB not in cache, get before write. Test (TBD): fill up n keys, then random reads of 10000X keys, small objects. Overwrite without fill-up?
The flag that sets the filter type in db_bench is filter_uri.
Paired bloom (new): -filter_uri spdb.PairedBloomFilter:BPK (e.g., -filter_uri spdb.PairedBloomFilter:23.4)
Fast Local Bloom: -filter_uri rocksdb.internal.FastLocalBloomFilter:BPK
Ribbon: -filter_uri rocksdb.internal.Standard128RibbonFilter:BPK
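For scripting the comparison runs, the three filter settings above can be generated rather than typed by hand. This is only a sketch: `FILTER_NAMES` and `filter_uri_flag` are hypothetical helpers, not part of db_bench, and the only facts taken from this thread are the filter names and the NAME:BPK form with a single ':'.

```python
# Filter names as quoted in this thread; BPK means bits per key.
FILTER_NAMES = {
    "paired": "spdb.PairedBloomFilter",
    "fast_local": "rocksdb.internal.FastLocalBloomFilter",
    "ribbon": "rocksdb.internal.Standard128RibbonFilter",
}


def filter_uri_flag(kind: str, bits_per_key: float) -> str:
    """Build a db_bench flag of the form -filter_uri=NAME:BPK (hypothetical helper)."""
    if bits_per_key <= 0:
        raise ValueError("bits_per_key must be positive")
    name = FILTER_NAMES[kind]  # raises KeyError for an unknown filter kind
    # Exactly one ':' separates the filter name from the bits-per-key value.
    return f"-filter_uri={name}:{bits_per_key:g}"


if __name__ == "__main__":
    for kind in FILTER_NAMES:
        print(filter_uri_flag(kind, 23.4))
```

Using the same BPK for every filter keeps the memory budget equal across runs, so any difference in results comes from the filter type.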
@erez-speedb I have pushed the branch rebased on latest main. Please go ahead with the basic performance tests we have agreed upon. Thanks
./db_bench --compression_type=None -db=/data/ -num=80000000 -value_size=1000 -key_size=16 --delayed_write_rate=536870912 -report_interval_seconds=1 -max_write_buffer_number=0 -histogram -duration=900 --use_existing_db -threads=50 -seek_nexts=100 -report_file=seekrandomwriterandom.csv -benchmark_read_rate_limit=0 -benchmark_write_rate_limit=0 --benchmarks=seekrandomwriterandom -filter_uri=spdb.PairedBloomFilter::23.4 -readwritepercent=95
failure creating filter policy[spdb.PairedBloomFilter::23.4]: Not implemented: Could not load FilterPolicy: spdb.PairedBloomFilter::23.4
@erez-speedb - Sorry, my mistake in the example. There should be a single ':', not '::': -filter_uri=spdb.PairedBloomFilter:23.4
Rerunning tests
One thing that needs attention: The performance of the filter is heavily affected by the availability of AVX2 support in the processor.
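Since results depend on AVX2, it may help to record whether the benchmark machine supports it before comparing numbers. A minimal sketch, assuming a Linux x86 host where /proc/cpuinfo lists CPU features on a "flags" line; `has_avx2` is a hypothetical helper, not something from the Speedb or RocksDB code base.

```python
def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if the first 'flags' line of /proc/cpuinfo text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return "avx2" in value.split()
    # No 'flags' line found (for example, a non-x86 CPU).
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX2 supported:", has_avx2(f.read()))
    except OSError:
        print("could not read /proc/cpuinfo (not a Linux host?)")
```

This only reports what the CPU advertises; whether the db_bench binary was actually built with AVX2 enabled is a separate question that the check does not answer.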
Blocked by #101.
Didn't show an improvement with #101, so we need to define a good test to show the value of the feature.
Running the test on a single HDD (simulating the disk as the bottleneck) with a DB size larger than RAM showed a clear benefit,
with no additional memory usage compared to the default bloom filter at the same BPK.
Depends on #71 and on #123.
QA passed on 4cf14cb2ae85e4c6ad906e26c4aa2269578f3716
Passed performance tests.
Why:
Reduce the false positive rate (FPR) while using the same amount of memory.
What:
Develop a filter that is fast and low on CPU consumption on the one hand, but offers a better memory footprint vs. FPR trade-off on the other.
Technical detail:
In a traditional bloom filter there is a trade-off between memory usage and performance. RocksDB's blocked bloom filter takes less time but consumes extra memory.
The Ribbon filter, on the other hand, takes ~30% less memory but is much slower than the bloom filter (by a factor of 4).
The idea is to reduce the bloom filter's memory consumption while keeping its high performance.
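To make the memory vs. FPR trade-off concrete, the textbook formula for a classic (unblocked) bloom filter can serve as a reference point. This is a sketch of that standard formula only; it does not model RocksDB's blocked bloom filter, Ribbon, or the paired filter, whose real FPRs will differ. `classic_bloom_fpr` is a name made up for this example.

```python
import math


def classic_bloom_fpr(bits_per_key: float) -> float:
    """Textbook FPR of an unblocked bloom filter using a near-optimal hash count."""
    # The optimal number of hash functions is about bits_per_key * ln(2).
    k = max(1, round(bits_per_key * math.log(2)))
    # Standard approximation: (1 - e^(-k / bits_per_key)) ^ k
    return (1.0 - math.exp(-k / bits_per_key)) ** k


if __name__ == "__main__":
    for bpk in (10, 16, 23.4):
        print(f"{bpk} bits/key -> FPR {classic_bloom_fpr(bpk):.2e}")
```

At 10 bits per key this gives roughly 0.8%. A blocked, cache-local filter gives up part of that accuracy in exchange for speed, and that lost accuracy is the gap this proposal aims to narrow.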
Who:
The proposed filter should be most beneficial when there is a need for a very small FPR. Typically this happens when the penalty of a false positive is very big compared to the filter test time (database on the disk), and when true positives are rare.
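The reasoning above can be sketched as a simple cost model for a lookup of a key that does not exist: the filter is always probed, and with probability FPR the lookup also pays the false positive penalty. The function name and all numbers below are illustrative assumptions, not measurements from this issue.

```python
def expected_negative_lookup_cost(filter_time: float, fpr: float,
                                  false_positive_penalty: float) -> float:
    """Expected cost of looking up an absent key, in the same time unit as the inputs."""
    if not 0.0 <= fpr <= 1.0:
        raise ValueError("fpr must be between 0 and 1")
    return filter_time + fpr * false_positive_penalty


if __name__ == "__main__":
    # Illustrative values in microseconds: a 0.1 us filter probe and a
    # 10,000 us penalty for a wasted read from a slow disk.
    for fpr in (1e-2, 1e-3, 1e-5):
        cost = expected_negative_lookup_cost(0.1, fpr, 10_000.0)
        print(f"FPR {fpr:g}: expected cost {cost:.3f} us")
```

When the penalty dwarfs the filter time, the FPR term dominates, so lowering the FPR at the same memory budget matters most for disk-bound databases with mostly negative lookups.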
Integrate a new type of filter policy: Paired Block Bloom Filter