db_bench: allow restricting the range of keys for a read benchmark to the range of database keys

speedb-io / speedb

A RocksDB compliant high performance scalable embedded key-value store

https://www.speedb.io/

Apache License 2.0


Issue #101 (closed by isaac-io 2 years ago)

isaac-io commented 2 years ago

Currently db_bench doesn't allow controlling the range of keys read during a read workload. For the new paired bloom filter (#29), this means that when the generated keys aren't in the range of the data in the database, the workload bypasses the filter completely.

Add an option to restrict key generation so that all keys generated during a read workload fall within that range. This way the filter code paths will be hit, and we will be able to measure the impact of the changes in a real-world scenario.
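A minimal sketch of why out-of-range keys never reach the filter, under a deliberately simplified model: one hypothetical `SstFileModel` whose key range is checked before its filter, with a Python `set` standing in for the paired bloom filter (so false positives are not modelled). Only the order of the checks (range first, filter second) is meant to mirror the usual LSM lookup path; the class and function names are illustrative.

```python
import random
from typing import Iterable, Tuple


class SstFileModel:
    """Hypothetical stand-in for one SST file: a key range plus a filter."""

    def __init__(self, keys: Iterable[int]):
        keys = list(keys)
        self.min_key = min(keys)
        self.max_key = max(keys)
        self.filter = set(keys)  # exact set; a real bloom filter is probabilistic
        self.filter_probes = 0

    def get(self, key: int) -> bool:
        if key < self.min_key or key > self.max_key:
            return False  # rejected by the range check; the filter is never consulted
        self.filter_probes += 1
        return key in self.filter


def run_reads(db_keys: Iterable[int], key_range: int, reads: int,
              seed: int = 0) -> Tuple[int, int]:
    """Draw `reads` keys uniformly from [0, key_range); return (found, filter probes)."""
    sst = SstFileModel(db_keys)
    rng = random.Random(seed)
    found = sum(1 for _ in range(reads) if sst.get(rng.randrange(key_range)))
    return found, sst.filter_probes


db_keys = range(0, 2_000, 2)  # 1,000 even keys spanning [0, 1998]
for key_range in (2_000, 2_000_000):
    found, probes = run_reads(db_keys, key_range, reads=10_000)
    print(f"key_range={key_range}: found={found}, filter probes={probes}")
```

With the range matching the data, almost every lookup probes the filter; with a range 1,000 times wider, only about one lookup in a thousand does, so a benchmark of that shape says very little about the filter.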

udi-speedb commented 2 years ago

@isaac-io & I have discussed and agreed on the following: we will add a new configuration parameter to db_bench that lets the user set the range of random keys used in benchmarks such as fillrandom and readrandom. By default, the range will equal the number of keys, which is the current behaviour, so nothing changes unless the new parameter is set.
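The proposal can be sketched as follows, assuming (as a simplification) that db_bench draws each random key uniformly from [0, num). `key_range` is a hypothetical name for the new parameter, not an existing db_bench flag; the point is only that defaulting it to `num` leaves the key sequence unchanged.

```python
import random
from typing import List, Optional


def generate_keys(num: int, ops: int, key_range: Optional[int] = None,
                  seed: int = 42) -> List[int]:
    """Draw `ops` random keys; `key_range` is a hypothetical option, not a real flag.

    Without it, keys come from [0, num) as today; with it, from [0, key_range).
    """
    upper = num if key_range is None else key_range
    rng = random.Random(seed)
    return [rng.randrange(upper) for _ in range(ops)]


# Default: identical to passing key_range=num explicitly, i.e. no behaviour change.
assert generate_keys(num=1_000, ops=5) == generate_keys(num=1_000, ops=5, key_range=1_000)
# Narrower range: every key stays below 100 regardless of num.
assert all(k < 100 for k in generate_keys(num=1_000_000, ops=50, key_range=100))
```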

isaac-io commented 2 years ago

Note that currently db_bench simply divides the key space evenly between threads, so care should be taken to divide the range between the threads, rather than the number of keys for the benchmark as is done today.

EDIT: I seem to have confused db_bench and db_stress. db_bench doesn't need to track expected values, so it doesn't divide the key space between the threads as db_stress does.
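To make the distinction concrete, here is a minimal sketch of the two schemes: the even per-thread split from the original comment, and the shared range that, per the correction, db_bench uses. Function names are hypothetical, and giving each thread its own `random.Random(seed + t)` is an assumption, not db_bench's exact seeding.

```python
import random
from typing import List, Tuple


def even_slices(key_range: int, threads: int) -> List[Tuple[int, int]]:
    """Split [0, key_range) into `threads` contiguous half-open slices.

    Slice sizes differ by at most one key when key_range is not a multiple
    of threads.
    """
    base, extra = divmod(key_range, threads)
    slices, start = [], 0
    for t in range(threads):
        size = base + (1 if t < extra else 0)
        slices.append((start, start + size))
        start += size
    return slices


def shared_range_keys(key_range: int, threads: int, ops_per_thread: int,
                      seed: int = 7) -> List[List[int]]:
    """Each thread draws independently from the whole [0, key_range)."""
    per_thread = []
    for t in range(threads):
        rng = random.Random(seed + t)  # one generator per thread (assumed seeding)
        per_thread.append([rng.randrange(key_range) for _ in range(ops_per_thread)])
    return per_thread


print(even_slices(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

With a shared range, restricting the key range needs no per-thread bookkeeping, which is why the concern in the original comment does not apply to db_bench.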

udi-speedb commented 2 years ago

Following a discussion with @isaac-io, it seems db_bench already has 2 parameters that users can use to achieve the same purpose: 'reads' / 'writes'. When specified, these parameters control the number of keys read / written (when not specified, that number is set by the 'num' parameter). So a user may specify both 'num' and 'reads' / 'writes': 'num' controls the range of keys, and 'reads' / 'writes' control how many are read / written.
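A toy model of these semantics, written as a sketch rather than db_bench code: it assumes the database holds exactly the keys [0, db_keys) (as after a sequential fill) and that readrandom draws keys uniformly from [0, num). Representing "reads not specified" as a negative value is a modelling choice here; `readrandom_model` is a hypothetical helper.

```python
import random
from typing import Tuple


def readrandom_model(db_keys: int, num: int, reads: int = -1,
                     seed: int = 1) -> Tuple[int, int]:
    """Toy readrandom against a DB that holds exactly the keys [0, db_keys).

    Keys are drawn uniformly from [0, num); `reads` sets how many lookups are
    done and, when negative (i.e. not specified), falls back to `num`.
    Returns (found, lookups).
    """
    lookups = num if reads < 0 else reads
    rng = random.Random(seed)
    found = sum(1 for _ in range(lookups) if rng.randrange(num) < db_keys)
    return found, lookups


rows = 1_000
# Raising num widens the key range (and, with reads unset, the lookup count):
# most keys now miss the database.
print(readrandom_model(db_keys=rows, num=rows * 100))
# Keeping num == rows and raising reads: more lookups over the same range,
# so every key lands inside the database.
print(readrandom_model(db_keys=rows, num=rows, reads=rows * 100))  # (100000, 100000)
```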

isaac-io commented 2 years ago

Can we close this issue then? Should we run the paired bloom filter benchmark with these settings in order to ensure that it works before we close?

udi-speedb commented 2 years ago

@erez-speedb - Could you please try to use these parameters and see if indeed these parameters enable us to get what we want?

erez-speedb commented 2 years ago

With num=$(($rows * 10000)):
readrandom : 2.350 micros/op 1702456 ops/sec; 0.0 MB/s (1259 of 19126999 found)

With reads=$(($rows * 10000)):
readrandom : 8.687 micros/op 460468 ops/sec; 75.5 MB/s (3279451 of 5186999 found)

@udi-speedb using the "-reads" flag is good enough, and the test was updated accordingly. Please consider reverting the db_bench change.
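The found ratios in the two result lines can be checked with a little arithmetic. The 1 - 1/e comparison is an assumption-laden aside: nothing in the thread says how the database was loaded, but if it had been filled with fillrandom (n keys drawn with replacement from [0, n)), the expected fraction of distinct keys present would be 1 - (1 - 1/n)^n, which approaches 1 - 1/e (about 63.2%). The second run's ratio sits right at that value, which would be consistent with its reads staying inside the loaded key range.

```python
import math

# (found, looked up) pairs copied from the two readrandom result lines above.
results = {
    "larger num": (1_259, 19_126_999),
    "larger reads": (3_279_451, 5_186_999),
}

for label, (found, total) in results.items():
    print(f"{label}: {100 * found / total:.4f}% found")

# Assumption: a fillrandom load of n keys drawn with replacement from [0, n)
# leaves about 1 - (1 - 1/n)**n of the key space populated, which tends to 1 - 1/e.
print(f"1 - 1/e: {100 * (1 - 1 / math.e):.4f}%")
```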

isaac-io commented 2 years ago

Verified as working with the existing parameters.