speedb-io / speedb

A RocksDB compliant high performance scalable embedded key-value store
https://www.speedb.io/
Apache License 2.0
902 stars, 69 forks

WriteBufferManager: allow slowing down writes based on total memory usage #114

Closed isaac-io closed 1 year ago

isaac-io commented 2 years ago

Currently the WriteBufferManager (WBM) does not allow slowing down writes in response to memory usage getting close to the prescribed limit. The only mechanism for letting flushes catch up is stopping writes completely by passing true for the allow_stall parameter of the WBM constructor. This can lead to oscillation between full write rate and complete stop, which is undesirable as it affects the latency of user writes.

Allow slowing down writes based on memory consumption by having the WBM signal its delay requirement to the WriteController (WC) (there can be more than one). The delay requirement is stored in the WC as part of the Global Delay feature (#346). This mirrors the way a CF signals the WC that it has a delay requirement.

To enable this feature:

  1. Pass allow_delays_and_stalls = true to the ctor of WriteBufferManager (this flag was previously named allow_stall).
  2. Set use_dynamic_delay = true.

The way the delay requirements are calculated is as follows:

The WBM reports a delay once its memory consumption passes a certain percentage of its quota. That threshold can be controlled by passing start_delay_percent to the ctor of the WBM. The default value is 70, which means that the WBM starts issuing delay requests once its memory consumption reaches 70% of its quota. The delay is linear throughout the range from the threshold to the quota. That range is divided into 99 delay steps (kMaxDelayedWriteFactor - 1). E.g. in the 1st step, the delay requirement is 99/100 of max_delayed_write_rate(), and in the last step (when memory usage has almost reached the quota) it is 1/100 of max_delayed_write_rate(). max_delayed_write_rate() is the rate the user passed as delayed_write_rate (DBOptions), which can also be changed dynamically.
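
The step calculation described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Speedb code: DelayStep and DelayedWriteRate are hypothetical helper names, kMaxDelayedWriteFactor is assumed to be 100 (inferred from the "99 steps" figure), and the exact rounding at step boundaries is a guess.

```cpp
#include <cstdint>

// Assumed value, inferred from "99 steps (kMaxDelayedWriteFactor - 1)".
constexpr uint64_t kMaxDelayedWriteFactor = 100;

// Hypothetical helper: maps WBM memory usage to a delay step.
// Returns 0 when no delay is needed, otherwise a step in [1, 99].
// Note: quota * start_delay_percent may overflow for extremely large quotas.
constexpr uint64_t DelayStep(uint64_t used, uint64_t quota,
                             uint64_t start_delay_percent) {
  const uint64_t max_step = kMaxDelayedWriteFactor - 1;
  const uint64_t start = quota * start_delay_percent / 100;
  if (quota == 0 || used < start) return 0;  // below threshold: no delay
  if (used >= quota) return max_step;        // at or over quota: slowest rate
  // Linear mapping of [start, quota) onto steps 1..99.
  const uint64_t step = 1 + (used - start) * max_step / (quota - start);
  return step < max_step ? step : max_step;
}

// Hypothetical helper: the write rate requested for a given step.
// Step 1 yields 99/100 of max_rate, step 99 yields 1/100 of max_rate,
// and step 0 leaves the rate unchanged.
constexpr uint64_t DelayedWriteRate(uint64_t max_rate, uint64_t step) {
  return max_rate * (kMaxDelayedWriteFactor - step) / kMaxDelayedWriteFactor;
}
```

With a quota of 1000 bytes and start_delay_percent = 70, this sketch returns step 0 at 699 bytes used, step 1 at 700 bytes, and step 99 just below the quota.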

Note:

The stall logic in the WBM is redundant, since the WC already includes logic for stopping writes which can be reused. For the first phase (#423), keep using the stall logic in the WBM (ShouldStall() and WBMStallWrites()) and only add the mechanism for slowing down writes.

udi-speedb commented 2 years ago

Pull request: https://github.com/speedb-io/speedb/pull/164

udi-speedb commented 1 year ago

@Guyme The ticket was not actually completed; it was abandoned. So, I suggest creating a new ticket once we actually know what we want to do here, and how it fits with the other delayed write activities we are working on.

Yuval-Ariel commented 1 year ago

@erez-speedb, please make sure there's no degradation with the branch dirty-mem-connect-wbm-to-global-delay. I'll run the performance scenario which shows the benefit.

erez-speedb commented 1 year ago

Perf test passed: same performance and memory consumption as 2.4.1. All tests were done with WF disabled.

Yuval-Ariel commented 1 year ago

Comparing main branch (e7e2de7d75cba503c301397a9681f861467c67d3) vs. this branch (ba6a3de5336936e3d0b080be58bd75ec8c3749a9).

cmd: ./db_bench --compression_type=None -db=/data/ -num=200000000 -value_size=1000 -key_size=16 --delayed_write_rate=536870912 -report_interval_seconds=1 -max_write_buffer_number=4 -num_column_families=6 -histogram -max_background_compactions=8 -cache_size=8388608 -max_background_flushes=1 -bloom_bits=10 -benchmark_read_rate_limit=0 -benchmark_write_rate_limit=0 -report_file=fillrandom.csv --disable_wal=true --benchmarks=fillrandom,levelstats --db_write_buffer_size=1073741824 --allow_wbm_stalls=true --use_spdb_writes=false --initiate_wbm_flushes=false -write_buffer_size=134217728

Results (main vs. the Dirty PR):

Stability: about 45% lower standard deviation of ops/sec (main: 89981, this branch: 49333).

Throughput: roughly the same mean ops/sec, about 4% lower on this branch (main: 255700, this branch: 245854).

Platform: Azure Standard_L16s_v3 instance. "cpu": "Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz", "num_cpu": 16, "memory": "128G", "disk": single 1.8T NVMe