speedb-io / speedb

A RocksDB compliant high performance scalable embedded key-value store
https://www.speedb.io/
Apache License 2.0
856 stars 63 forks source link

CancelAllBackgroundWork: Flush will not wait for stall conditions to clear #832

Open Yuval-Ariel opened 5 months ago

Yuval-Ariel commented 5 months ago

When closing the db, CancelAllBackgroundWork is called and a flush is initiated when there are unpersisted writes (without wal - has_unpersisteddata is true). This flush currently waits for some stall conditions to clear (in WaitUntilFlushWouldNotStallWrites). The wait is controlled by FlushOptions.allow_write_stall and is false by default.

Such behaviour does not seem logical since the user simply wants to close the db and they should not be forced to wait for stall conditions.

In cases such as db_bench, this leads to some obscurity by sometimes hiding the compaction debt that was accumulated during the test. This hiding effect is caused when the test ends in a stall condition and it has unpersisted data. As mentioned above, new compactions will be executed since CancelAllBackgroundWork is waiting until the flush ends which in turn waits for the stall conditions to clear. A different test which has a lower compaction debt will close in much shorter time and the benefit of having a lower compaction debt will be overlooked

Solve this by allowing the flush to run despite the stall conditions and thus only currently running bg operations will be executed and not new ones.

Results:

fillup cmd: db_bench --compression_type=None -db=/data/ -num=60000000 -value_size=1000 -key_size=16 -report_interval_seconds=1 -stats_dump_period_sec=120 -num_column_families=1 -write_buffer_size=67108864 -histogram --delayed_write_rate=536870912 -max_write_buffer_number=4 -db_write_buffer_size=0 -max_background_compactions=8 -cache_size=8388608 -max_background_flushes=4 -bloom_bits=10 -benchmark_read_rate_limit=0 -benchmark_write_rate_limit=0 -report_file=fillrandom.csv --disable_wal=true --benchmarks=fillrandom,levelstats,memstats,stats -max_write_buffer_number=8

It takes 3 secs for Shutdown (" Shutdown: canceling all background work" in the LOG) in this branch and almost 40 secs in main.

lsm state when CancelAllBackgroundWork is called in fillup:

main: files[33 62 39 420 71 0 0]

this branch: files[33 22 28 428 86 0 0]

lsm state at start of next test:

main: image

this branch: image

As can be seen, the main branch has done much compaction between the two tests which is unaccounted for in the performance measurements.

udi-speedb commented 5 months ago

@Yuval-Ariel -

  1. What about cases where there is a compaction debt but it is not big enough to trigger a stall condition? Wouldn't we close the DB while still having a compaction debt, only smaller?
  2. CancelAllBackgroundWork() is exposed in the C interface, the JAVA interface and is called directly from multiple locations, in addition to the Close() API. Although it makes sense to have db_bench tests complete all compactions before closing the DB and ending the benchmark, I feel it is problematic to change that internally as you have (unconditionally, without user control). I suggest adding a new boolean parameter with a default value of false (to retain the current behavior by default). When the new parameter is true, Close() would wait for all of the compactions to complete before returning control to the user.
Yuval-Ariel commented 5 months ago

@udi-speedb

  1. The goal here is to prevent compactions running at db close when the flush is waiting for stall conditions. So that close is as fast as possible and the user can still ask to wait for bg work to finish by passing true to CancelAllBackgroundWork.
  2. i've opened https://github.com/facebook/rocksdb/issues/12346