Yes, this is something I've been thinking about as well. One solution I would like to try out is to prioritize WAL writes by reserving at least one available zone (empty, or without active writes) for WAL allocation. Or, perhaps even better, a configurable parameter that tells ZenFS the user wants at least N MBs of space reserved for the WAL. There will be backpressure eventually if compaction can't keep up, but it should remove the latency spikes.
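Roughly what I have in mind, as a sketch only. None of these names are real ZenFS types or APIs, they are just made up to illustrate the reservation idea:

```cpp
// Sketch only: reserve capacity for WAL allocation so data/compaction writes
// cannot starve the WAL. All names here are hypothetical, not ZenFS code.
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

struct Zone {
  uint64_t capacity;   // writable bytes left in the zone
  bool busy = false;   // a write is currently active in this zone
};

class ZoneAllocator {
 public:
  ZoneAllocator(std::vector<Zone> zones, uint64_t reserved_wal_bytes)
      : zones_(std::move(zones)), reserved_wal_bytes_(reserved_wal_bytes) {}

  // WAL callers may use the reserved capacity; everyone else must leave it alone.
  Zone* Allocate(bool is_wal) {
    std::lock_guard<std::mutex> lk(mu_);
    uint64_t free_bytes = FreeBytes();
    for (Zone& z : zones_) {
      if (z.busy || z.capacity == 0) continue;
      // Taking this zone for non-WAL data would dip into the WAL reserve: skip it.
      if (!is_wal && free_bytes - z.capacity < reserved_wal_bytes_) continue;
      z.busy = true;
      return &z;
    }
    return nullptr;  // backpressure: caller waits until compaction frees space
  }

 private:
  uint64_t FreeBytes() const {
    uint64_t total = 0;
    for (const Zone& z : zones_) {
      if (!z.busy) total += z.capacity;
    }
    return total;
  }

  std::mutex mu_;
  std::vector<Zone> zones_;
  uint64_t reserved_wal_bytes_;
};
```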
I don't know what you mean by a random read zone; all zones can be randomly read. Could you explain?
Hi, @yhr
We've patched a simple solution for now:
Sorry about the random read question; I misremembered and thought we had to open a zone before reading from it, LOL. Just ignore it.
Besides that, we also found that RocksDB does not close WAL files explicitly by default, which can consume too many open-zone resources during heavy benchmark workloads.
In our case we saw 6~7 WALs open at the same time (though only one of them is actively written); the other WAL files stay open until RocksDB Syncs/Closes them in the background (we just fixed this in TerarkDB).
RocksDB has the same problem (WAL files are not closed immediately), but it doesn't matter on ext4 since keeping a file open there takes no extra resources.
This is not the first priority though; we've handled it with a hack fix in RocksDB/TerarkDB for now.
We have two temporary solutions that both work for us; we selected the first one for now:
FYI, this is one of our attempts to reduce allocation latency. https://github.com/bzbd/zenfs/pull/19
Thanks, I'm still on paternity leave, but I'm hoping to look over all the reported issues (and the suggested fixes) properly and figure out the best way forward. I think RocksDB should be able to close the WALs that are not being written to (so that may be a bug in RocksDB). I plan to address the latency issue in the allocator by introducing a background thread. Let's keep this issue open until I've created new issues discussing and tracking that work.
Thanks!
@yhr A background thread sounds like the proper long-term solution, please go ahead. (Our fix, for now, simply rearranges the allocation locks.)
And even if we allocate zones in a background thread, we should still make sure WAL files get higher priority when taking a zone from the background zone queue.
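For example, something along these lines. This is only a sketch of the priority idea, all names are made up and it is not our actual patch or the ZenFS allocator:

```cpp
// Sketch only: a background thread keeps a small pool of pre-opened zones,
// and WAL requests get first pick from the pool. Hypothetical names throughout.
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

struct Zone {};

class ZonePool {
 public:
  explicit ZonePool(size_t target) : target_(target) {
    filler_ = std::thread([this] { FillLoop(); });
  }

  ~ZonePool() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    filler_.join();
    for (Zone* z : pool_) delete z;  // drop whatever is left in the pool
  }

  // WAL callers are served even when only one zone is left in the pool,
  // so a burst of compaction allocations cannot leave the WAL waiting.
  Zone* Get(bool is_wal) {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [&] {
      return stop_ || (is_wal ? !pool_.empty() : pool_.size() > 1);
    });
    if (pool_.empty()) return nullptr;
    Zone* z = pool_.front();
    pool_.pop_front();
    cv_.notify_all();  // wake the filler so it tops the pool up again
    return z;
  }

 private:
  void FillLoop() {
    std::unique_lock<std::mutex> lk(mu_);
    while (!stop_) {
      while (!stop_ && pool_.size() < target_) {
        lk.unlock();
        Zone* z = OpenOrResetZone();  // slow part (reset/finish/open) happens
        lk.lock();                    // here, off the foreground write path
        pool_.push_back(z);
        cv_.notify_all();
      }
      cv_.wait(lk, [&] { return stop_ || pool_.size() < target_; });
    }
  }

  Zone* OpenOrResetZone() { return new Zone(); }  // placeholder for the real work

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<Zone*> pool_;
  std::thread filler_;
  size_t target_;
  bool stop_ = false;
};
```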
As for the RocksDB WAL file problem: RocksDB doesn't close the file immediately in the foreground but leaves it to a background flush thread (which closes all stale WAL files periodically). I don't think it's a bug, because when WAL sync = false there really is no need to close it immediately, just FYI.
Since we always need WAL sync = true, we moved the WAL Close() logic to the foreground and let the original background thread skip the WAL close action. But this should be considered in your solution, because otherwise RocksDB keeps a lot of WALs open for quite a few seconds.
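Just to clarify what we mean by WAL sync = true: it is RocksDB's `WriteOptions::sync` flag, which makes every write persist the WAL before returning. A minimal example (the DB path is just a placeholder):

```cpp
// With WriteOptions::sync = true every write is fsynced to the WAL before it
// returns, so how quickly old WAL files are Close()d matters on ZenFS, since
// each open WAL pins zone resources.
#include <cassert>
#include <rocksdb/db.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/sync_wal_demo", &db);
  assert(s.ok());

  rocksdb::WriteOptions wo;
  wo.sync = true;  // "WAL sync = true": fsync the WAL on every write
  s = db->Put(wo, "key", "value");
  assert(s.ok());

  delete db;
  return 0;
}
```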
Again, no hurry since we've fixed it temporarily. Good luck.
By the way, the latency problem might be easier to observe when:
My plan is to do this work in three steps:
The allocator has been reworked as part of https://github.com/westerndigitalcorporation/zenfs/pull/114, closing this :)