Closed. tabokie closed this pull request 2 years ago.
Merging #211 (76616d3) into master (563a60d) will increase coverage by 0.29%. The diff coverage is 99.34%.
@@ Coverage Diff @@
## master #211 +/- ##
==========================================
+ Coverage 96.95% 97.25% +0.29%
==========================================
Files 28 29 +1
Lines 7724 8741 +1017
==========================================
+ Hits 7489 8501 +1012
- Misses 235 240 +5
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/lib.rs | 100.00% <ø> (ø) | |
| tests/failpoints/mod.rs | 100.00% <ø> (ø) | |
| src/swappy_allocator.rs | 99.23% <99.23%> (ø) | |
| src/config.rs | 96.49% <100.00%> (+0.37%) | :arrow_up: |
| src/engine.rs | 97.21% <100.00%> (ø) | |
| src/memtable.rs | 99.15% <100.00%> (+0.03%) | :arrow_up: |
| src/metrics.rs | 97.65% <100.00%> (+0.07%) | :arrow_up: |
| src/purge.rs | 97.43% <100.00%> (ø) | |
| src/util.rs | 88.85% <100.00%> (+0.86%) | :arrow_up: |
| tests/failpoints/test_io_error.rs | 100.00% <100.00%> (ø) | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Security is not considered in this PR. All writes to disk should be encrypted if encryption is enabled.
> Security is not considered in this PR. All writes to disk should be encrypted if encryption is enabled.
I have thought about that, and I don't think it's a problem because the entry index only contains file offsets, not user data.
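For context, here is a minimal sketch of what such an entry index might look like; the field names are illustrative assumptions, not the actual raft-engine definitions. The point is that it only records where an entry lives on disk, never the entry payload, so swapping it out to disk exposes no user data.

```rust
// Illustrative sketch only; field names are assumptions, not the actual
// raft-engine `EntryIndex`. It records the on-disk location of an entry,
// never the entry payload itself.
pub struct EntryIndexSketch {
    /// Raft log index of the entry.
    pub index: u64,
    /// Identifier of the log file holding the entry.
    pub file_id: u64,
    /// Byte offset of the encoded entry within that file.
    pub offset: u64,
    /// Length of the encoded entry in bytes.
    pub len: u64,
}
```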
I built a POC of offloading partial entries to disk instead of whole ones. Compared to this PR, it has the following advantages:
It has one disadvantage though: it can't give precise control over memory. However, because it offloads all applied entry indexes to disk, only a limited number of indexes stay in memory. It can also choose to offload all committed entry indexes, which would keep at most 100 MiB in memory.
> I built a POC of offloading partial entries to disk instead of whole ones. Compared to this PR, it has the following advantages:
There are three weeks before code freeze. I don't trust that your bloated data structure can be well tested in that time. My allocator, due to its inherent simplicity (both in design and in interaction with other components), is already well tested (1000 LOC of tests plus all existing tests).
I agree with some of your points in general, but:
The points that I do agree with are also not that important. Performance (especially read performance) under memory throttling has never been a priority. Yes, better performance is good to have, but only if I'm comfortable with the cost. But looking at your code, I'm not comfortable.
1. Your code is **not** simple. I think anyone can see that.
I don't see why the code is not simple. Most of the complexity comes from making BacklogIndex and the in-memory index behave as a unified interface, and that part can be well tested by unit tests.
As I pointed out in a previous comment, most read IOs are caused by the rewrite process and primarily hit cold entries (applied entries). And your code will be even more complex once it supports that.
So why can applied entries be rewritten? Applied means they are immutable; any rewrite is wrong.
> But looking at your code, I'm not comfortable.
The same feeling I have about the allocator. I strongly object to using an allocator.
> I don't see why the code is not simple.
You said it yourself: "behave as a unified interface", which echoes my comment emphasizing that implementing our own VecDeque is complex. It's not the interface that is complex, of course; it's the internal states and branches, and this complexity cannot be hidden. Last time I spent half a month of intensive work to reason about and cover nearly every conditional branch in this codebase.
The allocator is simple because it doesn't tamper with any of the data structures or their interactions. Its methods adhere to a strict calling protocol, and the combinations of method calls are finite.
> So why can applied entries be rewritten? Applied means they are immutable; any rewrite is wrong.
The rewrite logic hasn't changed much since I took over this repo. And I doubt any other implementation can do away with compactions.
> The same feeling I have about the allocator. I strongly object to using an allocator.
I guess we can't persuade each other with more discussion. You can put it to a vote if necessary.
> The rewrite logic hasn't changed much since I took over this repo. And I doubt any other implementation can do away with compactions.
Now I see your point. Instead of introducing a backlog index, indexes may need to be tracked based on file instead of region. That is, a region maintains a list of log files that contain its entries, and every log file has a queue of region indexes, so a rewrite is just an update to the log file list. Though it's certainly a big refactor.
> Now I see your point. Instead of introducing a backlog index, indexes may need to be tracked based on file instead of region. That is, a region maintains a list of log files that contain its entries, and every log file has a queue of region indexes, so a rewrite is just an update to the log file list. Though it's certainly a big refactor.
So, a more LSM-like approach. With a proper block cache it might work better than the current implementation, and it would cut a lot of complexity from the memtable, making it easier to maintain. I'll file an issue later.
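A rough sketch of that file-based tracking idea, purely for illustration (the types and method below are hypothetical, not an actual raft-engine design): each log file keeps a queue of entry indexes per region, a region only remembers which files still hold its entries, and a rewrite then reduces to moving one queue and updating the file list.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical on-disk location of one entry (illustrative only).
struct EntryIndex {
    index: u64,
    offset: u64,
    len: u64,
}

/// Indexes owned by one log file, grouped by region id.
struct FileIndex {
    per_region: HashMap<u64, VecDeque<EntryIndex>>,
}

/// A region only remembers which log files still hold its entries.
struct RegionIndex {
    files: Vec<u64>,
}

struct Engine {
    files: HashMap<u64, FileIndex>,
    regions: HashMap<u64, RegionIndex>,
}

impl Engine {
    /// Rewrite moves `region`'s surviving entries from `old_file` into
    /// `new_file`: only the file-side queues and the region's file list
    /// change; the per-entry indexes themselves are untouched.
    fn rewrite_region(&mut self, region: u64, old_file: u64, new_file: u64) {
        if let Some(moved) = self
            .files
            .get_mut(&old_file)
            .and_then(|f| f.per_region.remove(&region))
        {
            self.files
                .entry(new_file)
                .or_insert_with(|| FileIndex { per_region: HashMap::new() })
                .per_region
                .insert(region, moved);
        }
        if let Some(r) = self.regions.get_mut(&region) {
            r.files.retain(|&f| f != old_file);
            if !r.files.contains(&new_file) {
                r.files.push(new_file);
            }
        }
    }
}
```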
Signed-off-by: tabokie xy.tao@outlook.com
Ref #205
An allocator that switches to file-based mmap when memory usage is above a certain watermark.
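A minimal sketch of the watermark idea only, not the actual SwappyAllocator in this PR; it assumes the memmap2 and tempfile crates and a hypothetical 256 MiB watermark. Buffers below the watermark live on the heap; above it, new buffers are backed by an anonymous temporary file via mmap, so the kernel can page them out under memory pressure.

```rust
// Rough sketch of the watermark idea only, not the PR's SwappyAllocator.
// Assumes the `memmap2` and `tempfile` crates; the watermark value is a
// made-up example. Release accounting for heap buffers is elided here.
use memmap2::MmapMut;
use std::sync::atomic::{AtomicUsize, Ordering};
use tempfile::tempfile;

const WATERMARK: usize = 256 * 1024 * 1024; // hypothetical 256 MiB

static MEM_IN_USE: AtomicUsize = AtomicUsize::new(0);

enum Buffer {
    Heap(Vec<u8>),
    Swapped(MmapMut),
}

fn allocate(size: usize) -> std::io::Result<Buffer> {
    let used = MEM_IN_USE.load(Ordering::Relaxed);
    if used + size <= WATERMARK {
        // Below the watermark: plain heap allocation, counted against the budget.
        MEM_IN_USE.fetch_add(size, Ordering::Relaxed);
        Ok(Buffer::Heap(vec![0u8; size]))
    } else {
        // Over the watermark: back the buffer with an anonymous temp file,
        // so the OS can write these pages out instead of holding RAM.
        let file = tempfile()?;
        file.set_len(size as u64)?;
        Ok(Buffer::Swapped(unsafe { MmapMut::map_mut(&file)? }))
    }
}

fn main() -> std::io::Result<()> {
    let buf = allocate(1024)?;
    match buf {
        Buffer::Heap(_) => println!("served from heap"),
        Buffer::Swapped(_) => println!("served from file-backed mmap"),
    }
    Ok(())
}
```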
Micro-benchmarks:

- buffered write (`./target/release/stress --path /data3/re_bench --time 120 --regions 1000 --write-threads 1 --write-without-sync`):
- write (`./target/release/stress --path /data3/re_bench --time 120 --regions 1000 --write-threads 1 --purge-interval 0 --compact-count 0`):

Test results on an NVMe disk: