yuqi1129 commented 2 years ago

Signed-off-by: yuqi1129 yuqi4733@gmail.com

yuqi1129 commented 2 years ago

@tabokie Can you help to review this PR , unit test will be added later. I have some problem about this issue

Is this PR is on the right way ? if not, can you give me some advice, help wanted
Parameter raft_groups have not been used in unsafe_truncate_with_file_builder, i'm thinking how to use it and i have no idea whether we should use it in the replay method.
In the replay method, I'am using rename original file to tmp and move new file to the original file. this seems to be not atomic and we can't guarantee both is ok
how to deal with the original file if we successfully truncate it? just remove it?

yuqi1129 commented 2 years ago

I can't clearly understand the truncate mode front, if a file corrupted at beginning, we can't get the position where corrupt contents ends and normal logs starts

codecov[bot] commented 2 years ago

Codecov Report

Merging #175 (517f3b7) into master (7842aad) will decrease coverage by 0.16%. The diff coverage is 90.82%.

@@            Coverage Diff             @@
##           master     #175      +/-   ##
==========================================
- Coverage   96.97%   96.80%   -0.17%     
==========================================
  Files          26       27       +1     
  Lines        6703     7172     +469     
==========================================
+ Hits         6500     6943     +443     
- Misses        203      229      +26

Impacted Files	Coverage Δ
src/file_pipe_log/mod.rs	`97.34% <ø> (ø)`
src/lib.rs	`100.00% <ø> (ø)`
src/truncate.rs	`85.76% <85.76%> (ø)`
src/engine.rs	`96.60% <88.88%> (+1.90%)`	:arrow_up:
tests/failpoints/test_engine.rs	`99.08% <96.94%> (-0.92%)`	:arrow_down:
src/consistency.rs	`91.11% <100.00%> (ø)`
src/file_pipe_log/pipe.rs	`99.11% <100.00%> (+<0.01%)`	:arrow_up:
src/file_pipe_log/pipe_builder.rs	`91.93% <100.00%> (+0.08%)`	:arrow_up:
src/memtable.rs	`99.51% <100.00%> (+<0.01%)`	:arrow_up:
src/pipe_log.rs	`94.73% <100.00%> (ø)`
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 7842aad...517f3b7. Read the comment docs.

tabokie commented 2 years ago

truncate recovers the data files to a bootable, consistent state. For a raft group with a missing log entry, we can accomplish that either by removing all the entries older than it ("front" mode), or removing all entries newer than it ("back" mode), or removing the whole raft group ("all" mode).

And yes, raft group id is needed, because the user might adopt different modes for different groups.

The code you just submit isn't good enough. You have to identify the "hole" in log entries by maintaining a map (id -> last_index) in TruncateMachine.

About the file rename issue, you can directly rename from the new filename to the original filename. This operation is atomic, and the old file is deleted.

yuqi1129 commented 2 years ago

@tabokie

truncate recovers the data files to a bootable, consistent state. For a raft group with a missing log entry, we can accomplish that either by removing all the entries older than it ("front" mode), or removing all entries newer than it ("back" mode), or removing the whole raft group ("all" mode).

The unit is a file or a raft groups? , If we meet corruption in a file and we choose front mode, Does this mean that we should drop all the message in the file that older than the corrupted message? Another issue. Truncate action needs to guarantee that files name stay the same ? that is if there are 2 files A.write B.write After we execute the truncate command. there should be 2 files and one is A.write thats contains the content of A.write before? or we can merge this two file into one says C.write. I'am very puzzled at this point.

And yes, raft group id is needed, because the user might adopt different modes for different groups.

If user only interested in some raft groups, that means we only need to care about that groups? what if corruptions lies in other raft groups when we visited log files, how do we handle this scenarios

The code you just submit isn't good enough. You have to identify the "hole" in log entries by maintaining a map (id -> last_index) in TruncateMachine.

You mean we should check whether the log index is sequential, that is some log has lost. For example, log index of a raft groups is 1, 2,3, 5 and 4 is lost, if the mode is front, we should drop 1,2,3 ?

I'am looking forward your replay, thanks

tabokie commented 2 years ago

We scan for all data files, but only modify specific raft groups. Only log entries in the corrupted raft group can be dropped.
File name must be the same, because we assume file numbers are consecutive during recovery. This mean we will even write an empty file if all entries inside are truncated.
Physical corruptions are already handled in replay method (via TolerateAnyCorruption mode), logical corruptions in unrelated raft groups will not be detected or handled.

yuqi1129 commented 2 years ago

Physical corruptions are already handled in replay method (via TolerateAnyCorruption mode), logical corruptions in unrelated raft groups will not be detected or handled.

That is exactly what makes me confused, as you have mention, Physical corruption have been handle, so truncate subcomand only needs to handle logical corruption. Because if there is physical corruption in a file, we can do nothing but drop the rest of it. As for logical corruption, say the log of one raft groups in a file is: 1,2, 3 ,4 6, 7 ,8 (5 is missing)

if mode is front, we should keep 1,2,3,4
if mode is back we should keep 6,7,8
if mode is all, we would drop all logs in the file of a specific raft groups

That' is quite clear, @tabokie Is there anything wrong with my unstanding?

tabokie commented 2 years ago

@yuqi1129 Your understanding is correct.

yuqi1129 commented 2 years ago

@tabokie Can you take time to review this PR?

yuqi1129 commented 2 years ago

@tabokie All discussion has been solved, can you help to view again?

yuqi1129 commented 2 years ago

@tabokie A new commit has been pushed, if you have time, please to take a look to the latest code, thank you.

tabokie commented 2 years ago

Closed in favor of #182

tikv / raft-engine

support `truncate` subcommand #175

Codecov Report