tikv / fail-rs

Fail points for rust
Apache License 2.0
338 stars 40 forks source link

support configuring fail point in RAII style #62

Closed tabokie closed 2 years ago

tabokie commented 2 years ago

It is very easy to write failpoint tests without freeing up the failpoint in the end.

Signed-off-by: tabokie xy.tao@outlook.com

tabokie commented 2 years ago

@BusyJay What do you think

BusyJay commented 2 years ago

Why not use FailScenario?

tabokie commented 2 years ago

User might not want to configure a global hook that applies to all failpoint tests (like we do in TiKV). If they don't do that, they have to add additional two lines for every tests. FailGuard appears easier to use.

I already used similar thing in Raft Engine (https://github.com/tikv/raft-engine/blob/1a9dd43a74a6ff93c176670e584db6ec520b6953/tests/failpoints/test_io_error.rs#L155), it's really natural to use.

Also, with respect to TiKV's implementation, calling teardown after the test closure is executed is not safe. Some threads might already be joined (and blocked on failpoint) inside the test closure (via custom Drop implementation).

BusyJay commented 2 years ago

User might not want to configure a global hook that applies to all failpoint tests (like we do in TiKV

Actually, TiKV configures global hook using its own way.

I already used similar thing in Raft Engine (https://github.com/tikv/raft-engine/blob/1a9dd43a74a6ff93c176670e584db6ec520b6953/tests/failpoints/test_io_error.rs#L155), it's really natural to use.

Your usage is wrong. Technically, there is only one global failpoint registry, so without synchronization, failpoint configuration set by any cases will affect other cases as well. Hence executing cases in sequential order is required.

Some threads might already be joined (and blocked on failpoint) inside the test closure (via custom Drop implementation).

It doesn't matter. Teardown will wake up all blocking threads.

tabokie commented 2 years ago

Actually, TiKV configures global hook using its own way.

I'm saying user might not want to go over the trouble to doing a configuration like this.

Your usage is wrong.

You are looking at the wrong thing. I'm demonstrating that using FailGuard can make one test case much easier to manage. The failpoint tests in Raft Engine is all run with one threads (and documented).

I see that there are two things with failpoint tests. (1) failpoint cleanup (2) exclusive ownership to global registry. FailGuard only offers a new way to deal with (1).

It doesn't matter. Teardown will wake up all blocking threads.

It does matter. As I explained, the object destruction can be called inside the test closure. That means the process will be blocked inside without getting a chance to call the teardown outside the closure.

To see it in action, I have observed two cases that appear to be caused by forgetting to remove failpoints after test: https://github.com/tikv/tikv/pull/12799/commits/3767f2be9736868ab495d9492853ab7633b11470

tabokie commented 2 years ago

@BusyJay I can't merge this.