xlab-uiuc / slooo

Slooo: A Fail-slow Fault Injection Testing Framework

Can I ask people to use Slooo? #15

Closed tianyin closed 2 years ago

tianyin commented 2 years ago

@varshith15 @Essoz

I'm teaching a grad-level course on reliable software systems. I want to design an assignment on testing quorum systems.

Can I ask the students to use your framework?

tianyin commented 2 years ago

Note that the assignments will cover not only fail-slow faults but also fail-stop faults, like killing a follower or a leader.

shuaimu commented 2 years ago

I think that is a great idea!

Essoz commented 2 years ago

I think the answer would be yes even if fail-stop faults are to be considered.

Just in case your expectation went beyond the tool's capability, I want to clarify that the tool cannot test a system on its own. A user has to use a benchmark and adapt the tool to the target system by implementing the interface provided by the RSM class. And currently, the tool does not support killing a node (but of course we can add support for that with little effort).
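To make the adaptation step concrete, here is a minimal sketch of what a per-system adapter could look like. The names `ReplicatedSystemAdapter`, `ToyCluster`, and `run_experiment` are hypothetical and chosen for illustration; they are not the actual methods of the RSM class, which should be checked in the repository.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class ReplicatedSystemAdapter(ABC):
    """Hypothetical per-system interface; NOT the real Slooo RSM class."""

    @abstractmethod
    def start_cluster(self) -> None: ...

    @abstractmethod
    def find_leader(self) -> str: ...

    @abstractmethod
    def find_followers(self) -> List[str]: ...

    @abstractmethod
    def run_benchmark(self) -> Dict[str, int]: ...

    @abstractmethod
    def stop_cluster(self) -> None: ...


class ToyCluster(ReplicatedSystemAdapter):
    """In-memory stand-in for a real system, used only to exercise the flow."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = list(nodes)
        self.running = False

    def start_cluster(self) -> None:
        self.running = True

    def find_leader(self) -> str:
        # A real adapter would query the system (e.g. via its admin client).
        return self.nodes[0]

    def find_followers(self) -> List[str]:
        return self.nodes[1:]

    def run_benchmark(self) -> Dict[str, int]:
        if not self.running:
            raise RuntimeError("cluster is not running")
        # A real adapter would launch an external benchmark here.
        return {"ops": 0}

    def stop_cluster(self) -> None:
        self.running = False


def run_experiment(system: ReplicatedSystemAdapter,
                   inject: Callable[[str], None]) -> Dict[str, int]:
    """Start the system, inject a fault into the leader, benchmark, clean up."""
    system.start_cluster()
    try:
        inject(system.find_leader())
        return system.run_benchmark()
    finally:
        system.stop_cluster()
```

The fault injection is passed in as a callable so the same flow can be reused for slowing down a node or, once supported, killing one.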

tianyin commented 2 years ago

@Essoz I totally understand that!

The goal is to ask students to read some systems code and measure their fault tolerance.

of course we can add support for that with little effort

Are you able to add the support, say in two or three days?

tianyin commented 2 years ago

Also, the assignment will be done on their local machines, rather than Azure.

Therefore, we need to make sure the local mode can be used.

Essoz commented 2 years ago

I can add that local support in two days, say tomorrow.

The local mode is usable, but it comes with limitations. To perform disk experiments, each instance has to be assigned a different partition as its datapath. Memory & Network experiments require extra work on the user side: they have to start instances with resource isolation by using tools like docker so that the experiments such as memory contention do not affect other instances.

tianyin commented 2 years ago

I don't think I understand.

Memory & Network experiments require extra work on the user side: they have to start instances with resource isolation by using tools like docker

Why can't memory be done with cgroups?

Essoz commented 2 years ago

I apologize. I just checked the paper and the code, and the limitations of local mode are listed below:

Memory contention is available.
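For reference, capping one instance's memory with cgroups could look roughly like the sketch below. It assumes cgroup v2 mounted at `/sys/fs/cgroup`, root privileges, and the memory controller enabled for the parent group; the function name `limit_memory` and the group name `slooo-demo` are made up for this example and are not part of Slooo.

```python
from pathlib import Path


def limit_memory(pid: int, limit_bytes: int,
                 group: str = "slooo-demo",
                 root: str = "/sys/fs/cgroup") -> Path:
    """Place `pid` in a cgroup v2 group whose memory is capped at `limit_bytes`.

    Assumptions: cgroup v2, sufficient privileges, memory controller enabled
    in the parent's cgroup.subtree_control.
    """
    cg = Path(root) / group
    cg.mkdir(parents=True, exist_ok=True)
    # memory.max is the hard memory limit for the group, in bytes.
    (cg / "memory.max").write_text(f"{limit_bytes}\n")
    # Writing a PID to cgroup.procs moves that process into the group.
    (cg / "cgroup.procs").write_text(f"{pid}\n")
    return cg
```

The `root` parameter exists only so the function can be pointed at a scratch directory for a dry run; on a real machine it would stay at its default.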

tianyin commented 2 years ago

Great, @Essoz! CPU and memory are all I need!

What I hope you can support is as follows:

The node can be either a leader or a follower, or both (e.g., in Copilot).

varshith15 commented 2 years ago

@tianyin The Slooo framework provides the code to inject various kinds of slowness (CPU, memory, network) into the desired node (leader/follower), but the user has to add the code to figure out the desired node (leader/follower) for the given system. The logic for identifying the leader or a follower differs from system to system, as we saw with MongoDB, TiDB, and RethinkDB, and Copilot has no leader/follower distinction at all. So the user has to implement that part of the logic.

In Copilot, we just slow down any one of the nodes, as there isn't a leader/follower distinction.
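As an illustration of the part the user writes, target selection could be sketched as below. The function `pick_target` and the role labels are hypothetical; how the role map is obtained (a shell command, a client query) is the system-specific piece and is not shown.

```python
import random
from typing import Dict, Optional


def pick_target(roles: Dict[str, str], wanted: str,
                rng: Optional[random.Random] = None) -> str:
    """Choose the node to slow down.

    `roles` maps node name -> role label (e.g. "leader", "follower").
    `wanted` is a role label, or "any" for leaderless systems, where
    every node is an equally valid target.
    """
    rng = rng or random.Random()
    if wanted == "any":
        candidates = sorted(roles)
    else:
        candidates = sorted(n for n, r in roles.items() if r == wanted)
    if not candidates:
        raise ValueError(f"no node with role {wanted!r}")
    return rng.choice(candidates)


# Leader-based system: the role map comes from system-specific code.
leader = pick_target({"n0": "leader", "n1": "follower", "n2": "follower"}, "leader")

# Leaderless system: any node will do.
some_node = pick_target({"n0": "peer", "n1": "peer", "n2": "peer"}, "any")
```

Raising an error when no node matches keeps a misdetected role from silently turning into a fault on the wrong node.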

tianyin commented 2 years ago

but the user has to add the code to figure out the desired node

This is exactly the purpose of the course assignment: students need to write some code and understand the system being measured!