google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

Chaos Engineering with gvisor #1016

Open seeker89 opened 4 years ago

seeker89 commented 4 years ago

Hello there,

I wanted to ask whether there were any plans and/or appetite to include fault injection into this kernel? I'm interested in Chaos Engineering and this userspace kernel looks like a perfect place to expose some way to introduce failure.

In particular, I think that just adding finely controlled delays and/or a certain rate of errors on particular syscalls would allow for a lot of valuable testing.

From reading the code, it looks like adding a layer like that would be both reasonably easy and cheap.

Please tell me if that's something that you'd be interested in accepting as a PR, and if you are, I'll follow up with a more detailed plan.

Also, the dev mailing list seems to be dominated by automated messages, so I figured an issue would be a better way to put this forward.

Thanks!

And btw, gvisor is pretty awesome, thanks for the hard work that went into it!

ianlewis commented 4 years ago

https://github.com/google/syzkaller can be used to fuzz gVisor and we use that a lot to find bugs. We'd be open to other plans or ideas you might have on adding additional testing.

BTW, we've pulled back from sending automated messages on the mailing list so feel free to submit questions there!

seeker89 commented 4 years ago

Thanks for taking the time to answer @ianlewis !

Kernel fuzzing is cool, but I had a different use case in mind altogether. Let me try to be a little clearer.

Chaos Engineering

When doing Chaos Engineering, I'm typically interested in seeing how some black box behaves when a failure (for example, a networking delay; a failure to write to disk; a lack of resources; etc.) is injected, and in confirming that my hypothesis about how it should react matches reality. (There is also application-level fault injection, which is interesting but out of scope here.)

For example, let's say that I have a 3-node etcd cluster. I could come up with a number of hypotheses. For instance (completely made up on the spot, just to give you an idea):

Example 1

At level of traffic X, I can lose an instance and the cluster will keep performing without degradation.

To verify that, I can generate the traffic, automate killing one of the instances, and observe whether the cluster does indeed carry on working well. Easy, so far so good.

Example 2

If 1% of writes/fsyncs/etc gets slowed down by X ms, I still satisfy my SLO for my cluster.

To verify that, I would like to introduce the delay/failure to the corresponding syscall, and observe how my application works.

Assuming Linux, I would probably use strace, and 1) pay a rather large performance tax (which probably means I won't do it anywhere near a production workload), and 2) since I'm talking to the kernel running my OS, the blast radius is theoretically the whole OS. For some things I could also use these built-ins.

The idea

Now, if the same cluster were running on gVisor, we could implement the failure-injection mechanism directly in this kernel, and allow people to inject failures for their chaos engineering experiments in a cheaper and potentially easier way.

Specifically, since every sandbox runs its own userspace kernel, we can address 1) by implementing this cheaply (an extra if statement in a naive approach, or a modified syscall table where a 'delayed' syscall handler is introduced, for example), and 2) because the blast radius is limited to the particular sandbox.

With this modern userspace kernel in place, and the performance tax of running two kernels already paid, it just feels like an amazing spot to chaos-engineer a lot of things. I could imagine a really nice API for injecting a very precise amount of failure in very precise spots.

Tell me whether that makes more sense now 👍

nlacasse commented 4 years ago

Hi Mikolaj,

This is something we've thought about, but have not had time to actually work on. I think it is a very interesting idea! It would be pretty easy to inject errors and timed waits into syscalls. We'd just need to wrap (some of) the syscall functions in the syscall table with "chaos" functions.

https://github.com/google/gvisor/blob/master/pkg/sentry/syscalls/linux/linux64_amd64.go#L42

The main concern would be to make sure that this does not introduce any performance overhead when running without chaos. Also, it would be nice to have some type of pluggable interface, so that users could define and configure their own unique types of chaos without having to modify the gVisor source itself.

I think this could be a good use case for a Go Plugin: https://golang.org/pkg/plugin/

We could define a "syscall-wrapper" interface that a plugin could implement. If gVisor is run with such a plugin, then it could wrap all syscall handlers with it. This would make it easy to support different types of chaos, and possibly other non-chaos use cases as well.
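A minimal sketch of what such a "syscall-wrapper" interface might look like, with all names invented for illustration. The plugin loading itself (via `plugin.Open` and `Lookup`) is only described in the comments, since it needs a separately built `.so` file.

```go
package main

import "fmt"

// SyscallFn is a stand-in for gVisor's real syscall handler signature.
type SyscallFn func(args ...uintptr) (uintptr, error)

// SyscallWrapper is a hypothetical interface a Go plugin would export.
// gVisor could plugin.Open() a user-supplied .so at startup, Lookup a
// symbol implementing this interface, and wrap every handler with it.
type SyscallWrapper interface {
	Wrap(sysno uintptr, fn SyscallFn) SyscallFn
}

// countingWrapper is a trivial non-chaos example implementation: it
// counts invocations per syscall number.
type countingWrapper struct{ calls map[uintptr]int }

func (c *countingWrapper) Wrap(sysno uintptr, fn SyscallFn) SyscallFn {
	return func(args ...uintptr) (uintptr, error) {
		c.calls[sysno]++
		return fn(args...)
	}
}

func main() {
	w := &countingWrapper{calls: map[uintptr]int{}}
	handler := w.Wrap(1, func(args ...uintptr) (uintptr, error) { return 0, nil })
	handler()
	handler()
	fmt.Println(w.calls[1]) // 2
}
```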

I think this would be a very cool feature for gVisor, and I'd be happy to review a design and PRs.

seeker89 commented 4 years ago

Hi Nicolas,

Thanks for your response, and I'm glad that you find the idea interesting!

I wasn't aware of Go Plugins, and I quite like the modularity of your suggestion.

That said, I think what could really make this a killer feature would be some kind of dynamic API, be it gRPC, HTTP, or something else, which would allow for dynamic injection of (potentially timed) failures.

For instance, if I could take a running application and then issue a request to the kernel saying "add this delay, with this distribution and this jitter, for the next 5 seconds on this syscall", then I would be able to easily automate sophisticated scenarios. I would personally use gVisor just to be able to do that, without the overhead of ptrace and with the increased ease of use.

Of course, access to the API would then have to be protected, the API itself should not start by default, etc., all the usual concerns.

What do you think?

nlacasse commented 4 years ago

Yeah, I think that all sounds very good, and is still possible with the Plugin model. The plugin is just a dynamic library loaded at runtime with a specific interface. It could still start a gRPC server, and determine how much jitter to add based on whatever requests it has received.

The plugin allows more flexibility, and avoids needing all of the chaos logic inside the main gVisor code base. It would also sidestep the concerns you mention about protecting the API, not starting it by default, etc.

seeker89 commented 4 years ago

Awesome. I'll have a think about how to best structure it, and I'll update this issue with a first draft of a plan once I get a good idea. Thanks!

ianlewis commented 4 years ago

The plugin could probably be invoked when registering the syscall table in the kernel, and the wrapper could be added (or not) at that point.

https://github.com/google/gvisor/blob/add40fd6ad4c136bc17d1f243ec199c7bf37fa0b/pkg/sentry/kernel/syscalls.go#L292
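Hooking in at registration time could look roughly like this. The types are simplified stand-ins for gVisor's real syscall table, but the key property carries over: with no wrapper loaded, the table is returned untouched, so the normal path pays no per-syscall overhead.

```go
package main

import "fmt"

// SyscallFn and Wrapper are illustrative stand-ins, not gVisor types.
type SyscallFn func(args ...uintptr) (uintptr, error)
type Wrapper func(sysno uintptr, fn SyscallFn) SyscallFn

// registerTable applies a plugin-supplied wrapper once, at
// table-registration time. A nil wrapper leaves the table unchanged.
func registerTable(table map[uintptr]SyscallFn, w Wrapper) map[uintptr]SyscallFn {
	if w == nil {
		return table
	}
	wrapped := make(map[uintptr]SyscallFn, len(table))
	for no, fn := range table {
		wrapped[no] = w(no, fn)
	}
	return wrapped
}

func main() {
	table := map[uintptr]SyscallFn{
		1: func(args ...uintptr) (uintptr, error) { return 111, nil },
	}
	// A wrapper that bumps every result by one, just to show the hook point.
	bump := func(no uintptr, fn SyscallFn) SyscallFn {
		return func(args ...uintptr) (uintptr, error) {
			r, err := fn(args...)
			return r + 1, err
		}
	}
	wrapped := registerTable(table, bump)
	r, _ := wrapped[1]()
	fmt.Println(r) // 112
}
```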