netblue30 / firejail

Linux namespaces and seccomp-bpf sandbox
https://firejail.wordpress.com
GNU General Public License v2.0
5.83k stars 568 forks

gVisor backend #3942

Open ghost opened 3 years ago

ghost commented 3 years ago

gVisor emulates the majority of Linux syscalls in userland, providing a respectable sandbox.

gVisor provides a runtime (runsc) capable of running OCI spec containers. https://gvisor.dev/docs/user_guide/quick_start/oci/

It should be possible to either modify gVisor to accept a different interface or to have firejail output an OCI config for an OCI runtime.
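To make the second option more concrete, here is a minimal sketch of what "output an OCI config" could look like. The field names follow the OCI runtime spec as I understand it; the helper name `make_oci_config`, the `rootfs` path, the hostname and the choice of mounts and namespaces are all assumptions for illustration, not anything firejail or gVisor provides.

```python
import json


def make_oci_config(args, cwd="/", rootfs="rootfs", readonly_root=True):
    """Build a minimal OCI runtime config as a dict (hypothetical helper)."""
    return {
        "ociVersion": "1.0.0",
        "process": {
            "terminal": False,
            "user": {"uid": 0, "gid": 0},
            "args": list(args),  # command to run inside the sandbox
            "env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
            "cwd": cwd,
        },
        # Assumed layout: a root filesystem directory next to config.json.
        "root": {"path": rootfs, "readonly": readonly_root},
        "hostname": "sandbox",  # arbitrary illustrative value
        "mounts": [
            {"destination": "/proc", "type": "proc", "source": "proc"},
            {"destination": "/tmp", "type": "tmpfs", "source": "tmpfs"},
        ],
        "linux": {
            # One entry per namespace the runtime should create.
            "namespaces": [
                {"type": t} for t in ("pid", "network", "ipc", "uts", "mount")
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(make_oci_config(["echo", "ok"]), indent=2))
```

An OCI runtime expects this JSON as config.json in a bundle directory. A real firejail profile would need far more than this (capabilities, seccomp, bind mounts), so treat it as a starting shape only.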

gVisor already has something that may be a starting point:

sudo runsc do echo ok

do [flags] - runs a command.

This command starts a sandbox with host filesystem mounted inside as readonly, with a writable tmpfs overlay on top of it. The given command is executed inside the sandbox. It's to be used to quickly test applications without having to install or run docker. It doesn't give nearly as many options and it's to be used for testing only.

  -cwd string
        path to the current directory, defaults to the current directory (default ".")
  -ip string
        IPv4 address for the sandbox (default "192.168.10.2")
  -quiet
        suppress runsc messages to stdout. Application output is still sent to stdout and stderr
  -root string
        path to the root directory, defaults to "/" (default "/")

topimiettinen commented 3 years ago

I think virtualization is one of the missing pieces of Firejail. It would allow much better system call interception than seccomp (which can't dereference pointers), even very low level operations like filtering of CPU instructions or messing with page tables. Some Spectre-type attacks could be prevented by flushing caches and inserting a random small delay at every system call. This would be too expensive for all applications but maybe acceptable for Firejail. Detection of ROP, JOP or Spectre-class attacks would be awesome. Application checkpoint/restore would be nice too. Windows seems to use VMs to virtualize processes at OS level, so Linux is behind here.
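To illustrate the pointer limitation: a seccomp-bpf filter is handed only a fixed-size `struct seccomp_data` holding the syscall number, the architecture, the instruction pointer and six raw argument values. The sketch below mirrors that struct with ctypes (layout taken from `<linux/seccomp.h>`; the x86_64 numbers are stated in the comments) and builds the record a filter might see for an `openat` call. It does not install any filter, it only shows that the path reaches the filter as an address.

```python
import ctypes


class SeccompData(ctypes.Structure):
    """Mirror of struct seccomp_data from <linux/seccomp.h>."""
    _fields_ = [
        ("nr", ctypes.c_int),                      # syscall number
        ("arch", ctypes.c_uint32),                 # AUDIT_ARCH_* value
        ("instruction_pointer", ctypes.c_uint64),
        ("args", ctypes.c_uint64 * 6),             # raw register values
    ]


# x86_64 values: openat is syscall 257, AUDIT_ARCH_X86_64 is 0xC000003E.
path = ctypes.create_string_buffer(b"/etc/passwd")
data = SeccompData(nr=257, arch=0xC000003E)
data.args[1] = ctypes.addressof(path)  # the filter sees this number, not the string

if __name__ == "__main__":
    print("size of seccomp_data:", ctypes.sizeof(SeccompData))
    print("pathname argument as seen by the filter:", hex(data.args[1]))
```

Because classic BPF programs can only load words from this record, a filter can compare `args[1]` against a constant but cannot read the bytes it points to.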

Running runsc or writing OCI config files seems far too high-level an interface. Instead, the interface should be a C library with very detailed control. System call interception should be done by Firejail; for example, gVisor could call a plugin provided by Firejail. gVisor also duplicates sandboxing functions already implemented in Firejail, such as network and file system filtering, but perhaps that can be ignored.

topimiettinen commented 3 years ago

I opened a feature request for gVisor: https://github.com/google/gvisor/issues/5440

ghost commented 3 years ago

gVisor is intriguing because it's a far better sandbox than firejail, bwrap, nsjail or any others. Unfortunately the OCI spec is quite lame, but I think I will use it anyway.

Note that it doesn't appear to implement Unix sockets correctly: I can't get the X11 or Wayland sockets to work even with --fsgofer-host-uds. So it would appear to be restricted to non-graphical applications.

gVisor, in contrast to seccomp, does not pass any syscalls through; it interprets them. Those it can service itself it does; the others it services by making syscalls of its own, but there is never direct pass-through. That is far superior isolation to seccomp. Of course, it also uses seccomp on itself to restrict which syscalls it could make should it be compromised.
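A toy model of that distinction, purely for illustration (the class and method names are made up and this is in no way gVisor's code): the application's request goes to a dispatcher, which either answers from its own state or makes a separate host call on the application's behalf. Anything without a handler is refused, so nothing is forwarded as-is.

```python
import errno
import time


class ToyKernel:
    """Toy user-space 'kernel' that interprets requests (hypothetical)."""

    def __init__(self):
        self.host_calls = []  # record of host calls the toy kernel made itself

    def sys_getpid(self):
        # Served entirely from internal state: the app sees a virtual PID.
        return 1

    def sys_monotonic_time(self):
        # Served by the toy kernel making its own, separate host call.
        self.host_calls.append("time.monotonic")
        return time.monotonic()

    def dispatch(self, name, *args):
        handler = getattr(self, "sys_" + name, None)
        if handler is None:
            # No handler means no service; the request is never passed through.
            return -errno.ENOSYS
        return handler(*args)


if __name__ == "__main__":
    k = ToyKernel()
    print(k.dispatch("getpid"))
    print(k.dispatch("monotonic_time") > 0)
    print(k.dispatch("reboot"))
    print(k.host_calls)
```

A seccomp filter, by comparison, only decides whether the host kernel should run the application's own call.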