Warn on static binaries + seccomp

commial commented 3 years ago

Hi there,

From what I understand of how Firejail works (from the documentation):

if the blocked system calls would also block Firejail from operating, they are handled by adding a preloaded library which performs seccomp system calls later.
This preloaded library is added in https://github.com/netblue30/firejail/blob/master/src/firejail/fs_trace.c#L105
Its source code is https://github.com/netblue30/firejail/blob/master/src/libpostexecseccomp/libpostexecseccomp.c

Long story short, to filter system calls such as execve with seccomp, Firejail injects code through the LD_PRELOAD mechanism. As a result, this is ignored for static binaries, and the resulting process is still able to call execve.
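For illustration, here is a minimal sketch of the preload technique. It is not the actual libpostexecseccomp code: it is a shared library whose constructor installs a seccomp-BPF filter against execve. The file name `block_execve.c`, the helper `install_execve_filter()`, and the choice to fail execve with EPERM (Firejail kills the process instead) are assumptions. The constructor only runs when the dynamic loader maps the library, which never happens for a statically linked binary.

```c
/* block_execve.c: hypothetical LD_PRELOAD library (sketch only).
 * It does not handle execveat or the x32 ABI. */
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only knows x86_64 and aarch64"
#endif

/* Kill on an unexpected architecture, fail execve with EPERM, allow the rest. */
int install_execve_filter(void)
{
    struct sock_filter filter[] = {
        /* Load the architecture and refuse anything unexpected. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* Load the syscall number; execve gets EPERM. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_execve, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };

    /* Lets an unprivileged process install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    /* Installing the filter itself needs prctl() (or seccomp(2)). */
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
        return -1;
    return 0;
}

/* Runs when the dynamic loader maps this library: never for static binaries. */
__attribute__((constructor))
static void block_execve_init(void)
{
    if (install_execve_filter() != 0)
        perror("block_execve: installing seccomp filter");
}
```

Something like `gcc -shared -fPIC -o block_execve.so block_execve.c` followed by `LD_PRELOAD=./block_execve.so ./exec` should make the dynamic reproducer fail in execve, while `./exec-static` would still run `/bin/ls`.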

It would be nice to have a warning that the seccomp filter will not be honored, or even an opt-in option to avoid this behavior (i.e. exit instead of launching the binary), for use cases where Firejail is used to sandbox untrusted binaries. I don't know if another mechanism (like ptrace) could be used to work around this limitation.
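One possible way to implement such a warning, offered as a sketch rather than existing Firejail code, is to check before launch whether the program's ELF file has a PT_INTERP program header. Only binaries with an interpreter go through the dynamic loader that honors LD_PRELOAD. Fully static executables, including static-pie ones, have no PT_INTERP. The name `elf_has_interp` is hypothetical, and the sketch assumes the file uses the host's byte order.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Scan the program headers of an already opened file for PT_INTERP. */
static int scan_for_interp(FILE *f)
{
    unsigned char ident[EI_NIDENT];
    if (fread(ident, 1, EI_NIDENT, f) != EI_NIDENT ||
        memcmp(ident, ELFMAG, SELFMAG) != 0)
        return -1;                        /* not an ELF file */

    /* Locate the program header table for 32- or 64-bit objects. */
    unsigned long phoff;
    unsigned phnum, phentsize;
    rewind(f);
    if (ident[EI_CLASS] == ELFCLASS64) {
        Elf64_Ehdr eh;
        if (fread(&eh, sizeof eh, 1, f) != 1)
            return -1;
        phoff = eh.e_phoff;
        phnum = eh.e_phnum;
        phentsize = eh.e_phentsize;
    } else if (ident[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr eh;
        if (fread(&eh, sizeof eh, 1, f) != 1)
            return -1;
        phoff = eh.e_phoff;
        phnum = eh.e_phnum;
        phentsize = eh.e_phentsize;
    } else {
        return -1;
    }

    /* p_type is the first 32-bit field of both Elf32_Phdr and Elf64_Phdr. */
    for (unsigned i = 0; i < phnum; i++) {
        Elf32_Word p_type;
        if (fseek(f, (long)(phoff + (unsigned long)i * phentsize), SEEK_SET) != 0 ||
            fread(&p_type, sizeof p_type, 1, f) != 1)
            return -1;
        if (p_type == PT_INTERP)
            return 1;                     /* uses a dynamic loader */
    }
    return 0;                             /* no interpreter: static binary */
}

/* Hypothetical helper: 1 if `path` requests a program interpreter, 0 if it is
 * statically linked (LD_PRELOAD-based filters would be skipped), -1 on error. */
int elf_has_interp(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    int result = scan_for_interp(f);
    fclose(f);
    return result;
}
```

A sandbox launcher could call such a helper on the resolved program path and print a warning (or refuse to start, behind an opt-in option) when it returns 0 while a post-exec seccomp list is non-empty.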

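To reproduce, build the following program both as a dynamically linked binary (./exec) and as a statically linked one (./exec-static):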

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    char *newargv[] = { "/bin/ls", "/", NULL };
    char *newenviron[] = { NULL };

    /* execve only returns on failure, e.g. when seccomp makes it fail */
    execve("/bin/ls", newargv, newenviron);
    perror("execve");
    exit(EXIT_FAILURE);
}

# dynamic version
$ firejail --noprofile --shell=none --seccomp=execve ./exec
Parent pid 14822, child pid 14823
Post-exec seccomp protector enabled
[...]
(execve is prevented)

# static version
$ firejail --noprofile --shell=none --seccomp=execve ./exec-static 
Parent pid 15030, child pid 15031
Post-exec seccomp protector enabled
Seccomp list in: execve, check list: @default-keep, postlist: execve
Child process initialized in 33.37 ms
bin    core  home        lib    libx32  mnt   root  snap  tmp  vmlinuz
[...]

Parent is shutting down, bye...

topimiettinen commented 3 years ago

Right. I think a warning in the documentation and at runtime would be appropriate.

Perhaps the problem could be avoided if, in these cases, Firejail executed a custom loader that sets up the seccomp filter and then loads the actual binary, as the kernel would for a static executable. That couldn't be bypassed.

commial commented 3 years ago

Thanks for your quick answer.

From what I understand, the only syscalls filtered "post-exec" are the ones from "@default-keep":

syscalls_in_list(list, "@default-keep", fd, &prelist, &postlist, native);

Which resolves to:

# etc/templates/syscalls.txt
@default-keep=execve,prctl
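The split can be sketched as follows. This is a simplified illustration inferred from the "Seccomp list in / check list / postlist" log lines, not a copy of `syscalls_in_list()`: each syscall in the user's list goes to the post-exec list if it appears in the check list (@default-keep = execve,prctl), and to the pre-exec list otherwise. `split_syscalls` and its fixed-size buffers are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* True if `name` appears in the comma-separated list `keep`. */
static int in_list(const char *name, const char *keep)
{
    size_t n = strlen(name);
    for (const char *p = keep; *p; ) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == n && strncmp(p, name, n) == 0)
            return 1;
        if (!end)
            break;
        p = end + 1;
    }
    return 0;
}

/* Append `name` to the comma-separated list in `buf` (truncates if full). */
static void append(char *buf, size_t cap, const char *name)
{
    size_t used = strlen(buf);
    snprintf(buf + used, cap - used, "%s%s", used ? "," : "", name);
}

/* Hypothetical split: entries also in `keep` are filtered post-exec,
 * everything else can be filtered before execve. */
void split_syscalls(const char *user_list, const char *keep,
                    char *pre, char *post, size_t cap)
{
    char copy[1024];                      /* long lists are truncated */
    snprintf(copy, sizeof copy, "%s", user_list);
    pre[0] = post[0] = '\0';

    char *save = NULL;
    for (char *tok = strtok_r(copy, ",", &save); tok;
         tok = strtok_r(NULL, ",", &save)) {
        if (in_list(tok, keep))
            append(post, cap, tok);
        else
            append(pre, cap, tok);
    }
}
```

With `--seccomp=execve`, for instance, this puts execve in the post-exec list and leaves the pre-exec list empty, which matches the "postlist: execve" line in the transcript above.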

Another way to convince ourselves is to ask Firejail directly:

$ firejail --noprofile --shell=none --seccomp=$(firejail --debug-syscalls | awk '{print $3;}' | tr '\n' ',') ls
Parent pid 9768, child pid 9769
Post-exec seccomp protector enabled
Seccomp list in: ..., postlist: execve,prctl

We can also write a tiny program to check for prctl:

printf("secbits = 0x%x => ", prctl(PR_GET_SECUREBITS, 0, 0, 0, 0));
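A fuller version of such a test program might look like this. Only the `prctl(PR_GET_SECUREBITS, ...)` call comes from the original snippet. Decoding the bits into names, suggested by the `=> []` in the output below, and the helper names `format_secbits` and `report_secbits` are assumptions. A `main` that simply returns `report_secbits() != 0` would turn it into a `prctl_example` binary, built once normally and once statically.

```c
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>
#include <linux/securebits.h>

/* Hypothetical decoder: writes "[NAME, NAME]" for the set bits, "[]" if none.
 * The ambient-raise bits are omitted for brevity; `cap` must be > 0. */
void format_secbits(int bits, char *buf, size_t cap)
{
    static const struct { int mask; const char *name; } flags[] = {
        { SECBIT_NOROOT, "NOROOT" },
        { SECBIT_NOROOT_LOCKED, "NOROOT_LOCKED" },
        { SECBIT_NO_SETUID_FIXUP, "NO_SETUID_FIXUP" },
        { SECBIT_NO_SETUID_FIXUP_LOCKED, "NO_SETUID_FIXUP_LOCKED" },
        { SECBIT_KEEP_CAPS, "KEEP_CAPS" },
        { SECBIT_KEEP_CAPS_LOCKED, "KEEP_CAPS_LOCKED" },
    };
    size_t used = (size_t)snprintf(buf, cap, "[");
    int first = 1;
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (!(bits & flags[i].mask))
            continue;
        if (used >= cap)
            return;                       /* buffer full: stop, still terminated */
        used += (size_t)snprintf(buf + used, cap - used, "%s%s",
                                 first ? "" : ", ", flags[i].name);
        first = 0;
    }
    if (used < cap)
        snprintf(buf + used, cap - used, "]");
}

/* Query the securebits; a seccomp filter on prctl kills the process here. */
int report_secbits(void)
{
    char names[128];
    int bits = prctl(PR_GET_SECUREBITS, 0, 0, 0, 0);
    if (bits < 0) {
        perror("prctl(PR_GET_SECUREBITS)");
        return -1;
    }
    format_secbits(bits, names, sizeof names);
    printf("secbits = 0x%x => %s\n", (unsigned)bits, names);
    return 0;
}
```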

And again:

$ firejail --noprofile --shell=none --seccomp=prctl ./prctl_example
Parent pid 10123, child pid 10124
Post-exec seccomp protector enabled
Seccomp list in: prctl, check list: @default-keep, postlist: prctl
(gets killed)

$ firejail --noprofile --shell=none --seccomp=prctl ./prctl_example-static
Parent pid 10135, child pid 10136
Post-exec seccomp protector enabled
Seccomp list in: prctl, check list: @default-keep, postlist: prctl
Child process initialized in 21.85 ms
secbits = 0x0 => []

Parent is shutting down, bye...

So the problem affects prctl and execve, and only those two.

The custom-loader solution seems a bit overkill to me, and could actually lead to more problems (ELF parsing, etc.). I've tried to look at how other projects handle this problem. From what I understand, systemd effectively disallows seccomp-filtering execve (in the sense that it "will always fail"). From the man page:

Note that strict system call filters may impact execution and error handling code paths of the service invocation. Specifically, access to the execve system call is required for the execution of the service binary — if it is blocked service invocation will necessarily fail

It seems to me that prctl is kept post-exec so that execve can be filtered later. But if execve is not expected to be filtered, prctl could be filtered before the execve, which would then also work for static binaries. IMHO, that would remove a significant attack surface, given what prctl can do. Am I missing something?
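The proposal boils down to a small decision rule. The sketch below assumes the only reason to defer prctl is a later execve filter. `choose_phase` and `enum filter_phase` are hypothetical names, not Firejail code.

```c
#include <stdbool.h>
#include <string.h>

enum filter_phase { PRE_EXEC, POST_EXEC };

/* Hypothetical rule from the proposal:
 * - execve can only be filtered after exec, since the sandbox itself still
 *   has to exec the target program;
 * - prctl only needs to survive the exec when a post-exec execve filter must
 *   still be installed; otherwise it can be blocked up front, and a filter
 *   installed before execve is inherited by static binaries too;
 * - everything else is filtered before exec. */
enum filter_phase choose_phase(const char *syscall_name, bool execve_blocked)
{
    if (strcmp(syscall_name, "execve") == 0)
        return POST_EXEC;
    if (strcmp(syscall_name, "prctl") == 0)
        return execve_blocked ? POST_EXEC : PRE_EXEC;
    return PRE_EXEC;
}
```

Under this rule, `--seccomp=prctl` alone would no longer need libpostexecseccomp at all, which is exactly the static-binary case shown above.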

As a side note:

From what I understand, to be able to filter execve, one needs to allow some other syscalls, specifically the ones used by the loader and libpostexecseccomp. These syscalls include, for instance, openat, lseek, mmap, close, etc. In such a case, what would be the expected behavior or use case? Disallowing execve but keeping a lot of likely dangerous syscalls (openat + mmap could almost load an external binary)? (I don't have the answer.)

topimiettinen commented 3 years ago

I agree that when prctl() needs to be filtered but not execve(), there shouldn't be a need to use libpostexecseccomp.

Filtering open() etc. indeed breaks a lot of things (for example in the dynamic loader, before libpostexecseccomp is loaded), so perhaps the list should be more complete. prctl() is needed to install the seccomp filters, but it's indeed not the only one.

Systemd and Firejail take different approaches. Systemd runs as PID 1, which is about the most important piece of software in a system besides the kernel, so it's natural that features which could be considered too "hacky" are not very interesting there. Firejail, on the other hand, is in a much more flexible position: it's OK if something doesn't work in every case, since the feature can often be disabled via per-application profiles. In the worst case it's always possible not to use Firejail, but switching PID 1 software (or not using any, init=/bin/sh?) is much more difficult. Blocking execve() with an ld.so.preload hack would not be OK for PID 1, but it's an interesting option for Firejail.

Of course, a blocked execve() can be circumvented with other system calls. In the extreme case (fileless malware), attackers don't even need execve() or open() + mmap() if they can chain enough ROP gadgets to build a simple remote shell or whatever else they want to execute.

I think a custom preloader (which wouldn't have to replace the real dynamic loader) could be interesting for other clever uses: after execve() there could be further opportunities for sandboxing. For example, the seccomp action SECCOMP_RET_TRAP delivers a SIGSYS signal to the thread making the system call, and the signal handler could be supplied by the preloader; SECCOMP_RET_USER_NOTIF instead forwards the call to a supervisor listening on a notification file descriptor. A custom preloader would be overkill for blocking execve() just for statically linked applications, but if it existed, Firejail would be able to install any seccomp filters, even, for example, SECCOMP_SET_MODE_STRICT, which only allows read, write, _exit and sigreturn.
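To illustrate the SECCOMP_RET_TRAP case: the kernel skips the trapped call and delivers SIGSYS to the calling thread, so whoever installed the handler decides what happens next. In the idea above, that would be the preloader. The sketch traps getppid() purely as a harmless example. The names `install_trap_filter` and `trapped_nr` are hypothetical, and the architecture check is left out for brevity, although a real filter should include it.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* Syscall number reported by the last SIGSYS, or -1 if none arrived yet. */
static volatile sig_atomic_t trapped_nr = -1;

/* SIGSYS handler: si_syscall identifies the call the filter trapped. */
static void on_sigsys(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    trapped_nr = info->si_syscall;
}

/* Hypothetical helper: install the handler, then a filter that returns
 * SECCOMP_RET_TRAP for getppid() and allows everything else. */
int install_trap_filter(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigsys;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSYS, &sa, NULL) != 0)
        return -1;

    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```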

netblue30 / firejail

Warn on static binaries + seccomp #3685