Open commial opened 4 years ago
Right. I think a warning in the documentation and at runtime would be appropriate.
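One way such a runtime warning could be triggered is to check whether the target ELF file requests a dynamic loader at all: a binary without a `PT_INTERP` program header is never handed to `ld.so`, so preload-based injection cannot reach it. The sketch below is not Firejail code; the name `elf_has_interpreter` is made up for illustration, and it assumes a 64-bit ELF file in the host's byte order.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the ELF64 file at `path` has a
 * PT_INTERP header (dynamic loader requested), 0 if it has none
 * (static binary), and -1 on I/O error or if it is not ELF64. */
int elf_has_interpreter(const char *path)
{
    Elf64_Ehdr eh;
    int result = -1;
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    /* Read and sanity-check the ELF header. */
    if (pread(fd, &eh, sizeof eh, 0) != (ssize_t)sizeof eh)
        goto out;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr))
        goto out;

    /* Walk the program headers looking for PT_INTERP. */
    result = 0;
    for (unsigned int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        off_t off = (off_t)(eh.e_phoff + (Elf64_Off)i * sizeof ph);
        if (pread(fd, &ph, sizeof ph, off) != (ssize_t)sizeof ph) {
            result = -1;
            break;
        }
        if (ph.p_type == PT_INTERP) {
            result = 1;
            break;
        }
    }
out:
    close(fd);
    return result;
}
```

A caller could print a warning, or refuse to start the program, when this returns 0 and a post-exec filter was requested.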
Perhaps the problem could be avoided if, in these cases, Firejail executed a custom loader that set up the seccomp filter and then loaded the actual binary, as the kernel would do for static executables. That couldn't be bypassed.
Thanks for your quick answer.
From what I understand, the only seccomp "post-exec" syscalls are the ones from `@default-keep`:
syscalls_in_list(list, "@default-keep", fd, &prelist, &postlist, native);
Which resolves to:
# etc/templates/syscalls.txt
@default-keep=execve,prctl
Another way to convince ourselves is to ask firejail directly:
$ firejail --noprofile --shell=none --seccomp=$(firejail --debug-syscalls | awk '{print $3;}' | tr '\n' ',') ls
Parent pid 9768, child pid 9769
Post-exec seccomp protector enabled
Seccomp list in: ...
, postlist: execve,prctl
We can also write a tiny program to check for `prctl`:
printf("secbits = 0x%x => ", prctl(PR_GET_SECUREBITS, 0, 0, 0, 0));
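The one-line `printf` above is only a fragment. A complete version might look like the following sketch. The function names and the way flag names are listed inside the brackets are assumptions, since the thread only shows the output for `0x0`.

```c
#include <linux/securebits.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/* Print the securebits value followed by the names of the set flags,
 * e.g. "secbits = 0x0 => []". The bracket format is an assumption. */
void print_securebits(FILE *out, int bits)
{
    static const struct { int bit; const char *name; } flags[] = {
        { SECBIT_NOROOT,                 "NOROOT" },
        { SECBIT_NOROOT_LOCKED,          "NOROOT_LOCKED" },
        { SECBIT_NO_SETUID_FIXUP,        "NO_SETUID_FIXUP" },
        { SECBIT_NO_SETUID_FIXUP_LOCKED, "NO_SETUID_FIXUP_LOCKED" },
        { SECBIT_KEEP_CAPS,              "KEEP_CAPS" },
        { SECBIT_KEEP_CAPS_LOCKED,       "KEEP_CAPS_LOCKED" },
    };
    const char *sep = "";

    fprintf(out, "secbits = 0x%x => [", (unsigned int)bits);
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (bits & flags[i].bit) {
            fprintf(out, "%s%s", sep, flags[i].name);
            sep = ", ";
        }
    }
    fprintf(out, "]\n");
}

/* Query the securebits through prctl(2) and print them. If a seccomp
 * filter that kills on prctl is active, the process dies inside this
 * call and nothing is printed. */
int report_securebits(void)
{
    int bits = prctl(PR_GET_SECUREBITS, 0, 0, 0, 0);
    if (bits == -1) {
        perror("prctl(PR_GET_SECUREBITS)");
        return -1;
    }
    print_securebits(stdout, bits);
    return 0;
}
```

Calling `report_securebits()` from `main` and building the program twice, once normally and once with `-static`, gives a dynamic and a static variant to compare.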
And again:
$ firejail --noprofile --shell=none --seccomp=prctl ./prctl_example
Parent pid 10123, child pid 10124
Post-exec seccomp protector enabled
Seccomp list in: prctl, check list: @default-keep, postlist: prctl
(gets killed)
$ firejail --noprofile --shell=none --seccomp=prctl ./prctl_example-static
Parent pid 10135, child pid 10136
Post-exec seccomp protector enabled
Seccomp list in: prctl, check list: @default-keep, postlist: prctl
Child process initialized in 21.85 ms
secbits = 0x0 => []
Parent is shutting down, bye...
So the problem affects `prctl` and `execve`, and only those two.
The custom-loader solution seems a bit overkill to me, and could actually lead to more problems (ELF parsing, etc.).
I've tried to look at how other solutions get around this problem. From what I understand, `systemd` actually disallows (in the sense: "will always fail") seccomp-ing `execve` (from the man page):
Note that strict system call filters may impact execution and error handling code paths of the service invocation. Specifically, access to the execve system call is required for the execution of the service binary — if it is blocked service invocation will necessarily fail
It seems to me that `prctl` is kept post-exec in order to be able to seccomp `execve` later. But if `execve` is not expected to be seccomp-ed, the `prctl` filter could actually be installed before the `execve`, and it would then work for static binaries. IMHO, that would remove a significant attack surface, given what `prctl` makes possible. Am I missing something?
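A minimal sketch of that idea, assuming x86_64 or aarch64 and using `SECCOMP_RET_ERRNO` instead of a kill action so the effect is easy to observe. This is not Firejail's implementation; `deny_syscall` and `exec_with_prctl_denied` are hypothetical names. Because seccomp filters are preserved across `execve`, a filter installed this way applies to the new program whether it is static or dynamic.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#if defined(__x86_64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only knows x86_64 and aarch64"
#endif

/* Hypothetical helper: make syscall `nr` fail with EPERM for this
 * process and for everything it execs afterwards. The x32 variants
 * of syscalls on x86_64 are not handled here. */
int deny_syscall(int nr)
{
    struct sock_filter filter[] = {
        /* Kill on a foreign ABI so syscall numbers cannot be reinterpreted. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* Return EPERM for the selected syscall, allow everything else. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required to install a filter without CAP_SYS_ADMIN. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;
    /* When nr is SYS_prctl, this is the last prctl call that succeeds. */
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Install the prctl filter first, then exec the target. Only returns
 * on failure. */
int exec_with_prctl_denied(char *const argv[])
{
    if (deny_syscall(SYS_prctl) == -1)
        return -1;
    execv(argv[0], argv);
    return -1;
}
```

The same ordering cannot be used for `execve` itself, since the filter would then block the very call that starts the target program.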
As a side note:
From what I understand, to be able to seccomp `execve`, one needs to allow some other syscalls, specifically the ones used by the loader and `libpostexecseccomp`. These syscalls include, for instance, `openat`, `lseek`, `mmap`, `close`, ...
In such a case, what would be the expected behavior or use case? Disallowing `execve` but keeping a lot of likely dangerous syscalls (`openat` + `mmap` could almost load an external binary)? (I don't have the answer.)
I agree that when `prctl()` needs to be filtered but not `execve()`, there shouldn't be a need to use `libpostexecseccomp`.
Filtering `open` etc. indeed breaks a lot of stuff (for example in the dynamic loader before `libpostexecseccomp` is loaded), so perhaps the list should be more complete. `prctl()` is needed to install the seccomp filters, but it's indeed not the only one.
Systemd and Firejail have different approaches. Systemd runs as PID 1, which is about the most important piece of software in a system besides the kernel, so it's natural that features which could be considered too "hacky" are not very interesting there. Firejail is instead in a much more flexible position: it's OK if something doesn't work in every case, since the feature can often be disabled via per-application profiles. In the worst case it's always possible not to use Firejail, but switching PID 1 software (or not using any, `init=/bin/sh`?) is much more difficult. Blocking `execve()` with a ld.preload hack would not be OK for PID 1, but it's an interesting option for Firejail.
It's of course possible to circumvent a blocked `execve()` by using other system calls. In the extreme case (fileless malware), attackers don't even need `execve()` or `open()` + `mmap()` if they chain enough ROP gadgets to build a simple remote shell or whatever they want to execute.
I think a custom preloader (which wouldn't have to replace the real dynamic loader) could be interesting for other clever uses: after `execve()` there could be further opportunities for sandboxing. For example, the seccomp actions `SECCOMP_RET_TRAP` and `SECCOMP_RET_USER_NOTIF` need a handler (a `SIGSYS` handler in the thread making the system call, or a notification listener), and this could be supplied by the preloader. A custom preloader would be overkill for blocking `execve()` just for statically linked applications, but if it existed, Firejail would be able to install any seccomp filter, even for example `SECCOMP_SET_MODE_STRICT`, which only allows `read`, `write`, `_exit` and `sigreturn`.
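To illustrate how restrictive strict mode is, here is a small sketch that forks a child, switches it to strict mode through the `seccomp(2)` system call, and reports how the child ended. It is not Firejail code; `run_in_strict_mode` is a hypothetical name, and `getppid` stands in for any system call outside the allowed set.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: run a child in seccomp strict mode and return
 * its wait status, or -1 if fork/waitpid failed. When try_forbidden
 * is nonzero the child issues a syscall that strict mode rejects. */
int run_in_strict_mode(int try_forbidden)
{
    static const char msg[] = "still alive in strict mode\n";
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (syscall(SYS_seccomp, SECCOMP_SET_MODE_STRICT, 0, NULL) == -1)
            _exit(2); /* strict mode unavailable */
        if (try_forbidden)
            syscall(SYS_getppid); /* not allowed: the kernel sends SIGKILL */
        /* write(2) is one of the few calls still permitted. */
        syscall(SYS_write, STDERR_FILENO, msg, sizeof msg - 1);
        /* Use exit(2) directly: glibc's _exit() calls exit_group(2),
         * which strict mode does not allow. */
        syscall(SYS_exit, 0);
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}
```

With `try_forbidden` set to 0 the child should exit normally with status 0; with a nonzero value it should be terminated by `SIGKILL` before it can write anything.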
Hi there,
From what I understand of how firejail works:
Long story short, to `seccomp` a system call such as `execve`, firejail injects code through the `LD_PRELOAD` mechanism. As a result, for static binaries, this is ignored, and the resulting process will be able to `execve`.
It would be nice to have a warning that the `seccomp` filter will not be honored, or even an opt-in option to avoid this behavior (i.e. exit instead of launching the binary), for use cases where Firejail is used to sandbox untrusted binaries. I don't know if another mechanism (like using `ptrace`) could be used to actually get around this limitation.
To reproduce: