google / nsjail

A lightweight process isolation tool that utilizes Linux namespaces, cgroups, rlimits and seccomp-bpf syscall filters, leveraging the Kafel BPF language for enhanced security.
https://nsjail.dev
Apache License 2.0
2.97k stars, 274 forks

seccomp for ELF-32 by ELF-64 nsjail? #159

Status: Open. ukai opened this issue 3 years ago.

ukai commented 3 years ago

How can we configure seccomp-bpf for an ELF-32 executable launched by an ELF-64 nsjail?

$ file =nsjail
/usr/local/google/home/ukai/bin/nsjail: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=b88298f149e4473a5e9780b1f7bdef933feb636d, for GNU/Linux 3.2.0, not stripped
$ cat a.c
#include <string.h>
#include <unistd.h>

int main() {
     char msg[] = "hello, world\n";
     write(1, msg, strlen(msg));
}
$ gcc -m32 a.c
$ file a.out
a.out: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=e18d9336a34a44095a0d61658e1aec3449a75192, for GNU/Linux 3.2.0, not stripped
$ nsjail --chroot / -D "$(pwd)" ./a.out
[I][2021-02-03T01:31:32+0000] Mode: STANDALONE_ONCE
[I][2021-02-03T01:31:32+0000] Jail parameters: hostname:'NSJAIL', chroot:'/', process:'./a.out', bind:[::]:0, max_conns_per_ip:0, time_limit:0, personality:0, daemonize:false, clone_newnet:true, clone_newuser:true, clone_newns:true, clone_newpid:true, clone_newipc:true, clone_newuts:true, clone_newcgroup:true, keep_caps:false, disable_no_new_privs:false, max_cpus:0
[I][2021-02-03T01:31:32+0000] Mount: '/' -> '/' flags:MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2021-02-03T01:31:32+0000] Mount: '/proc' flags:MS_RDONLY type:'proc' options:'' dir:true
[I][2021-02-03T01:31:32+0000] Uid map: inside_uid:22776 outside_uid:22776 count:1 newuidmap:false
[I][2021-02-03T01:31:32+0000] Gid map: inside_gid:89939 outside_gid:89939 count:1 newgidmap:false
[I][2021-02-03T01:31:32+0000] Executing './a.out' for '[STANDALONE MODE]'
hello, world
[I][2021-02-03T01:31:32+0000] pid=1003284 ([STANDALONE MODE]) exited with status: 0, (PIDs left: 0)
$ nsjail --chroot / -D "$(pwd)" --seccomp_string 'ALLOW { execve, brk } DEFAULT KILL' ./a.out
[I][2021-02-03T01:31:45+0000] Mode: STANDALONE_ONCE
[I][2021-02-03T01:31:45+0000] Jail parameters: hostname:'NSJAIL', chroot:'/', process:'./a.out', bind:[::]:0, max_conns_per_ip:0, time_limit:0, personality:0, daemonize:false, clone_newnet:true, clone_newuser:true, clone_newns:true, clone_newpid:true, clone_newipc:true, clone_newuts:true, clone_newcgroup:true, keep_caps:false, disable_no_new_privs:false, max_cpus:0
[I][2021-02-03T01:31:45+0000] Mount: '/' -> '/' flags:MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2021-02-03T01:31:45+0000] Mount: '/proc' flags:MS_RDONLY type:'proc' options:'' dir:true
[I][2021-02-03T01:31:45+0000] Uid map: inside_uid:22776 outside_uid:22776 count:1 newuidmap:false
[I][2021-02-03T01:31:45+0000] Gid map: inside_gid:89939 outside_gid:89939 count:1 newgidmap:false
[I][2021-02-03T01:31:45+0000] Executing './a.out' for '[STANDALONE MODE]'
[W][2021-02-03T01:31:45+0000][1003298] void subproc::seccompViolation(nsjconf_t*, siginfo_t*)():259 pid=1003299 commited a syscall/seccomp violation and exited with SIGSYS
[W][2021-02-03T01:31:45+0000][1003298] void subproc::seccompViolation(nsjconf_t*, siginfo_t*)():289 pid=1003299, SiSyscall: 31, SiCode: 2, SiErrno: 0, SiSigno: 17, SP: 0, PC: 0
[I][2021-02-03T01:31:45+0000] pid=1003299 ([STANDALONE MODE]) terminated with signal: SIGSYS (31), (PIDs left: 0)
$ sudo journalctl -g 'type[=]SECCOMP' | tail -1
Feb 03 01:31:45 umiu.c.googlers.com audispd[568]: node=umiu.c.googlers.com type=SECCOMP msg=audit(1612315905.346:29352): auid=22776 uid=22776 gid=89939 ses=11 subj==untrusted (complain) pid=1003299 comm="a.out" exe="/tmp/g/a.out" sig=31 arch=40000003 syscall=45 compat=1 ip=0xf7f160d7 code=0x0

execve seems to be needed for nsjail's own execve of the target program. I think syscall=45 is brk on i386 (arch=40000003 in the audit record), not recvfrom, which is syscall 45 on x86-64.
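Both readings of syscall=45 are correct for some ABI, which is the crux of the problem: a syscall number only means something together with the arch field. A minimal C sketch with small, hand-picked subsets of the i386 and x86-64 tables (the `ARCH_*` macro names, the tables and the helper functions are hypothetical; only the numeric values come from the kernel):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; the values equal AUDIT_ARCH_I386 and AUDIT_ARCH_X86_64
 * from <linux/audit.h> (the audit record above shows arch=40000003). */
#define ARCH_I386   0x40000003u
#define ARCH_X86_64 0xC000003Eu

struct syscall_entry {
    int nr;
    const char *name;
};

/* Hand-picked subsets of each ABI's syscall table, not complete tables. */
static const struct syscall_entry i386_table[] = {
    {1, "exit"}, {4, "write"}, {11, "execve"}, {45, "brk"}, {252, "exit_group"},
};
static const struct syscall_entry x86_64_table[] = {
    {1, "write"}, {12, "brk"}, {45, "recvfrom"},
    {59, "execve"}, {60, "exit"}, {231, "exit_group"},
};

/* Picks the table for an audit arch value; returns NULL for other arches. */
static const struct syscall_entry *table_for(uint32_t arch, size_t *len) {
    if (arch == ARCH_I386) {
        *len = sizeof i386_table / sizeof i386_table[0];
        return i386_table;
    }
    if (arch == ARCH_X86_64) {
        *len = sizeof x86_64_table / sizeof x86_64_table[0];
        return x86_64_table;
    }
    *len = 0;
    return NULL;
}

/* Name -> number for the given arch, or -1 if not in the subset. */
int syscall_nr(uint32_t arch, const char *name) {
    size_t len;
    const struct syscall_entry *t = table_for(arch, &len);
    assert(name != NULL);
    for (size_t i = 0; i < len; i++)
        if (strcmp(t[i].name, name) == 0) return t[i].nr;
    return -1;
}

/* Number -> name for the given arch, or NULL if not in the subset. */
const char *syscall_name(uint32_t arch, int nr) {
    size_t len;
    const struct syscall_entry *t = table_for(arch, &len);
    for (size_t i = 0; i < len; i++)
        if (t[i].nr == nr) return t[i].name;
    return NULL;
}
```

With these tables, `syscall_nr(ARCH_X86_64, "brk")` is 12 while `syscall_nr(ARCH_I386, "brk")` is 45. A policy such as `ALLOW { execve, brk }` compiled for x86-64 presumably allows only 59 and 12, so the 32-bit `brk` (45 under arch 40000003) is not matched, which is consistent with the SIGSYS in the log.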

happyCoder92 commented 3 years ago

It's not currently supported. The problem here is that nsjail itself issues an x86_64 execve syscall, so you would need a seccomp policy that covers both architectures.
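A policy that covers both architectures has to dispatch on `seccomp_data.arch` before comparing syscall numbers, since the same number means different calls in each ABI. The following is a hand-written classic-BPF sketch of that idea, not how nsjail or Kafel build their filters. The allow-lists, the helper names `install_dual_arch_filter` and `run_filtered_child`, and an x86-64 host are assumptions. It also rejects x32 calls, which report `AUDIT_ARCH_X86_64` but set bit `0x40000000` in the syscall number.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Two-instruction "if nr == N, allow" pair; falls through when nr != N. */
#define ALLOW_NR(n)                                 \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (n), 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

/* Illustrative allow-lists only; jump offsets are hand-computed for this layout. */
static struct sock_filter dual_arch_filter[] = {
    /* [0] A = arch the syscall was issued with */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* [1] not x86-64? jump to the i386 section at [13] */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 0, 11),
    /* [2] A = syscall number */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [3] x32 ABI numbers (bit 0x40000000 set) -> kill at [12] */
    BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 8, 0),
    ALLOW_NR(59),  /* [4-5]   execve (x86-64), used by nsjail itself */
    ALLOW_NR(12),  /* [6-7]   brk */
    ALLOW_NR(1),   /* [8-9]   write */
    ALLOW_NR(231), /* [10-11] exit_group */
    /* [12] */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* [13] A still holds arch; anything other than i386 -> kill at [23] */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_I386, 0, 9),
    /* [14] A = syscall number */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    ALLOW_NR(11),  /* [15-16] execve (i386) */
    ALLOW_NR(45),  /* [17-18] brk */
    ALLOW_NR(4),   /* [19-20] write */
    ALLOW_NR(252), /* [21-22] exit_group */
    /* [23] */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
};

static_assert(sizeof dual_arch_filter / sizeof dual_arch_filter[0] == 24,
              "jump offsets above assume exactly 24 instructions");

/* Installs the filter on the calling thread. Returns 0 on success, -1 on error. */
int install_dual_arch_filter(void) {
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof dual_arch_filter / sizeof dual_arch_filter[0]),
        .filter = dual_arch_filter,
    };
    /* Required before an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Forks a child that installs the filter, optionally calls getpid (not on the
 * x86-64 allow-list), then exits. Returns 0 if the child exited cleanly,
 * 1 if it was killed by SIGSYS, -1 otherwise. */
int run_filtered_child(int call_getpid) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (install_dual_arch_filter() != 0) _exit(2);
        if (call_getpid) syscall(SYS_getpid); /* should be killed here */
        _exit(0);                             /* exit_group is allowed */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) return 0;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSYS) return 1;
    return -1;
}
```

A 32-bit process on an x86-64 kernel reports `AUDIT_ARCH_I386`, matching arch=40000003 in the audit record. A real dynamically linked binary such as the `a.out` above needs many more syscalls (mmap, openat, read, close, ...) than these lists allow.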

The easiest workaround is to compile nsjail as a 32-bit binary.

ukai commented 3 years ago

Using a 32-bit nsjail might not be a solution, since the target programs may mix ELF-64 and ELF-32 binaries (e.g. a 64-bit bash that launches a 32-bit executable).

Are there any plans to add support?
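Which ABI a binary starts in is recorded in its ELF class byte, which is what `file` reports as "ELF 32-bit" or "ELF 64-bit" in the transcript above. A minimal sketch that reads it with the constants from glibc's `<elf.h>` (the function name `elf_class_bits` is hypothetical):

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns 32 or 64 according to the ELF class byte, or -1 if the file is
 * unreadable or not an ELF file. */
int elf_class_bits(const char *path) {
    unsigned char ident[EI_NIDENT];
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;
    size_t got = fread(ident, 1, sizeof ident, f);
    fclose(f);
    /* Check the "\x7fELF" magic before trusting the class byte. */
    if (got != sizeof ident || memcmp(ident, ELFMAG, SELFMAG) != 0) return -1;
    switch (ident[EI_CLASS]) {
    case ELFCLASS32: return 32;
    case ELFCLASS64: return 64;
    default:         return -1;
    }
}
```

Called on the `a.out` from the report it should return 32, and on the nsjail binary 64. This only describes the first program, though: it cannot predict what that program will execve later, which is why choosing a policy (or a 32-bit nsjail) per binary does not cover the mixed case described above.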

tomjaguarpaw commented 3 years ago

> The easiest workaround is to compile nsjail as 32-bit binary.

Is that actually supported? When I try to compile on a 32-bit system, I run into https://github.com/google/nsjail/issues/149, and the comment https://github.com/google/nsjail/issues/149#issuecomment-669141295 suggests that 32-bit builds are not supported. Can anyone clarify?