google / nsjail

A lightweight process isolation tool that utilizes Linux namespaces, cgroups, rlimits and seccomp-bpf syscall filters, leveraging the Kafel BPF language for enhanced security.
https://nsjail.dev
Apache License 2.0

SIGTERM Default Handler Issue #226

Closed by tr-intel 1 year ago

tr-intel commented 1 year ago

SIGTERM's default action should be process termination.

However, when running under nsjail, raising SIGTERM with the default disposition SIG_DFL (0) does nothing! I have a simple POC showing that the handler is not SIG_IGN (1) and that the signal is not blocked.

Attached are two files, signal.c and test.sh, in signal_issue.zip.
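For readers without the zip, here is a rough sketch of the kind of checks the POC performs. It is not the attached signal.c; the function names are made up for illustration, and it uses only standard POSIX calls (sigaction, sigprocmask, sigpending).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Hypothetical helpers, not taken from the attached signal.c. */

/* Returns 1 if SIGTERM still has the default disposition, 0 if not, -1 on error. */
int sigterm_is_default(void) {
    struct sigaction sa;
    if (sigaction(SIGTERM, NULL, &sa) != 0)
        return -1;
    /* With SA_SIGINFO set, a three-argument handler is installed instead. */
    if (sa.sa_flags & SA_SIGINFO)
        return 0;
    return sa.sa_handler == SIG_DFL;
}

/* Returns 1 if SIGTERM is in the calling thread's signal mask, 0 if not, -1 on error. */
int sigterm_is_blocked(void) {
    sigset_t mask;
    /* A NULL new set only queries the current mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &mask) != 0)
        return -1;
    return sigismember(&mask, SIGTERM);
}

/* Returns 1 if a SIGTERM is waiting to be delivered, 0 if not, -1 on error. */
int sigterm_is_pending(void) {
    sigset_t pending;
    if (sigpending(&pending) != 0)
        return -1;
    return sigismember(&pending, SIGTERM);
}

/* Prints all three facts on one line. */
void print_sigterm_state(void) {
    printf("SIGTERM default=%d blocked=%d pending=%d\n",
           sigterm_is_default(), sigterm_is_blocked(), sigterm_is_pending());
    fflush(stdout);
}
```

Calling print_sigterm_state() from main before and after raise(SIGTERM) is enough to tell "ignored", "blocked" and "default but not delivered" apart.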

Following is the test.sh POC output:

Normal run
========================================
SIGTERM handler address: (nil)
Raise SIGTERM...
Terminated

nsjail run
========================================
[I][2023-11-02T21:37:33+0200] Mode: STANDALONE_ONCE
[I][2023-11-02T21:37:33+0200] Jail parameters: hostname:'NSJAIL', chroot:'', process:'/home/.../signal', bind:[::]:0, max_conns:0, max_conns_per_ip:0, time_limit:0, personality:0, daemonize:false, clone_newnet:true, clone_newuser:true, clone_newns:true, clone_newpid:true, clone_newipc:true, clone_newuts:true, clone_newcgroup:true, clone_newtime:false, keep_caps:false, disable_no_new_privs:false, max_cpus:0
[I][2023-11-02T21:37:33+0200] Mount: '/' flags:MS_RDONLY type:'tmpfs' options:'' dir:true
[I][2023-11-02T21:37:33+0200] Mount: '/lib' -> '/lib' flags:MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2023-11-02T21:37:33+0200] Mount: '/lib64/' -> '/lib64/' flags:MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2023-11-02T21:37:33+0200] Mount: '/home/.../issue' -> '/home/.../issue' flags:MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2023-11-02T21:37:33+0200] Mount: '/proc' flags:MS_RDONLY type:'proc' options:'' dir:true
[I][2023-11-02T21:37:33+0200] Uid map: inside_uid:1000 outside_uid:1000 count:1 newuidmap:false
[I][2023-11-02T21:37:33+0200] Gid map: inside_gid:1000 outside_gid:1000 count:1 newgidmap:false
[I][2023-11-02T21:37:33+0200] Executing '/home/.../signal' for '[STANDALONE MODE]'
SIGTERM handler address: (nil)
Raise SIGTERM...
Still alive 🫤
SIGTERM is not blocked
SIGTERM is not pending
SIGTERM handler address: 0x562e925452c9
Raise SIGTERM...
======= New handler got 15 signal! ======
Restored SIGTERM to SIG_DFL
SIGTERM handler address: (nil)
Raise SIGTERM...
Still alive 🫤
killall signal
 112684 ?        00:00:00 signal
killall --signal SIGKILL signal
[I][2023-11-02T21:37:35+0200] pid=112684 ([STANDALONE MODE]) terminated with signal: SIGKILL (9), (PIDs left: 0)

AFAICT, the same issue happens with all other signals whose default action is to terminate the process (e.g., SIGUSR1).

tr-intel commented 1 year ago

OK, I've found an explanation for this behaviour.

From the kill(2) man page: "The only signals that can be sent to process ID 1, the init process, are those for which init has explicitly installed signal handlers. This is done to assure the system is not brought down accidentally."

nsjail sets clone_newpid to true by default. The sandboxed process is therefore started in a new PID namespace, where it is the first process and is assigned PID 1. The kernel treats it as the init process of that namespace and applies the rule above: a signal whose disposition is still SIG_DFL is not delivered to it. SIGKILL sent from outside the namespace is an exception, which is why killall --signal SIGKILL in the log above still worked.

Setting clone_newpid to false solves the issue, at the cost of giving up PID namespace isolation.
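If the PID namespace has to stay, another option follows from the rule about PID 1: install an explicit handler, so the signal is delivered. Below is a minimal sketch of that idea, assuming it is acceptable for the program to exit with status 128+signo instead of being reported as killed by the signal. The names install_exit_handler and exit_on_signal are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper names; only POSIX calls are used. */

/* Handler: leave immediately with the shell-style 128+signo status.
 * _exit() is async-signal-safe, unlike exit() or printf(). */
static void exit_on_signal(int signo) {
    _exit(128 + signo);
}

/* Installs exit_on_signal for signo. Returns 0 on success, -1 on error
 * (for example for SIGKILL, which cannot be caught). */
int install_exit_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = exit_on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Returns 1 when the caller is PID 1 in its PID namespace, which is
 * the case where default-action signals are not delivered. */
int running_as_pid1(void) {
    return getpid() == 1;
}
```

A program could call install_exit_handler(SIGTERM) early in main, optionally only when running_as_pid1() returns 1, so that it behaves the same outside the jail. Note that the parent then sees a normal exit with status 143 rather than "terminated with signal: SIGTERM".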