seccomp / libseccomp

The main libseccomp repository
GNU Lesser General Public License v2.1

Q: SCMP_FLTATR_API_TSKIP does not seem to be used by tracer programs #368

Open ManaSugi opened 2 years ago

ManaSugi commented 2 years ago

Hello, I have a question about the SCMP_FLTATR_API_TSKIP attribute. SCMP_FLTATR_API_TSKIP has been supported since https://github.com/seccomp/libseccomp/commit/dc879990774b5fe0b5d3362ae592e8a5bb615fbb in order to address #80, and the man page explains it as follows:

> A flag to specify if libseccomp should allow filter rules to be created for the -1 syscall. The -1 syscall value can be used by tracer programs to skip specific syscall invocations, see seccomp(2) for more information. Defaults to off (value == 0).

However, I think tracer programs do not need SCMP_FLTATR_API_TSKIP to skip a syscall, because the tracer skips a syscall by directly changing the register that holds the syscall number, as explained in seccomp(2), not by using a seccomp filter.

_Excerpt from SECCOMP_RET_TRACE section in seccomp(2):_

> The tracer can skip the system call by changing the system call number to -1. Alternatively, the tracer can change the system call requested by changing the system call to a valid system call number. If the tracer asks to skip the system call, then the system call will appear to return the value that the tracer puts in the return value register.

In fact, the kernel skips a syscall if a ptracer has set the syscall number to -1; see https://elixir.bootlin.com/linux/v5.16/source/kernel/seccomp.c#L1229 for the check. The ptracer can set the syscall number to -1 without SCMP_FLTATR_API_TSKIP because it just changes the register.
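The ordering described here can be sketched as a small user-space model. This is a simplification based on a reading of the linked kernel source, not kernel code, and every name in it (`after_trace_stop`, `recheck_fn`, `deny_all`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What happens to the syscall once the tracer resumes the tracee. */
enum trace_outcome { TRACE_SKIP, TRACE_RUN, TRACE_DENY };

/* Stand-in for re-running the loaded filter on the (possibly changed)
 * syscall number; returns true when the filter allows it. */
typedef bool (*recheck_fn)(int nr);

enum trace_outcome after_trace_stop(int nr_after_tracer, recheck_fn recheck)
{
	assert(recheck != NULL);

	/* A negative number means the tracer asked for a skip; in this
	 * model the filter is not consulted again. */
	if (nr_after_tracer < 0)
		return TRACE_SKIP;

	/* Any other number is checked against the filter once more. */
	return recheck(nr_after_tracer) ? TRACE_RUN : TRACE_DENY;
}

/* Example re-check that allows nothing, i.e. no "syscall == -1" rule. */
bool deny_all(int nr)
{
	(void)nr;
	return false;
}
```

In this model `after_trace_stop(-1, deny_all)` yields `TRACE_SKIP` even though the stand-in filter allows nothing, which is the behaviour the question is about.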

Hence, it does not seem to make sense to create a filter rule using a syscall value of -1. I'm sorry if I'm wrong, but I'm not sure why SCMP_FLTATR_API_TSKIP was added. Would you mind if I asked the use case of SCMP_FLTATR_API_TSKIP?

ManaSugi commented 2 years ago

@pcmoore @drakenclimber I'd appreciate it if you could answer at your convenience.

pcmoore commented 2 years ago

> Would you mind if I asked the use case of SCMP_FLTATR_API_TSKIP?

Well, the use case is exactly as you described in your posting above; it is intended to support process tracers :)

It has been several years since we made this change, so this reasoning may be wrong, but my recollection is that without a "syscall == -1" allow filter rule, the seccomp filter would reject the syscall skip before the kernel got to the skip line you mentioned. The "syscall == -1" rule in the BPF filter isn't to force the syscall to be skipped, it is to allow the kernel processing to get to the point where the syscall can be skipped.

Of course if you have a reproducer which shows that this doesn't work this way anymore I think we would like to see it :)

ManaSugi commented 2 years ago

@pcmoore Thank you for your comment.

> without a "syscall == -1" allow filter rule, the seccomp filter would reject the syscall skip before the kernel got to the skip line you mentioned. The "syscall == -1" rule in the BPF filter isn't to force the syscall to be skipped, it is to allow the kernel processing to get to the point where the syscall can be skipped.
>
> Of course if you have a reproducer which shows that this doesn't work this way anymore I think we would like to see it :)

I have attached a reproducer showing that a tracer program can skip a system call without a "syscall == -1" rule. ptrace_test.c is a simple program in which the tracer, stopped by SECCOMP_RET_TRACE, skips a getuid syscall by changing the register.

ptrace_test.c
```c
// Copyright 2022 Sony Group Corporation
//

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <seccomp.h>
#include <sys/prctl.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/user.h>
#include <sys/wait.h>

int die(const char *msg)
{
	perror(msg);
	exit(errno);
}

int child()
{
	int rc = -1;
	scmp_filter_ctx ctx;

	prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);

	ctx = seccomp_init(SCMP_ACT_ALLOW);
	if (ctx == NULL)
		goto out;

	// Stop at getuid with a seccomp event so the tracer can inspect it
	rc = seccomp_rule_add_exact(ctx, SCMP_ACT_TRACE(getpid()), SCMP_SYS(getuid), 0);
	if (rc < 0)
		goto out;

	rc = seccomp_load(ctx);
	if (rc < 0)
		goto out;

	// This should output -ENOSYS (-38) as syscall-enter-stop on x86
	printf("uid: %d\n", getuid());

out:
	seccomp_release(ctx);
	return rc;
}

int main()
{
	int pid;
	int rc;
	int status;
	struct user_regs_struct regs;

	pid = fork();
	switch (pid) {
	case -1:
		die("failed to fork");
	case 0:
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		kill(getpid(), SIGSTOP);
		rc = child();
		if (rc < 0) {
			die("failed to execute child");
		}
		return 0;
	}

	// Wait for the child's initial SIGSTOP, then enable seccomp events
	waitpid(pid, &status, __WALL);
	ptrace(PTRACE_SETOPTIONS, pid, NULL, PTRACE_O_TRACESECCOMP);
	ptrace(PTRACE_CONT, pid, NULL, NULL);

	while (1) {
		waitpid(pid, &status, __WALL);
		if (status >> 8 == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8))) {
			ptrace(PTRACE_GETREGS, pid, NULL, &regs);
			if (regs.orig_rax == SYS_getuid) {
				printf("caught getuid syscall\n");
				// Change the syscall number to -1 in order to skip the syscall
				regs.orig_rax = -1;
				ptrace(PTRACE_SETREGS, pid, NULL, &regs);
			}
		}
		if (WIFEXITED(status) || WIFSIGNALED(status)) {
			break;
		}
		ptrace(PTRACE_CONT, pid, NULL, NULL);
	}

	return 0;
}
```
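The status test in the reproducer's wait loop can also be written as a small predicate. This is only a sketch: the name `is_seccomp_stop` is hypothetical, and the `WIFSTOPPED` guard is an addition that makes the intent explicit rather than something the reproducer requires.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical helper: true when a waitpid() status describes a
 * PTRACE_EVENT_SECCOMP stop, i.e. the tracee hit a SECCOMP_RET_TRACE rule. */
bool is_seccomp_stop(int status)
{
	return WIFSTOPPED(status) &&
	       (status >> 8) == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8));
}
```

A tracer would call it on each status returned by `waitpid()` before reading the registers with `PTRACE_GETREGS`.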

I can observe that the kernel reaches the skip line I mentioned earlier by setting a probe point at https://elixir.bootlin.com/linux/v5.10/source/kernel/seccomp.c#L989 .

```
$ uname -a
Linux xxxx 5.10.0-1057-oem #61-Ubuntu SMP Thu Jan 13 15:06:11 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ sudo perf probe --source=/usr/src/linux-oem-5.10-5.10.0 --add "__seccomp_filter:68 this_syscall"
Added new event:
  probe:__seccomp_filter_L68 (on __seccomp_filter:68 with this_syscall)

You can now use it in all perf tools, such as:

        perf record -e probe:__seccomp_filter_L68 -aR sleep 1

$ sudo perf record -e probe:__seccomp_filter_L68 -aR ./ptrace_test
caught getuid syscall
uid: -38
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.898 MB perf.data (1 samples) ]

$ sudo perf script
     ptrace_test 12337 [020]  2739.243594: probe:__seccomp_filter_L68: (ffffffffb639720e) this_syscall=-1
```

The test program prints -38, which is -ENOSYS (the value x86 places in the return register at syscall-enter-stop), as the return value of getuid, and the probe shows that this_syscall is set to -1.
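To make the -38 reading concrete, here is a tiny sketch that compares a raw return value against `-ENOSYS`. The helper name `is_enosys_return` is hypothetical, and the literal 38 is an assumption that holds on Linux x86-64 but not necessarily on every architecture.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: true when a raw syscall return value is -ENOSYS,
 * which is what a skipped syscall appears to return when the tracer leaves
 * the return register untouched on x86. */
bool is_enosys_return(long ret)
{
	return ret == -ENOSYS;
}
```

Comparing against the `ENOSYS` macro rather than the literal keeps the check portable across architectures where the number differs.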

If you don't mind, could you look into the reproducer? Thank you.

pcmoore commented 2 years ago

Thanks for sending the reproducer and the additional information, we'll add this to the list of things to investigate further but it might take me some time to get back to this.

As a reminder, SCMP_FLTATR_API_TSKIP is disabled by default.

drakenclimber commented 2 years ago

Interesting. I'm swamped at the moment as well, but I am definitely intrigued.

ManaSugi commented 2 years ago

Thank you for considering reviewing it. It would be helpful.

> As a reminder, SCMP_FLTATR_API_TSKIP is disabled by default.

Yes, so I deliberately left the SCMP_FLTATR_API_TSKIP attribute disabled in the reproducer, to confirm that the kernel can skip the system call without it.