ManaSugi opened this issue 2 years ago (status: open).
@pcmoore @drakenclimber I'd appreciate it if you could answer at your convenience.
Would you mind if I asked about the use case of SCMP_FLTATR_API_TSKIP?
Well, the use case is exactly as you described in your posting above; it is intended to support process tracers :)
It has been several years since we made this change, so this reasoning may be wrong, but my recollection is that without a "syscall == -1" allow filter rule, the seccomp filter would reject the syscall skip before the kernel got to the skip line you mentioned. The "syscall == -1" rule in the BPF filter isn't there to force the syscall to be skipped; it is there to allow the kernel processing to get to the point where the syscall can be skipped.
Of course if you have a reproducer which shows that this doesn't work this way anymore I think we would like to see it :)
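For concreteness, the kind of rule being discussed can be written by hand in classic BPF. The program below is an illustrative assumption, not libseccomp's generated output: a deny-by-default (EPERM) policy for x86-64 that sends getuid to the tracer, allows write, and contains an explicit "syscall == -1 -> ALLOW" rule. With libseccomp, such a rule would be requested by enabling SCMP_FLTATR_API_TSKIP via seccomp_attr_set() and then passing -1 as the syscall to seccomp_rule_add(). The tiny interpreter is only there to show which action each syscall number receives; it supports just the three opcodes this program uses, and the names tskip_filter, run_filter and action_for are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* Hand-written approximation (assumption, x86-64 only) of a deny-by-default
   filter that also carries an explicit "syscall == -1 -> ALLOW" rule. */
static const struct sock_filter tskip_filter[] = {
    /* Kill on an unexpected architecture. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* Load the syscall number. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* The "syscall == -1" rule: allow it instead of falling through to EPERM. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (uint32_t)-1, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* getuid is handed to the ptrace tracer. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getuid, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE),
    /* write is allowed. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* Everything else fails with EPERM. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
};

/* Minimal interpreter covering only LD|W|ABS, JMP|JEQ|K and RET|K. */
static uint32_t run_filter(const struct sock_filter *prog, size_t len,
                           const struct seccomp_data *sd)
{
    uint32_t acc = 0;
    size_t pc = 0;
    while (pc < len) {
        const struct sock_filter *ins = &prog[pc];
        switch (ins->code) {
        case BPF_LD | BPF_W | BPF_ABS: /* load a 32-bit field of seccomp_data */
            memcpy(&acc, (const unsigned char *)sd + ins->k, sizeof acc);
            pc += 1;
            break;
        case BPF_JMP | BPF_JEQ | BPF_K: /* jump offsets are relative to pc + 1 */
            pc += 1 + (acc == ins->k ? ins->jt : ins->jf);
            break;
        case BPF_RET | BPF_K: /* the filter's verdict */
            return ins->k;
        default:
            return SECCOMP_RET_KILL; /* opcode not supported by this sketch */
        }
    }
    return SECCOMP_RET_KILL; /* fell off the end of the program */
}

/* Action the filter above returns for syscall number nr on x86-64. */
uint32_t action_for(int nr)
{
    struct seccomp_data sd = { .nr = nr, .arch = AUDIT_ARCH_X86_64 };
    return run_filter(tskip_filter, sizeof tskip_filter / sizeof tskip_filter[0], &sd);
}
```

With this program, action_for(-1) yields SECCOMP_RET_ALLOW, while an unlisted syscall such as getpid falls through to SECCOMP_RET_ERRNO | EPERM; without the -1 rule, -1 would land in that EPERM default as well.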
@pcmoore Thank you for your comment.
> without a "syscall == -1" allow filter rule, the seccomp filter would reject the syscall skip before the kernel got to the skip line you mentioned. The "syscall == -1" rule in the BPF filter isn't to force the syscall to be skipped, it is to allow the kernel processing to get to the point where the syscall can be skipped. Of course if you have a reproducer which shows that this doesn't work this way anymore I think we would like to see it :)
I have attached a reproducer which shows that a tracer program can skip a system call without a "syscall == -1" rule.
ptrace_test.c is a simple reproducer that skips a getuid syscall under SECCOMP_RET_TRACE by changing the syscall-number register.
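The thread does not include ptrace_test.c itself, so the following is only a hypothetical reconstruction of that kind of reproducer, not the author's file. It assumes x86-64 Linux and uses a raw classic BPF filter instead of libseccomp: the child installs a filter that returns SECCOMP_RET_TRACE for getuid and contains no "syscall == -1" rule, and the parent, on PTRACE_EVENT_SECCOMP, sets orig_rax to -1 so that the kernel skips the call. The names tracee and run_getuid_skip_demo are illustrative.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

#if !defined(__x86_64__)
#error "this sketch assumes x86-64 (orig_rax, AUDIT_ARCH_X86_64)"
#endif

/* Child: route getuid to the tracer (no -1 rule), report the result via a pipe. */
static _Noreturn void tracee(int out_fd)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getuid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = { .len = sizeof filter / sizeof filter[0], .filter = filter };

    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
        _exit(1);
    raise(SIGSTOP); /* wait until the parent has set PTRACE_O_TRACESECCOMP */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1 ||
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == -1)
        _exit(1);

    long r = syscall(SYS_getuid);
    long seen = (r == -1) ? -(long)errno : r; /* undo syscall()'s errno mapping */
    ssize_t n = write(out_fd, &seen, sizeof seen);
    _exit(n == (ssize_t)sizeof seen ? 0 : 1);
}

/* Parent: skip getuid at PTRACE_EVENT_SECCOMP by setting orig_rax = -1.
   Returns what the child's getuid saw, or LONG_MIN if setup was not possible. */
long run_getuid_skip_demo(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return LONG_MIN;
    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return LONG_MIN;
    }
    if (pid == 0) {
        close(fds[0]);
        tracee(fds[1]);
    }
    close(fds[1]);

    int status;
    /* First stop: the SIGSTOP the child raised after PTRACE_TRACEME. */
    if (waitpid(pid, &status, 0) == pid && WIFSTOPPED(status)) {
        if (ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESECCOMP) == -1)
            kill(pid, SIGKILL);
        else
            ptrace(PTRACE_CONT, pid, NULL, NULL);
        /* Keep the child running until it exits or is killed. */
        while (waitpid(pid, &status, 0) == pid && WIFSTOPPED(status)) {
            long sig = 0;
            if (status >> 8 == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8))) {
                struct user_regs_struct regs;
                if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) != -1) {
                    regs.orig_rax = (unsigned long long)-1; /* syscall number -1: skip */
                    ptrace(PTRACE_SETREGS, pid, NULL, &regs);
                }
            } else {
                sig = WSTOPSIG(status); /* forward any other signal */
            }
            ptrace(PTRACE_CONT, pid, NULL, (void *)sig);
        }
    }

    long seen = LONG_MIN;
    if (read(fds[0], &seen, sizeof seen) != (ssize_t)sizeof seen)
        seen = LONG_MIN;
    close(fds[0]);
    return seen;
}
```

Where ptrace and seccomp are permitted, run_getuid_skip_demo() is expected to return -ENOSYS (-38), which would match the "uid: -38" output quoted in this thread; it returns LONG_MIN when the environment does not allow the setup.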
I can confirm that the kernel reaches the skip line I mentioned earlier by setting a probe point at the following line: https://elixir.bootlin.com/linux/v5.10/source/kernel/seccomp.c#L989
$ uname -a
Linux xxxx 5.10.0-1057-oem #61-Ubuntu SMP Thu Jan 13 15:06:11 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ sudo perf probe --source=/usr/src/linux-oem-5.10-5.10.0 --add "__seccomp_filter:68 this_syscall"
Added new event:
probe:__seccomp_filter_L68 (on __seccomp_filter:68 with this_syscall)
You can now use it in all perf tools, such as:
perf record -e probe:__seccomp_filter_L68 -aR sleep 1
$ sudo perf record -e probe:__seccomp_filter_L68 -aR ./ptrace_test
caught getuid syscall
uid: -38
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.898 MB perf.data (1 samples) ]
$ sudo perf script
ptrace_test 12337 [020] 2739.243594: probe:__seccomp_filter_L68: (ffffffffb639720e) this_syscall=-1
The tracer program prints -38 as the return value of getuid; that is -ENOSYS, the value the return register already holds at syscall-enter-stop on x86. The probe also shows that this_syscall is set to -1.
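The probe result is consistent with the order of checks in the SECCOMP_RET_TRACE branch of __seccomp_filter() in kernel/seccomp.c (the v5.10 source linked above): once the tracer has handled the event, a negative syscall number is treated as a skip before the filter is consulted again. The sketch below is a simplified, hypothetical model of that ordering, not kernel code; the policy function, the outcome enum and all names are assumptions, and the mapping of filter actions to "run" or "deny" is deliberately coarse.

```c
#include <errno.h>
#include <stdint.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

enum trace_outcome { OUTCOME_RUN, OUTCOME_SKIP, OUTCOME_DENY };

/* A seccomp policy modelled as a function from syscall number to action. */
typedef uint32_t (*policy_fn)(int nr);

/* Hypothetical policy: send getuid to the tracer, reject everything else.
   Note that it has no "syscall == -1" rule. */
static uint32_t trace_getuid_deny_rest(int nr)
{
    return nr == __NR_getuid ? SECCOMP_RET_TRACE : (SECCOMP_RET_ERRNO | EPERM);
}

/* Simplified model of what follows PTRACE_EVENT_SECCOMP in the RET_TRACE path. */
enum trace_outcome after_trace_event(int nr_after_tracer, policy_fn policy)
{
    /* The tracer set a negative syscall number: skip without re-running the policy. */
    if (nr_after_tracer < 0)
        return OUTCOME_SKIP;

    /* Otherwise the policy is evaluated again on the (possibly changed) number.
       Simplification: ALLOW, LOG and a repeated TRACE let the syscall run;
       every other action is lumped together as "deny". */
    uint32_t action = policy(nr_after_tracer) & SECCOMP_RET_ACTION_FULL;
    if (action == SECCOMP_RET_ALLOW || action == SECCOMP_RET_LOG ||
        action == SECCOMP_RET_TRACE)
        return OUTCOME_RUN;
    return OUTCOME_DENY;
}

/* Convenience wrapper for the example policy above. */
enum trace_outcome demo_outcome(int nr_after_tracer)
{
    return after_trace_event(nr_after_tracer, trace_getuid_deny_rest);
}
```

In this model, demo_outcome(-1) is OUTCOME_SKIP even though the policy has no -1 rule, which mirrors this_syscall=-1 in the probe output. The model covers only the SECCOMP_RET_TRACE path and says nothing about other ways a tracer might change the syscall number.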
If you don't mind, could you look into the reproducer? Thank you.
Thanks for sending the reproducer and the additional information, we'll add this to the list of things to investigate further but it might take me some time to get back to this.
As a reminder, SCMP_FLTATR_API_TSKIP is disabled by default.
Interesting. I'm swamped at the moment as well, but I am definitely intrigued.
Thank you for considering reviewing it. That would be very helpful.
> As a reminder, SCMP_FLTATR_API_TSKIP is disabled by default.
Yes, which is why I deliberately did not enable the SCMP_FLTATR_API_TSKIP attribute in the reproducer: to confirm that the kernel can skip the system call without it.
Hello, I have a question about the SCMP_FLTATR_API_TSKIP attribute.

SCMP_FLTATR_API_TSKIP has been supported since https://github.com/seccomp/libseccomp/commit/dc879990774b5fe0b5d3362ae592e8a5bb615fbb in order to address #80, and the man page explains it as follows:

However, I think tracer programs do not use SCMP_FLTATR_API_TSKIP to skip a syscall, because a tracer skips a syscall by directly changing the syscall-number register, as explained in seccomp(2), not by using a seccomp filter.

Excerpt from the SECCOMP_RET_TRACE section in seccomp(2):

In fact, the kernel skips a syscall if a ptracer has set the syscall number to -1, at the following point: https://elixir.bootlin.com/linux/v5.16/source/kernel/seccomp.c#L1229

The ptracer can set the syscall number to -1 without SCMP_FLTATR_API_TSKIP because it simply changes the register. Hence, it does not seem to make sense to create a filter rule for a syscall value of -1. I'm sorry if I'm wrong, but I'm not sure why SCMP_FLTATR_API_TSKIP was added. Would you mind if I asked about the use case of SCMP_FLTATR_API_TSKIP?