Oh, that's fun :/
Although like you said the practical impact of this should be very close to nil. Still, this could be potentially bad so let's go ahead and add it to the v2.5.4 milestone. Any objections?
> Although like you said the practical impact of this should be very close to nil. Still, this could be potentially bad so let's go ahead and add it to the v2.5.4 milestone. Any objections?
Agreed.
A bit more info which may be useful in chasing this down ...
I modified a copy of "01-sim-allow.c" to this:
#include <errno.h>
#include <unistd.h>
#include <seccomp.h>

#include "util.h"

int main(int argc, char *argv[])
{
	int rc;
	struct util_options opts;
	scmp_filter_ctx ctx = NULL;

	rc = util_getopt(argc, argv, &opts);
	if (rc < 0)
		goto out;

	ctx = seccomp_init(SCMP_ACT_ALLOW);
	if (ctx == NULL)
		return ENOMEM;

#if 1
	rc = seccomp_attr_set(ctx, SCMP_FLTATR_CTL_OPTIMIZE, 2);
	if (rc < 0)
		goto out;
#endif

	rc = util_filter_output(&opts, ctx);
	if (rc)
		goto out;

out:
	seccomp_release(ctx);
	return (rc < 0 ? -rc : rc);
}
When run without the binary tree optimization I get this:
% ./00-test -b | ../tools/scmp_bpf_disasm
line OP JT JF K
=================================
0000: 0x20 0x00 0x00 0x00000004 ld $data[4]
0001: 0x15 0x00 0x04 0xc000003e jeq 3221225534 true:0002 false:0006
0002: 0x20 0x00 0x00 0x00000000 ld $data[0]
0003: 0x35 0x00 0x01 0x40000000 jge 1073741824 true:0004 false:0005
0004: 0x15 0x00 0x01 0xffffffff jeq 4294967295 true:0005 false:0006
0005: 0x06 0x00 0x00 0x7fff0000 ret ALLOW
0006: 0x06 0x00 0x00 0x00000000 ret KILL
When run with the binary tree optimization I get this:
% ./00-test -b | ../tools/scmp_bpf_disasm
line OP JT JF K
=================================
0000: 0x20 0x00 0x00 0x00000004 ld $data[4]
0001: 0x15 0x00 0x04 0xc000003e jeq 3221225534 true:0002 false:0006
0002: 0x20 0x00 0x00 0x00000000 ld $data[0]
0003: 0x35 0x00 0x01 0x40000000 jge 1073741824 true:0004 false:0005
0004: 0x15 0x00 0x01 0xffffffff jeq 4294967295 true:0005 false:0006
0005: 0x20 0x00 0x00 0x00000000 ld $data[0]
0006: 0x06 0x00 0x00 0x00000000 ret KILL
The x86_64/x32 check is correct on both, but in the binary tree case the syscall number is reloaded (line 0005) and the only return option is "KILL".
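For completeness, here is a rough runtime sketch (illustrative only, not one of the libseccomp tests) of what that KILL-only filter means in practice: an allow-all filter with the binary tree optimization enabled and no rules added should be a no-op, but with the broken filter above the process dies on the first syscall after the load.

#include <errno.h>
#include <stdio.h>
#include <seccomp.h>

int main(void)
{
	int rc;
	scmp_filter_ctx ctx;

	ctx = seccomp_init(SCMP_ACT_ALLOW);
	if (ctx == NULL)
		return ENOMEM;

	/* enable the binary tree optimization without adding any rules */
	rc = seccomp_attr_set(ctx, SCMP_FLTATR_CTL_OPTIMIZE, 2);
	if (rc < 0)
		goto out;

	rc = seccomp_load(ctx);
	if (rc < 0)
		goto out;

	/* under the broken filter every path ends in "ret KILL", so the
	 * process is killed on the first syscall after the load (e.g. the
	 * write() behind this printf()); with a correct filter the message
	 * prints normally */
	printf("still alive after seccomp_load()\n");

out:
	seccomp_release(ctx);
	return (rc < 0 ? -rc : rc);
}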
Well, as a quick-hack this "works":
diff --git a/src/gen_bpf.c b/src/gen_bpf.c
index c878f443..54c28c5e 100644
--- a/src/gen_bpf.c
+++ b/src/gen_bpf.c
@@ -1692,6 +1692,7 @@ static struct bpf_blk *_gen_bpf_arch(struct bpf_state *state,
 		goto arch_failure;
 	blk_cnt += blks_added;
 
+#if 0
 	if (bintree_levels > 0) {
 		_BPF_INSTR(instr, _BPF_OP(state->arch, BPF_LD + BPF_ABS),
 			   _BPF_JMP_NO, _BPF_JMP_NO,
@@ -1705,6 +1706,7 @@ static struct bpf_blk *_gen_bpf_arch(struct bpf_state *state,
 		b_bintree->acc_start = _ACC_STATE_UNDEF;
 		b_bintree->acc_end = _ACC_STATE_OFFSET(_BPF_OFFSET_SYSCALL);
 	}
+#endif
 
 	/* additional ABI filtering */
 	if ((state->arch->token == SCMP_ARCH_X86_64 ||
... no idea yet if it still works for the other cases.
Random observation, it looks like our binary trees may not always be properly balanced.
Ran out of time today, if no one else has time to look at it I'll try to take another go at it later this week (or next).
> Random observation, it looks like our binary trees may not always be properly balanced.
For simplicity I designed it to fill up the left node before filling the right node. (Each node holds 4 syscalls.) My rationale was that this optimization really only makes sense on really, really large filters. An imbalance of (up to) 4 syscalls would be small compared to walking the entire filter of 200+ syscalls.
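As a rough illustration of that packing (just a sketch, not the actual libseccomp code), carving a sorted syscall list into 4-entry nodes from left to right means only the last node can be partially filled, which is where the small imbalance comes from:

#include <stdio.h>

#define SYSCALLS_PER_NODE 4	/* matches the node size described above */

/* illustrative only: show how a sorted list of syscall_cnt syscalls is
 * carved into nodes, filling each node completely before starting the
 * next one, so only the final node can end up partially filled */
static void show_node_packing(unsigned int syscall_cnt)
{
	unsigned int node = 0, remaining = syscall_cnt;

	while (remaining > 0) {
		unsigned int in_node = remaining < SYSCALLS_PER_NODE ?
				       remaining : SYSCALLS_PER_NODE;

		printf("node %u: %u syscall(s)\n", node++, in_node);
		remaining -= in_node;
	}
}

int main(void)
{
	/* e.g. a large filter: 201 syscalls -> 50 full nodes plus one
	 * node holding a single syscall, a tiny imbalance compared to
	 * linearly walking all 201 entries */
	show_node_packing(201);
	return 0;
}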
> Ran out of time today, if no one else has time to look at it I'll try to take another go at it later this week (or next).
No pressure either way. I technically feel like I own it - since it was my crazy idea :). But having another person get somewhat familiar with the code wouldn't be a bad thing either.
I should have time this week to check it out.
Haven't had a chance to test it, but I believe this is the fix:
$ git diff
diff --git a/src/gen_bpf.c b/src/gen_bpf.c
index c878f443a792..71317612103a 100644
--- a/src/gen_bpf.c
+++ b/src/gen_bpf.c
@@ -1348,6 +1348,9 @@ static int _get_bintree_levels(unsigned int syscall_cnt)
 {
 	unsigned int i = 2, max_level = SYSCALLS_PER_NODE * 2;
 
+	if (syscall_cnt == 0)
+		return 0;
+
 	while (max_level < syscall_cnt) {
 		max_level <<= 1;
 		i++;
This made both of the above reproducers work properly for me.
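To spell out why the guard helps, here is a simplified stand-alone sketch of that helper (illustrative only, and it assumes the function just returns the level count i, which is not visible in the hunk above). Without the guard an empty filter still reports two levels, so the bintree_levels > 0 prologue in _gen_bpf_arch() gets emitted even though there are no rules; with the guard no binary tree is generated at all:

#include <stdio.h>

#define SYSCALLS_PER_NODE 4

/* simplified stand-in for _get_bintree_levels(); illustrative only,
 * and it assumes the real function returns the level count "i" */
static unsigned int get_bintree_levels(unsigned int syscall_cnt,
				       int with_fix)
{
	unsigned int i = 2, max_level = SYSCALLS_PER_NODE * 2;

	/* the fix: an empty filter gets no binary tree at all, so the
	 * bintree_levels > 0 prologue in _gen_bpf_arch() is skipped */
	if (with_fix && syscall_cnt == 0)
		return 0;

	while (max_level < syscall_cnt) {
		max_level <<= 1;
		i++;
	}

	return i;
}

int main(void)
{
	/* without the guard an empty filter still claims 2 levels */
	printf("0 syscalls, unfixed: %u levels\n",
	       get_bintree_levels(0, 0));
	printf("0 syscalls, fixed:   %u levels\n",
	       get_bintree_levels(0, 1));
	printf("200 syscalls:        %u levels\n",
	       get_bintree_levels(200, 1));
	return 0;
}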
Sorry for the delay, but this looks good to me. Feel free to patch and merge. Thanks @drakenclimber.
Acked-by: Paul Moore <paul@paul-moore.com>
Surely this is a corner case, and enabling the binary tree optimization is obviously useless when there are no rules, but it still feels like a bug.