seccomp / libseccomp

The main libseccomp repository
GNU Lesser General Public License v2.1
805 stars 171 forks source link

BUG: enabled binary tree optimization with no rules results in default action being ignored #370

Closed kolyshkin closed 2 years ago

kolyshkin commented 2 years ago

Surely this is a corner case, and enabling binary tree optimization is obviously useless then there are no rules, but it still feels like a bug.

/*

This code demonstrates the following issue with libseccomp <= 2.5.3.

When the binary tree optimization is enabled (i.e. SCMP_FLTATR_CTL_OPTIMIZE
attribute value is set to 2), and no rules are added to the filter, the default
action seems to be ignored:

    $ gcc -lseccomp opt.c && ./a.out
    optimize 2
    load
    Bad system call (core dumped)

Disabling the binary tree optimization and/or adding a single rule
eliminates the issue:

    $ gcc -lseccomp -DOPT_LEVEL=1 opt.c && ./a.out
    optimize 1
    load
    release

    $ gcc -lseccomp -DADD_RULE opt.c && ./a.out
    optimize 2
    rule_add
    load
    release

*/

#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <seccomp.h>

#ifndef OPT_LEVEL
#define OPT_LEVEL 2
#endif

int main(int argc, char *argv[])
{
    int rc = -1;
    scmp_filter_ctx ctx;

    ctx = seccomp_init(SCMP_ACT_ALLOW);
    if (ctx == NULL) {
        fprintf(stderr, "seccomp_init failed\n");
        goto out;
    }

    printf("optimize %d\n", OPT_LEVEL);
    rc = seccomp_attr_set(ctx, SCMP_FLTATR_CTL_OPTIMIZE, OPT_LEVEL);
    if (rc < 0) {
        fprintf(stderr, "seccomp_attr_set: %s\n", strerror(-rc));
        goto out;
    }

#ifdef ADD_RULE
    printf("rule_add\n");
    rc = seccomp_rule_add(ctx, SCMP_ACT_ERRNO(ENOSYS), SCMP_SYS(sync), 0);
    if (rc < 0) {
        fprintf(stderr, "seccomp_rule_add: %s\n", strerror(-rc));
        goto out;
    }
#endif

    printf("load\n");
    rc = seccomp_load(ctx);
    if (rc < 0) {
        fprintf(stderr, "seccomp_load: %s\n", strerror(-rc));
        goto out;
    }

out:
    printf("release\n");
    seccomp_release(ctx);
    return -rc;
}
pcmoore commented 2 years ago

Oh, that's fun :/

Although like you said the practical impact of this should be very close to nil. Still, this could be potentially bad so let's go ahead and add it to the v2.5.4 milestone. Any objections?

drakenclimber commented 2 years ago

Although like you said the practical impact of this should be very close to nil. Still, this could be potentially bad so let's go ahead and add it to the v2.5.4 milestone. Any objections?

Agreed.

pcmoore commented 2 years ago

A bit more info which may be useful in chasing this down ...

I modified a copy of "01-sim-allow.c" to this:

#include <errno.h>
#include <unistd.h>

#include <seccomp.h>

#include "util.h"

int main(int argc, char *argv[])
{
    int rc;
    struct util_options opts;
    scmp_filter_ctx ctx = NULL;

    rc = util_getopt(argc, argv, &opts);
    if (rc < 0)
        goto out;

    ctx = seccomp_init(SCMP_ACT_ALLOW);
    if (ctx == NULL)
        return ENOMEM;

#if 1
    rc = seccomp_attr_set(ctx, SCMP_FLTATR_CTL_OPTIMIZE, 2);
    if (rc < 0)
        goto out;
#endif

    rc = util_filter_output(&opts, ctx);
    if (rc)
        goto out;

out:
    seccomp_release(ctx);
    return (rc < 0 ? -rc : rc);
}

When run without the binary tree optimization I get this:

% ./00-test -b | ../tools/scmp_bpf_disasm 
 line  OP   JT   JF   K
=================================
 0000: 0x20 0x00 0x00 0x00000004   ld  $data[4]
 0001: 0x15 0x00 0x04 0xc000003e   jeq 3221225534 true:0002 false:0006
 0002: 0x20 0x00 0x00 0x00000000   ld  $data[0]
 0003: 0x35 0x00 0x01 0x40000000   jge 1073741824 true:0004 false:0005
 0004: 0x15 0x00 0x01 0xffffffff   jeq 4294967295 true:0005 false:0006
 0005: 0x06 0x00 0x00 0x7fff0000   ret ALLOW
 0006: 0x06 0x00 0x00 0x00000000   ret KILL

When run with the binary tree optimization I get this:

% ./00-test -b | ../tools/scmp_bpf_disasm 
 line  OP   JT   JF   K
=================================
 0000: 0x20 0x00 0x00 0x00000004   ld  $data[4]
 0001: 0x15 0x00 0x04 0xc000003e   jeq 3221225534 true:0002 false:0006
 0002: 0x20 0x00 0x00 0x00000000   ld  $data[0]
 0003: 0x35 0x00 0x01 0x40000000   jge 1073741824 true:0004 false:0005
 0004: 0x15 0x00 0x01 0xffffffff   jeq 4294967295 true:0005 false:0006
 0005: 0x20 0x00 0x00 0x00000000   ld  $data[0]
 0006: 0x06 0x00 0x00 0x00000000   ret KILL

The x86_64/x32 check is correct on both, but in the binary tree case the syscall number is reloaded (line 0005) and the only return option is "KILL".

pcmoore commented 2 years ago

Well, as a quick-hack this "works":

diff --git a/src/gen_bpf.c b/src/gen_bpf.c
index c878f443..54c28c5e 100644
--- a/src/gen_bpf.c
+++ b/src/gen_bpf.c
@@ -1692,6 +1692,7 @@ static struct bpf_blk *_gen_bpf_arch(struct bpf_state *state,
                goto arch_failure;
        blk_cnt += blks_added;

+#if 0
        if (bintree_levels > 0) {
                _BPF_INSTR(instr, _BPF_OP(state->arch, BPF_LD + BPF_ABS),
                           _BPF_JMP_NO, _BPF_JMP_NO,
@@ -1705,6 +1706,7 @@ static struct bpf_blk *_gen_bpf_arch(struct bpf_state *state,
                b_bintree->acc_start = _ACC_STATE_UNDEF;
                b_bintree->acc_end = _ACC_STATE_OFFSET(_BPF_OFFSET_SYSCALL);
        }
+#endif

        /* additional ABI filtering */
        if ((state->arch->token == SCMP_ARCH_X86_64 ||

... no idea yet if it still works for the other cases.

pcmoore commented 2 years ago

Random observation, it looks like our binary trees may not always be properly balanced.

pcmoore commented 2 years ago

Ran out of time today, if no one else has time to look at it I'll try to take another go at it later this week (or next).

drakenclimber commented 2 years ago

Random observation, it looks like our binary trees may not always be properly balanced.

Out of simplicity I designed it to fill up the left node before filling the right node. (Each node is 4 syscalls.) My rationale was that this optimization really only makes sense on really, really large filters. An imbalance of (up to) 4 syscalls would be small compared to walking the entire filter of 200+ syscalls.

drakenclimber commented 2 years ago

Ran out of time today, if no one else has time to look at it I'll try to take another go at it later this week (or next).

No pressure either way. I technically feel like I own it - since it was my crazy idea :). But having another person get somewhat familiar with the code wouldn't be a bad thing either.

I should have time this week to check it out.

drakenclimber commented 2 years ago

Haven't had a chance to test it, but I believe this is the fix:

$ git diff
diff --git a/src/gen_bpf.c b/src/gen_bpf.c
index c878f443a792..71317612103a 100644
--- a/src/gen_bpf.c
+++ b/src/gen_bpf.c
@@ -1348,6 +1348,9 @@ static int _get_bintree_levels(unsigned int syscall_cnt)
 {
        unsigned int i = 2, max_level = SYSCALLS_PER_NODE * 2;

+       if (syscall_cnt == 0)
+               return 0;
+
        while (max_level < syscall_cnt) {
                max_level <<= 1;
                i++;
drakenclimber commented 2 years ago
$ git diff
diff --git a/src/gen_bpf.c b/src/gen_bpf.c
index c878f443a792..71317612103a 100644
--- a/src/gen_bpf.c
+++ b/src/gen_bpf.c
@@ -1348,6 +1348,9 @@ static int _get_bintree_levels(unsigned int syscall_cnt)
 {
        unsigned int i = 2, max_level = SYSCALLS_PER_NODE * 2;

+       if (syscall_cnt == 0)
+               return 0;
+
        while (max_level < syscall_cnt) {
                max_level <<= 1;
                i++;

This made both of the above reproducers work properly for me.

pcmoore commented 2 years ago
$ git diff
diff --git a/src/gen_bpf.c b/src/gen_bpf.c
index c878f443a792..71317612103a 100644
--- a/src/gen_bpf.c
+++ b/src/gen_bpf.c
@@ -1348,6 +1348,9 @@ static int _get_bintree_levels(unsigned int syscall_cnt)
 {
        unsigned int i = 2, max_level = SYSCALLS_PER_NODE * 2;

+       if (syscall_cnt == 0)
+               return 0;
+
        while (max_level < syscall_cnt) {
                max_level <<= 1;
                i++;

This made both of the above reproducers work properly for me.

Sorry for the delay, but this looks good to me. Feel free to patch and merge. Thanks @drakenclimber.

Acked-by: Paul Moore <paul@paul-moore.com>