iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0
20.55k stars 3.88k forks source link

"invalid BPF_LD_IMM insn" when accessing array using data from a map #601

Open janrueth opened 8 years ago

janrueth commented 8 years ago

Hi,

I stumbled over the following issue, the compiler seems to create an invalid opcode for the following:

from bcc import BPF

# load BPF program
b = BPF(text = """

struct Key {
    unsigned char p[64];
};

struct Leaf {
    uint16_t len;
};

BPF_TABLE("hash", struct Key, struct Leaf, store, 128);

    void test(struct __sk_buff* skb) {
    struct Key key;
    __builtin_memset(&key, 0, sizeof(key));

    struct Leaf * lookup_leaf = store.lookup(&key);
    if(lookup_leaf) {
            uint8_t lookup_list[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,7,8,19,20,21,22,23,24,25,26,27,28,29,30,31};
            if(lookup_leaf->len < 32) {
                uint16_t elem_len = lookup_leaf->len;

                uint8_t look = lookup_list[elem_len];

                bpf_trace_printk("array value is %d\\n", look);
            }
        }
    }

    """)
f = open("test.o", "w+")
f.write(b.dump_func("test"))
f.close()

b.load_func("test", BPF.SCHED_ACT)

I want to use lookup_leaf->len to index the static array, however if I try to load it I get a:

bpf: Invalid argument
0: (b7) r1 = 0
1: (7b) *(u64 *)(r10 -8) = r1
2: (7b) *(u64 *)(r10 -16) = r1
3: (7b) *(u64 *)(r10 -24) = r1
4: (7b) *(u64 *)(r10 -32) = r1
5: (7b) *(u64 *)(r10 -40) = r1
6: (7b) *(u64 *)(r10 -48) = r1
7: (7b) *(u64 *)(r10 -56) = r1
8: (7b) *(u64 *)(r10 -64) = r1
9: (18) r1 = 0xb61b3080
11: (bf) r2 = r10
12: (07) r2 += -64
13: (85) call 1
14: (15) if r0 == 0x0 goto pc+17
 R0=map_value(ks=64,vs=2) R10=fp
15: (69) r1 = *(u16 *)(r0 +0)
16: (25) if r1 > 0x1f goto pc+15
 R0=map_value(ks=64,vs=2) R1=inv R10=fp
17: (b7) r2 = 0
18: (73) *(u8 *)(r10 -70) = r2
19: (b7) r2 = 2660
20: (6b) *(u16 *)(r10 -72) = r2
21: (18) r2 = 0x636e7973
23: (7b) *(u64 *)(r10 -80) = r2
24: (00) r0 = 0x0
invalid BPF_LD_IMM insn

When inspecting the produced ebpf code I see that there is indeed an opcode 00 at that address. Here ins 23 and 24 taken from dump_func():

     7B2AB0FF 00000000 *00*70B24F 00000000

Why is this happening? Is it impossible to index an array from a map item?

Thanks!

4ast commented 8 years ago
uint8_t lookup_list[32] = {0,1,...};
if(lookup_leaf->len < 32) {
       uint16_t elem_len = lookup_leaf->len;
       uint8_t look = lookup_list[elem_len];

there are multiple issues here: . llvm puts lookup_list into data section which is not supported, so ld_imm is trying to load an address of global symbol, so verifier complains of 'invalid ld_imm'. the only supported global symbol right now is address of the bpf map which is encoded inside ld_imm as integer fd. . elem_len is not recognized by verifier as being within bounds

quite a bit of work need to happen on llvm and kernel side to add such support. would be good if you can describe the use case, so may be there other ways to solve it in the mean time.

janrueth commented 8 years ago

The use case is that I was going to replace parts of the incoming packet with data that comes through a bpf map from the user. I ran into the problem that bpf_skb_store_bytes would not accept a) data from a map that's why I needed to copy it to the stack (see Issue #606) and b) would not accept the length field coming from the map.

I tried copying it usingvolatile uint16_t elem_len = lookup_leaf->len; (without the volatile the compiler simply does not even create a variable on the stack) and using it as the 4th argument to bpf_skb_store_bytes but the verifier complained: R4 type=inv expected=imm, which I think is caused by the compiler doing this:

956: (69) r1 = *(u16 *)(r10 -130)
957: (15) if r1 == 0x0 goto pc+9
 R0=inv R1=inv R2=inv R3=inv R4=inv R5=inv R6=ctx R7=inv R8=inv R9=map_value(ks=64,vs=34) R10=fp fp-144=map_value
958: (69) r1 = *(u16 *)(r10 -130)
959: (25) if r1 > 0x1f goto pc+7
 R0=inv R1=inv R2=inv R3=inv R4=inv R5=inv R6=ctx R7=inv R8=inv R9=map_value(ks=64,vs=34) R10=fp fp-144=map_value
960: (69) r4 = *(u16 *)(r10 -130)
961: (bf) r3 = r10
962: (07) r3 += -128
963: (bf) r1 = r6
964: (bf) r2 = r8
965: (b7) r5 = 0
966: (85) call 9
R4 type=inv expected=imm

As you can see it uses r1 to jump beyond bpf_skb_store_bytes twice and then uses r4 with the same address relative to the fp to call bpf_skb_store_bytes it seems that the verifier does not recognize that r10-130 is valid then or did I misunderstand the meaning of inv (i.e. what makes a pointer valid if not checking its underlying value?)?

After that did not work, I though okay, I can just use a lookup list that maps 1 -> 1, 2->2 etc.. (In my actual code I do have bounds checks btw) which is ugly but better than my current solution to have a huge switch statement one for each possible length.

Why is it mapping lookup_list to another section and not on the stack? I mean I would agree if I had simply declared a pointer to {1,2, ..., n} but I thought declaring it as an array would just put it on the stack. I also tried it without the static assignment and added things like this:

uint8_t lookup_list[32];
uint8_t i = 0;
lookup_list[i] = i++;
lookup_list[i] = i++;
lookup_list[i] = i++;
lookup_list[i] = i++;
...

Again, I can work around this, but it is ugly and bloats my program, and I actually want to avoid expansion macros.

4ast commented 8 years ago

a) data from a map that's why I needed to copy it to the stack (see Issue #606) and b) would not accept the length field coming from the map.

yes that's the known limitation. we're planning to address (a) The (b) with variable length is harder, but should be doable too. There is no code yet, so kernel patches are welcome ;)

janrueth commented 8 years ago

Okay a) should be fairly easy I guess but I see different ways to implement it, can you point me to some location where you are discussion these kinds of changes / roadmap or does it all happen in the netdev mailing list? For b) I guess changes need to happen in the clang fontend which I don't want to touch :)

borkmann commented 8 years ago

The part where you'd need to look into is the BPF verifier (kernel/bpf/verifier.c) in the kernel. We'd need to extend it that helpers like bpf_skb_store_bytes() accept input from map buffers as well. The verifier takes bpf_skb_store_bytes_proto (see net/core/filter.c in kernel tree) as input and matches against .argX_type declarations, which currently is limted to ARG_PTR_TO_STACK (arg3), ARG_CONST_STACK_SIZE (arg4). So, this would need a new type or some form of enumeration of possible input types for the helper, and verifier needs to infer the map and some form of value size information that guarantees that it's not accessed out of bounds.

janrueth commented 8 years ago

Yep. I also found that. I think always defining new types for combinations of argument types is not a good idea, I think it would be wiser if one could |(or)-them. But than the enum style would need to be changed to some kind of a bitfield.

borkmann commented 8 years ago

Binary Or'ing them could be one option, however, I'm a bit worried when we start to use this more extensively in future, and say you have 3 or more or'ed combinations (arg3 in this example) where the related size types (arg4 here) could have overlapping semantic meanings, then there wouldn't be strict ordering. Hmm, maybe better to make a list of related pairs so we have strict ordering on what type pairs to expect.

janrueth commented 8 years ago

Hm that sounds like a better idea. Until now, I haven't had a look at the verifier in depth what it actually checks. But after doing so your solution sounds better :). So instead of having one argument, one could have an array of at most (lets say) 6? arguments per register and then simply iterates in the verifier over all pairs of register types?

Like this:

static const struct bpf_func_proto bpf_skb_store_bytes_proto = {
    .func       = bpf_skb_store_bytes,
    .gpl_only   = false,
    .ret_type   = RET_INTEGER,
    .arg1_type  = {ARG_PTR_TO_CTX,          ARG_PTR_TO_CTX,     __ARG_EOL},
    .arg2_type  = {ARG_ANYTHING,            ARG_ANYTHING,       __ARG_EOL},
    .arg3_type  = {ARG_PTR_TO_STACK,        ARG_CONST_MAP_PTR,  __ARG_EOL},
    .arg4_type  = {ARG_CONST_STACK_SIZE,    ARG_PTR_TO_MAP_KEY, __ARG_EOL},
    .arg5_type  = {ARG_ANYTHING,            ARG_ANYTHING,       __ARG_EOL},
};

One would just need to loop in verifier.c (check_call()) and make sure to collect errors and reset the state after each loop. And also a minor change in check_raw_mode() would be required.

If this is a desired solution I would be happy to implement it.

4ast commented 8 years ago

I'd say before changing all bpf_func_proto to the new format let's add new type ARG_PTR_TO_STACK_OR_MAP_VALUE, since I suspect the changes needed in other parts of verifier are more substantial and trickier to do then this naming dilemma. Once it's all done, we can go back and clean up this name bit.