iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0

LLVM optimized code loses ctx and confuses verifier #235

Open drzaeus77 opened 9 years ago

drzaeus77 commented 9 years ago

Source code is:

static int in_port(struct __sk_buff *skb) {
  int in = skb->ifindex;
  if (skb->cb[0])
    in = skb->cb[0];
  return in;
}
int foo(struct __sk_buff *skb) {
  int in = in_port(skb);
  return in;
}

The verifier output is:

0: (bf) r2 = r1
1: (07) r2 += 40
2: (61) r3 = *(u32 *)(r1 +48)
3: (b7) r4 = 0
4: (1d) if r3 == r4 goto pc+2
 R1=ctx R2=inv R3=inv R4=imm0 R10=fp
5: (07) r1 += 48
6: (bf) r2 = r1
7: (61) r0 = *(u32 *)(r2 +0)
R2 invalid mem access 'inv'

The issue is clearly that LLVM introduced an optimization that increments r1 conditionally, rather than leaving r1 at its original value and loading from r1 +40 or r1 +48. I will investigate which compiler pass moved this value; meanwhile, @yonghong-song agreed to look at the verifier side.

yonghong-song commented 9 years ago

I just looked at the verifier part.

After

5: (07) r1 += 48

r1 is marked as UNKNOWN_VALUE, so at insn #7 the base register in "r2 + 0" (copied from r1 at insn #6) is also UNKNOWN_VALUE ('inv'), and the verifier fails.
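The rejection described above can be reproduced with a toy model of the register-type tracking. This is only a sketch under simplifying assumptions, not the kernel verifier: the tuple instruction encoding, the `verify` function, and the rule that every add turns its register into 'inv' are hypothetical stand-ins for the behaviour reported in this thread. It walks only the fall-through path and ignores branches.

```python
# Toy model of the register-type tracking discussed in this thread.
# NOT the kernel verifier: the encoding and rules are illustrative assumptions.

CTX, INV, IMM, FP = "ctx", "inv", "imm", "fp"


def verify(prog):
    """Walk the fall-through path; return the first error string, or None.

    Instructions are tuples (hypothetical encoding):
      ("mov", dst, src)       dst = src
      ("movi", dst, imm)      dst = imm
      ("add", dst, imm)       dst += imm
      ("ldx", dst, src, off)  dst = *(u32 *)(src + off)
      ("jeq", off)            conditional jump, treated as not taken
      ("exit",)
    """
    regs = {"r1": CTX, "r10": FP}
    for pc, insn in enumerate(prog):
        op = insn[0]
        if op == "mov":
            regs[insn[1]] = regs.get(insn[2], INV)
        elif op == "movi":
            regs[insn[1]] = IMM
        elif op == "add":
            # Assumed rule: arithmetic discards the pointer type.
            regs[insn[1]] = INV
        elif op == "ldx":
            _, dst, src, _off = insn
            if regs.get(src) != CTX:
                return f"{pc}: {src} invalid mem access '{regs.get(src, INV)}'"
            regs[dst] = INV
    return None


# The program from the original report (insns 0-7).
FAILING = [
    ("mov", "r2", "r1"),
    ("add", "r2", 40),
    ("ldx", "r3", "r1", 48),
    ("movi", "r4", 0),
    ("jeq", 2),
    ("add", "r1", 48),
    ("mov", "r2", "r1"),
    ("ldx", "r0", "r2", 0),
]

# The program produced by the source-code workaround (insns 0-5).
WORKAROUND = [
    ("ldx", "r0", "r1", 40),
    ("ldx", "r1", "r1", 48),
    ("movi", "r2", 0),
    ("jeq", 1),
    ("mov", "r0", "r1"),
    ("exit",),
]

print(verify(FAILING))
print(verify(WORKAROUND))
```

In this model the first program is rejected at insn 7 because r2 was copied from an r1 that had already been incremented, while the second is accepted because every load uses the untouched ctx register as its base.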

There is a source-code workaround, and the compiler could generate the same code itself:

yhs@ubuntu:~/work/bcc/examples$ cat ex3.py
#!/usr/bin/python

from bcc import BPF

# load BPF program
b = BPF(text = """
static int in_port(struct __sk_buff *skb) {
  int in = skb->ifindex;
  int cb = skb->cb[0];
  return cb ? : in;
}
int foo(struct __sk_buff *skb) {
  int in = in_port(skb);
  return in;
}
""", debug=2)
fn = b.load_func("foo", BPF.SCHED_CLS)
yhs@ubuntu:~/work/bcc/examples$ sudo ./ex3.py
0: (61) r0 = *(u32 *)(r1 +40)
1: (61) r1 = *(u32 *)(r1 +48)
2: (b7) r2 = 0
3: (1d) if r1 == r2 goto pc+1
 R0=inv R1=inv R2=imm0 R10=fp
4: (bf) r0 = r1
5: (95) exit

from 3 to 5: safe
yhs@ubuntu:~/work/bcc/examples$

Fixing the verifier will require some effort:

yhs@ubuntu:~/work/bcc/examples$ sudo ./ex2.py
0: (bf) r2 = r1
1: (07) r2 += 40
2: (61) r3 = *(u32 *)(r1 +48)
3: (b7) r4 = 0
4: (1d) if r3 == r4 goto pc+2
 R1=ctx R2=inv R3=inv R4=imm0 R10=fp
5: (07) r1 += 48
6: (bf) r2 = r1
7: (61) r0 = *(u32 *)(r2 +0)
R2 invalid mem access 'inv'

The value of "r2" at insn #7 can come from two places, insn #1 and insn #6. Besides remembering "ctx + offset", the verifier may need some notion of control flow or basic blocks, and this may introduce a lot more complexity...
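To make the two-origin problem concrete, here is a toy path-walking checker that remembers "ctx + offset" per register and follows both outcomes of the branch. It is a sketch under stated assumptions, not a proposal for the kernel verifier: `check`, `CTX_SIZE`, the tuple encoding, and the trailing `exit` instruction (the log above stops at insn 7) are all hypothetical. It also assumes the program has no loops.

```python
# Toy sketch only: a checker that tracks "ctx + offset" along every path.
# NOT the kernel verifier; CTX_SIZE and the encoding are assumptions.

CTX_SIZE = 64  # hypothetical context size in bytes


def check(prog):
    """Explore each path; return [(pc, base_reg, ctx_offset)] for all loads.

    Register state maps a register name to its offset from ctx, or to None
    when the register does not hold a ctx-derived pointer.
    Raises ValueError on a load through a non-ctx or out-of-range pointer.
    """
    loads = []
    work = [(0, {"r1": 0})]  # (start pc, register state)
    while work:
        pc, regs = work.pop()
        while True:
            insn = prog[pc]
            op = insn[0]
            if op == "exit":
                break
            if op == "mov":
                regs[insn[1]] = regs.get(insn[2])
            elif op == "movi":
                regs[insn[1]] = None
            elif op == "add":
                base = regs.get(insn[1])
                regs[insn[1]] = None if base is None else base + insn[2]
            elif op == "ldx":
                _, dst, src, off = insn
                base = regs.get(src)
                if base is None or not 0 <= base + off <= CTX_SIZE - 4:
                    raise ValueError(f"{pc}: {src} invalid mem access")
                loads.append((pc, src, base + off))
                regs[dst] = None
            elif op == "jeq":
                # Queue the taken branch with its own copy of the state,
                # then keep walking the fall-through path.
                work.append((pc + 1 + insn[1], dict(regs)))
            pc += 1
    return loads


# The rejected program from the report, plus an assumed final exit.
PROG = [
    ("mov", "r2", "r1"),      # 0: r2 = r1
    ("add", "r2", 40),        # 1: r2 += 40
    ("ldx", "r3", "r1", 48),  # 2: r3 = *(u32 *)(r1 +48)
    ("movi", "r4", 0),        # 3: r4 = 0
    ("jeq", 2),               # 4: if r3 == r4 goto pc+2
    ("add", "r1", 48),        # 5: r1 += 48
    ("mov", "r2", "r1"),      # 6: r2 = r1
    ("ldx", "r0", "r2", 0),   # 7: r0 = *(u32 *)(r2 +0)
    ("exit",),                # 8: assumed, not shown in the log
]

for pc, reg, off in check(PROG):
    print(f"insn {pc}: load via {reg} at ctx+{off}")
```

In this model insn 7 is reached twice, once with r2 at ctx+40 (set at insn #1, branch taken) and once at ctx+48 (set at insn #6, fall-through). Handling that requires per-path state, which is the added complexity the comment above refers to.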

On Fri, Sep 18, 2015 at 3:49 PM, Brenden Blanco notifications@github.com wrote:

Source code is:

static int in_port(struct __sk_buff *skb) {
  int in = skb->ifindex;
  if (skb->cb[0])
    in = skb->cb[0];
  return in;
}
int foo(struct __sk_buff *skb) {
  int in = in_port(skb);
  return in;
}

0: (bf) r2 = r1
1: (07) r2 += 40
2: (61) r3 = *(u32 *)(r1 +48)
3: (b7) r4 = 0
4: (1d) if r3 == r4 goto pc+2
 R1=ctx R2=inv R3=inv R4=imm0 R10=fp
5: (07) r1 += 48
6: (bf) r2 = r1
7: (61) r0 = *(u32 *)(r2 +0)
R2 invalid mem access 'inv'

The issue is clearly that LLVM introduced an optimization that incremented r1 conditionally rather than letting it stay as original r1 value and loading from r1 +40/48. I will investigate which compiler pass moved this value, meanwhile @yonghong-song https://github.com/yonghong-song agreed to look at the verifier side.
