Open nuclight opened 2 months ago
I didn't try the reproducer, but judging from your description, this is not a clang optimizer bug. The generated code should be correct, but the verifier cannot properly derive the value range due to a verifier limitation. See the bpf selftests, where the barrier_var() macro has been used in various bpf progs to work around verification issues.
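For readers unfamiliar with the macro, here is a minimal sketch of how barrier_var() is typically used. The macro body mirrors the definition in libbpf's bpf_helpers.h (spelled with `__asm__ __volatile__` so it also compiles in strict C modes). The helper `read_byte_checked` and its parameters are hypothetical and only illustrate the pattern. The sketch is plain userspace C, so it shows the shape of the workaround and not whether a given kernel's verifier accepts the resulting BPF code.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors barrier_var() from libbpf's bpf_helpers.h: an empty asm statement
 * that claims to read and write `var`, so the compiler must keep the value
 * in a register and cannot merge it with other expressions. */
#ifndef barrier_var
#define barrier_var(var) __asm__ __volatile__("" : "+r"(var))
#endif

/* Hypothetical helper: read one byte at offset `off` from a packet that
 * spans [data, data_end). Returns the byte, or -1 if out of bounds. */
static int read_byte_checked(const uint8_t *data, const uint8_t *data_end,
                             uint32_t off)
{
    const uint8_t *p;

    /* Pin the offset before deriving the pointer, so the pointer that is
     * bounds-checked below is the same one that is dereferenced. */
    barrier_var(off);
    p = data + off;

    if (p + 1 > data_end)
        return -1;
    return *p;
}
```

In a real BPF program the same barrier would be placed on the variable the verifier loses track of; which variable that is depends on the specific program and compiler version.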
Yes, I was able to work around it with the barrier_var() macro in this particular program, but I'm afraid this approach is too fragile for more complex programs in production. Also, about the verifier limitation: can it even be changed in the verifier? If not, then the problem is not on the kernel side, i.e. only clang could solve this. And regarding:
"This is not a clang optimizer bug"
Please take a closer look at instructions 249-256 in the disassembler output (second paste in this issue, searchable on this web page by the "nuc_ts_prog_kern.c:190" substring). You'll see that r9 is not only thrown away after the comparison, but the sequence
r0 = 300
if r9 > r2 goto <LBB0_103>
is repeated twice! But there is no label to jump to in between that would justify such repetition. So this code, while formally correct in the sense of "produced output" (if we ignore the verifier), is clearly not optimal: it performs unnecessary actions, which will negatively affect performance. Thus, there may be other cases where "it's a verifier problem, not clang", but for this particular case I insist that this is an optimizer bug.
I've seen #4612 with a similar problem; if this is not the right repo, please point me to where to report this. Essentially, this is a repost of my mail to xdp-newbies@vger.kernel.org.
I am going through https://github.com/xdp-project/xdp-tutorial and, after the 3rd lesson, I am trying to create a simple program that searches for the TCP timestamp option and increments it by one. However, after two dozen tries the eBPF verifier still doesn't accept my code. I dug into the verifier sources and found that access ("r=123") propagates to all registers with the same "id=" after a comparison with pkt_end (usually the data_end variable in xdp-tutorial). So I learned to place such checks as close as possible to the actual access, but this still did not help.
But by carefully and tediously going through the disassembler and verifier output, I found that the clang optimizer ignores the register that was just checked and calculates from another one, which triggers the verifier's complaint. Moreover, it sometimes generates suboptimal code! In my (currently) last try, this looks like:
and in the disassembler (llvm-objdump -lS nuc_ts_prog_kern.o), with my comments added:
As you can see, it checks r9 even twice, and r9 gets the desired "r=44", but then the optimizer discards r9 and uses r0, which has no such checks.
I tried to patch this manually:
first, doing
llc -march=bpf -filetype=asm -o nuc_ts_prog_kern.s nuc_ts_prog_kern.ll
and then producing the actual object file:
llvm-mc -triple bpf -filetype=obj -o nuc_ts_prog_kern.o nuc_ts_prog_kern.s
After this, the eBPF verifier accepts the program and loads it into the kernel!
So, the questions are:
1) Could this be considered a clang bug?
2) Even if the answer to the first question is "yes", what could be done to trick clang into emitting more "correct" (from the verifier's POV) code?
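To make the second question concrete, here is a hedged sketch of the pattern discussed in this thread: each bounds check sits immediately before the access it guards, and barrier_var() pins the offset so the checked pointer and the dereferenced pointer come from the same value. This is not the code from nuc_ts_prog_kern.c. The function name `bump_tcp_tsval`, its parameters, and the constants are assumptions made for illustration, and it is written as plain userspace C. Whether an equivalent XDP program passes the verifier depends on the clang and kernel versions in use.

```c
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_EOL        0   /* end of option list */
#define TCPOPT_NOP        1   /* one-byte padding option */
#define TCPOPT_TIMESTAMP  8   /* timestamps option kind */
#define TCPOLEN_TIMESTAMP 10  /* kind + len + TSval + TSecr */
#define MAX_OPT_STEPS     40  /* TCP options occupy at most 40 bytes */

/* Mirrors barrier_var() from libbpf's bpf_helpers.h. */
#ifndef barrier_var
#define barrier_var(var) __asm__ __volatile__("" : "+r"(var))
#endif

/* Hypothetical helper: scan the TCP options starting at `opts` and, if a
 * timestamp option is present, increment its TSval by one.
 * Returns 1 if incremented, 0 if no timestamp option was found,
 * -1 if the options are malformed or run past data_end. */
static int bump_tcp_tsval(uint8_t *opts, uint8_t *data_end, uint32_t opts_len)
{
    uint32_t off = 0;

    for (int i = 0; i < MAX_OPT_STEPS && off < opts_len; i++) {
        uint8_t *p;
        uint8_t kind, len;

        barrier_var(off);        /* keep `off` as the single source for `p` */
        p = opts + off;

        if (p + 1 > data_end)    /* check right before reading the kind byte */
            return -1;
        kind = p[0];
        if (kind == TCPOPT_EOL)
            return 0;
        if (kind == TCPOPT_NOP) {
            off += 1;
            continue;
        }

        if (p + 2 > data_end)    /* check right before reading the length byte */
            return -1;
        len = p[1];
        if (len < 2)
            return -1;

        if (kind == TCPOPT_TIMESTAMP) {
            uint32_t v;

            if (len != TCPOLEN_TIMESTAMP || p + TCPOLEN_TIMESTAMP > data_end)
                return -1;
            /* TSval is stored big-endian in bytes 2..5 of the option. */
            v = ((uint32_t)p[2] << 24) | ((uint32_t)p[3] << 16) |
                ((uint32_t)p[4] << 8) | (uint32_t)p[5];
            v += 1;
            p[2] = (uint8_t)(v >> 24);
            p[3] = (uint8_t)(v >> 16);
            p[4] = (uint8_t)(v >> 8);
            p[5] = (uint8_t)v;
            return 1;
        }
        off += len;
    }
    return 0;
}
```

The fixed loop bound and the re-derivation of `p` from `off` on every iteration are deliberate: both keep the control flow and pointer arithmetic simple, which is what tends to matter for the verifier.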
How to reproduce:
This is Ubuntu 22.04, kernel 6.5.0-35, clang 14.
Do
and perform the initial lib setup as described in their README, then put the .h and .c files directly into this directory. Compile with:
The sources:
$ cat nuc_ts_common_kern_user.h
$ cat nuc_ts_prog_kern.c