vext01 closed this 3 years ago.
LLVM doesn't offer an API to serialise/deserialise MBBs
Could / should we add one?
Anything's possible, I guess. But it would take time to implement, and it had better be worthwhile if we are to diverge further from LLVM upstream.
On a related note, another scenario in which we see "repeated basic blocks" is function calls. Since an LLVM call doesn't terminate a block, you see the same block again after returning. We've had to hack around that in our code, but using an MBB-derived IR would simplify this scenario too, I think.
OK, so what I'd like to find out next is whether:
@ptersilie any thoughts?
Yes, working on MBBs instead of LLVM IR was one of the things we discussed yesterday, but it's no small undertaking. Since we have some ideas for working around the switch problem, my immediate thought is to try those first and see how far we get. I wouldn't be surprised if there are other unexpected codegen shenanigans that make mapping back to IR difficult. Ideally, we'd have a 1-1 mapping from machine code to something we can work with.
Another problem I see with working with MBB CFGs is how much we can still optimise those. The reason we decided to work on LLVM IR was that we can construct a trace and then pass it to LLVM for optimisation. I'm unsure how much that is possible if we work on different IR.
I'm unsure how much that is possible if we work on different IR.
This is a very good point.
Would creating and adding a switch-lowering pass at the end of the LLVM IR passes work? And maybe disabling or restricting jump threading if necessary?
Hi @bjorn3!
The thought did cross my mind. The problem is, we don't know how many other IR instructions might have hidden control flow, and we may end up with many passes. It would be nice to have a small and general fix.
TLDR: A single instruction in the final (i.e. post-platform-specific passes) LLVM IR can generate many machine blocks and this can lead to multiple interpretations of a trace through that instruction.
Understanding this requires a lot of detail. Sorry!
I discovered this with the following test:
This gives the following IR at `-O3`:

What's important here is the use of `switch` to dispatch to the correct case:

When we trace this, we see the block indices `0, 0, 0, 1, 5` reported:

^ These debug prints show the addresses being queried and the IR block(s) that each address range corresponds with (`corr_bbs`). The block numbers are indices, so that's actually the entry block (three times), then the block labeled `9:`, followed by the block labeled `15:`.

We can see that several blocks all map to BB0. Why? It's to do with the machine blocks that the LLVM `switch` statement decomposed to. Disassembling the binary we see:
Although the test checks the switch cases in the order `1, 2, 3, default`, LLVM is free to re-order the checks, and it has done so. It tests `3, 2, 1, default` using cascading `cmp` and `jxx` instructions.

In the eyes of PT, a `jxx` terminates a block, so when `x=1` we pass through the blocks testing the `3` and `2` cases before eventually taking the `1` case. That explains why we see block `0` appearing three times.

The block map (in the `.llvm_bb_addr_map` section) confirms that LLVM treats each of these micro-blocks as a distinct machine block:

OK, so far so good. What's the problem?
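To make the problem concrete, here is a minimal Python sketch of the address-to-block lookup. It is not the code used in this project: the block map is a hypothetical sorted list of `(mbb_start, ir_block_index)` pairs with made-up addresses, standing in for the data in `.llvm_bb_addr_map`. Mapping each machine-block address to its IR block gives the same index sequence for one fall-through of BB0's three micro-blocks as for a hypothetical trace that enters BB0 three times:

```python
import bisect

# Hypothetical block map: sorted (machine-block start address, IR block index).
# The micro-blocks at 0x100, 0x108 and 0x110 all belong to IR block 0,
# mimicking the cascading cmp/jxx tests for cases 3, 2 and 1.
BLOCK_MAP = [
    (0x100, 0),  # test for case 3
    (0x108, 0),  # test for case 2
    (0x110, 0),  # test for case 1
    (0x118, 1),  # block labeled 9:
    (0x140, 5),  # block labeled 15:
]
STARTS = [start for start, _ in BLOCK_MAP]


def ir_block_for(addr):
    """Return the IR block index of the machine block containing addr."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        raise ValueError(f"address {addr:#x} precedes the block map")
    return BLOCK_MAP[i][1]


def to_ir_trace(mbb_addrs):
    """Map a trace of machine-block addresses to IR block indices."""
    return [ir_block_for(a) for a in mbb_addrs]


# Falling through the three micro-blocks of BB0 once...
straight = [0x100, 0x108, 0x110, 0x118, 0x140]
# ...versus a hypothetical trace that enters BB0 at its start three times.
looped = [0x100, 0x100, 0x100, 0x118, 0x140]

print(to_ir_trace(straight))  # [0, 0, 0, 1, 5]
print(to_ir_trace(looped))    # [0, 0, 0, 1, 5]
```

Because the last entry has no recorded end address, this sketch assigns every address at or above `0x140` to block 5; a real lookup would also need each machine block's size.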
What's bothering me is that the block sequence `0, 0, 0` is ambiguous. It expresses either progression through several MBBs that all correspond with the same BB, or that BB0 was run three times.

One potential fix is to pair each BB number in our trace with an MBB address. So instead of `0, 0, 0`, our trace would perhaps be `(0, 0x123), (0, 0x456), (0, 0x789)`, where the `0x` numbers are the addresses of the machine blocks. This makes it clear that BB0 has not been executed thrice.

What I'm thinking about now is whether this fix is sufficient under the assumption that any IR instruction could decompose to arbitrary control flow (perhaps including loops and unstructured jumps). We only consider a block "re-executed" if there's a jump to the address of the first MBB in the block.
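A minimal Python sketch of the proposed fix, assuming we know the address of each IR block's first machine block (the hypothetical `FIRST_MBB` table below; apart from `0x123`, the addresses are made up). Each trace entry is an `(ir_block, mbb_addr)` pair, and an IR block counts as (re-)executed only when the trace reaches its first machine block:

```python
# Hypothetical: address of the first machine block of each IR block.
FIRST_MBB = {0: 0x123, 1: 0x800, 5: 0x900}


def ir_block_executions(trace):
    """Collapse (ir_block, mbb_addr) pairs into a list of IR block executions.

    A new execution starts only when control reaches the address of the
    block's first machine block; any other machine block of the same IR
    block is treated as a continuation of the current execution.
    """
    executions = []
    for bb, addr in trace:
        if addr == FIRST_MBB[bb]:
            executions.append(bb)
        elif not executions or executions[-1] != bb:
            # Entered an IR block somewhere other than its first machine block.
            raise ValueError(f"unexpected entry into BB{bb} at {addr:#x}")
    return executions


# BB0 lowered to three machine blocks: executed once, not thrice.
once = [(0, 0x123), (0, 0x456), (0, 0x789), (1, 0x800), (5, 0x900)]
# BB0 genuinely run three times: each run re-enters its first machine block.
thrice = [(0, 0x123), (0, 0x123), (0, 0x123), (1, 0x800), (5, 0x900)]

print(ir_block_executions(once))    # [0, 1, 5]
print(ir_block_executions(thrice))  # [0, 0, 0, 1, 5]
```

This sketch inherits the open question above: if an instruction's lowering contained a loop whose header is the block's first machine block, each back-edge would be miscounted as a re-execution of the whole IR block.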
Any thoughts?
[Side note: Since @ptersilie moved the IR encoding stage later, I did not expect there to still be a one-to-many mapping from a "final" IR block to its machine blocks. I thought that code generation would canonicalise the control flow in a way that gives a 1-to-1 mapping. Clearly this is not the case. Therefore, in an ideal world, we'd serialise the machine basic blocks and use those as our IR, but this would be highly impractical, as I don't think LLVM offers an API to serialise/deserialise MBBs.]