Open Philogy opened 5 days ago
yea, i think this looks like a neat idea. we will need a heuristic for when to inline. maybe this can be done as part of SimplifyCFG
? right now we have the invariant that basic blocks do not simultaneously have multiple cfg_in
and multiple cfg_out
(NormalizationPass
). i assume we can get rid of the invariant actually, but it might need some rework in venom_to_assembly.py
.
paging @harkal, what do you think?
@Philogy do you want to take a stab at this?
fwiw, one of those jumps gets eliminated
and the corresponding assembly:
_sym_internal 0 _build_domain_separator()_runtime JUMPDEST SWAP1 PUSH32
0x8b73c3c69bb8fe3d512ecc4cf759cc79239f7b179b0ffacaa9a75d522b39400f
PUSH1 0x60 MSTORE PUSH1 0x20 _OFST _sym_code_end 192 PUSH1 0x80
CODECOPY PUSH1 0x20 _OFST _sym_code_end 288 PUSH1 0xa0 CODECOPY
CHAINID PUSH1 0xc0 MSTORE ADDRESS PUSH1 0xe0 MSTORE PUSH1 0xa0
PUSH1 0x40 MSTORE PUSH1 0x40 MLOAD PUSH1 0x60 SHA3 SWAP1 MSTORE
JUMP _sym_internal 1 _domain_separator_v4()_runtime JUMPDEST SWAP1
ADDRESS PUSH1 0x20 _OFST _sym_code_end 64 PUSH0 CODECOPY PUSH0 MLOAD
XOR _sym_7_else JUMPI CHAINID PUSH1 0x20 _OFST _sym_code_end 32
PUSH0 CODECOPY PUSH0 MLOAD XOR ISZERO _sym_8_if_exit JUMPDEST ISZERO
_sym_11_if_exit JUMPI PUSH1 0x20 SWAP1 _OFST _sym_code_end 0 SWAP1
CODECOPY _sym_internal 1 _domain_separator_v4()_cleanup JUMPDEST JUMP
_sym_11_if_exit JUMPDEST PUSH2 0x0100 _sym_label_ret_3 _sym_internal
0 _build_domain_separator()_runtime JUMP _sym_label_ret_3
JUMPDEST PUSH2 0x0100 MLOAD SWAP1 MSTORE _sym_internal 1
_domain_separator_v4()_cleanup JUMP _sym_7_else JUMPDEST PUSH0
_sym_8_if_exit JUMP
the tradeoff here looks like two JUMPIs terminating the basic blocks vs one basic block with no ending JUMP but one with two JUMPs:
Simple Summary
Add an optimization pass that "inlines" small basic blocks. Similar to function inlining it allows following passes to simplify things even further and remove redundancy.
For example the following code (from snekmate):
Generates:
`_domain_separator_v4` in Venom IR
``` IRFunction: internal 2 _domain_separator_v4()_runtime internal 2 _domain_separator_v4()_runtime: IN=[] OUT=[68_then, 69_else] => {} %1 = paramMotivation
The optimizer pass would:
jmp label %70_if_exit
in68_then
, resolving%14 = %12
jmp label %70_if_exit
in69_else
, resolving%14 = 0
Resuling in:
Theoretical Venom IR post BB inlining pass
```diff IRFunction: internal 2 _domain_separator_v4()_runtime internal 2 _domain_separator_v4()_runtime: IN=[] OUT=[68_then, 69_else] => {} %1 = paramThen it would be trivial for further optimization passes to:
70_if_exit
blockjnz label %73_if_exit, label %71_then, 0
tojmp label &71_then
in69_else
69_else
with71_then
in the function entry block (because it would be an empty BB only containing a single unconditional jump)This would lead to smaller code and remove a lot of redundant jumps and stack scheduling. Some heuristics need to be added to such a pass to avoid unsound optimization, specifically if the target basic block has any phi nodes who's lifetime extends outside of that basic block it cannot be trivially inlined. Furthermore inlining large target basic blocks could lead to code size regressions.
The
69_else
and70_if_exit
blocks are quite redundant and a common pattern arising from the short-circuiting ofor
andand
operations as well as other control-flow related to if-statements. Furthermore it would allow optimizing for loops with the rough common structure of:To:
By inlining the for-loop condition checking basic block at the end of the loop, turning the 2 branch operations per loop iteration to 1.
Copyright
Copyright and related rights waived via CC0