Open vezenovm opened 2 months ago
Generated at commit: f637f27cdf7bde3c5e23dbaea72299b285aefe39, compared to commit: 619c5451b152d62e01d3c4c1da7e13ff6502f915
Program | Brillig opcodes (+/-) | % |
---|---|---|
brillig_rc_regression_6123 | +19 | +11.24% |
sha256 | +63 | +3.39% |
sha256_var_witness_const_regression | +43 | +3.38% |
sha256_var_padding_regression | +168 | +3.34% |
keccak256 | +52 | +3.04% |
brillig_keccak | +52 | +3.04% |
6_array | -14 | -3.55% |
array_to_slice | -51 | -5.62% |
reference_only_used_as_alias | +7 | +2.81% |
Hmm this is surprising. I'm guessing that removing some of these loads is perhaps reducing the amount of trivial stores we can remove, but I'm not sure.
The main difference between the SSA on master and this PR looks to be the `inc_rc` instructions remaining in place. Before mem2reg we have this pattern:
v61 = load v52
inc_rc v61
inc_rc [u8 0, u8 0, u8 0, u8 0, u8 0]
inc_rc [u8 0, u8 0, u8 0, u8 0, u8 0]
v64 = load v51
inc_rc v64
v65 = load v52
inc_rc v65
v66 = load v52
inc_rc v66
v67 = load v52
v68 = load v53
v70 = lt v68, u32 4
constrain v70 == u1 1 '"push out of bounds"'
v72 = load v52
v73 = load v53
v74 = load v52
v75 = load v53
v76 = mul v75, u32 4
v77 = array_set v72, index v76, value Field 0
The repeat loads are removed in this PR, but those follow-up inc_rc instructions remain. I think this can be handled in a follow-up though so I am marking this PR ready for review again.
I tried this with the aliasing test in https://github.com/noir-lang/noir/issues/6120 and compared to master there's another regression. The output with this PR is:
acir(inline) fn main f0 {
b0(v3: u1):
jmpif v3 then: b1, else: b2
b1():
v11 = allocate
jmp b3(v3, v3)
b3(v6: &mut Field, v7: &mut Field):
v12 = load v6
store Field 2 at v7
constrain v12 == Field 0
constrain v12 == Field 2
return
b2():
v10 = allocate
jmp b3(v3, v3)
}
The results of both constrains are replaced with `v12` even though they should be different values due to the `store Field 2 at v7`, where `v7` aliases `v6`. Originally the `== Field 2` check was done on a load that came after the store, which should therefore also hold 2.
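To make the aliasing problem concrete, here is a minimal sketch that simulates block `b3` on a toy memory. It is not compiler code: `run_b3`, the `reuse_first_load` flag, and modelling references as slice indices are all assumptions made for illustration.

```rust
/// Toy model of block `b3`: references are modelled as indices into `memory`
/// (an assumption for illustration, not how the SSA represents them).
/// Returns the values seen by the two constrains.
fn run_b3(memory: &mut [u64], v6: usize, v7: usize, reuse_first_load: bool) -> (u64, u64) {
    // v12 = load v6
    let v12 = memory[v6];
    // store Field 2 at v7
    memory[v7] = 2;
    // The second load either re-reads memory (correct) or reuses `v12`,
    // which is what an unsound repeat-load removal would do.
    let second = if reuse_first_load { v12 } else { memory[v6] };
    (v12, second)
}
```

When `v6` and `v7` are the same index and the cell starts at 0, re-reading memory gives 0 for the first constrain and 2 for the second, while reusing `v12` gives 0 for both, so the `== Field 2` constrain can no longer hold.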
The original example from #6120 had a typo. After fixing the typo it works on master, but there is still a regression with this PR. Here's the final output:
acir(inline) fn main f0 {
b0(v3: u1):
jmpif v3 then: b1, else: b2
b1():
v11 = allocate
store Field 0 at v11
jmp b3(v11, v11)
b3(v6: &mut Field, v7: &mut Field):
v12 = load v6
store Field 2 at v7
constrain v12 == Field 0
constrain v12 == Field 2
return
b2():
v10 = allocate
store Field 1 at v10
jmp b3(v10, v10)
}
I've fixed the typo in the issue so that this can be reproduced.
> I tried this with the aliasing test in #6120 and compared to master there's another regression

I have switched `keep_repeat_loads_with_alias_store` to the test in the issue, but I added an extra parameter to `b3` to also check that we are accounting for parameters with more than just one other alias.
Program | Brillig opcodes (+/-) | % |
---|---|---|
brillig_rc_regression_6123 | +19 | +10.16% |
fold_2_to_17 | +42 | +6.89% |
bench_2_to_17 | +21 | +6.60% |
fold_numeric_generic_poseidon | +51 | +6.57% |
no_predicates_numeric_generic_poseidon | +51 | +6.57% |
poseidon2 | +21 | +6.54% |
Looks like we are getting various regressions now after the RC correctness fix. Going to table this PR for now and look at further optimizing RC instruction removals.
Description
Problem*
Resolves
Part of general effort to improve mem2reg.
Summary*
We sometimes have situations such as the following:
`v2` does not have a known value, thus we do not remove the load. The mem2reg pass is acting as expected here. However, without a store or call to the reference between `v11 = load v2` and `v12 = load v2`, we should be able to safely remove `v12 = load v2` and map `v12 -> v11`.

This PR adds this logic as part of the initial mem2reg pass. We have a new `last_loads` map as part of a `Block`. This is currently cleared after analyzing a block and is meant to only be per block. Unifying these last loads across blocks and the accurate predecessors can come in a follow-up. This is an initial proof of concept to show the optimization's validity.

Given an instruction we act as follows:
Load
Store
Call
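The per-block idea can be sketched roughly as below. This is not the code from `mem2reg.rs`: the `ValueId` alias, the toy `Instruction` enum, and `remove_repeat_loads` are hypothetical, and the Store and Call handling is an assumption. With no alias information, the sketch conservatively clears every remembered load, which may be stricter than what the PR does.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified value id; the real SSA has its own id types.
type ValueId = u32;

/// Toy instruction set covering only the three cases listed above.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Instruction {
    Load { result: ValueId, address: ValueId },
    Store { address: ValueId, value: ValueId },
    Call { arguments: Vec<ValueId> },
}

/// Per-block state: the most recent load result seen for each address.
#[derive(Default)]
struct Block {
    last_loads: HashMap<ValueId, ValueId>,
}

/// Processes a single block. Returns the instructions that are kept and a
/// map from each removed load result to the earlier result that replaces it.
fn remove_repeat_loads(
    instructions: &[Instruction],
) -> (Vec<Instruction>, HashMap<ValueId, ValueId>) {
    // A fresh `Block` per call mirrors `last_loads` being per block only.
    let mut block = Block::default();
    let mut kept = Vec::new();
    let mut replacements = HashMap::new();

    for instruction in instructions {
        match instruction {
            Instruction::Load { result, address } => {
                if let Some(previous) = block.last_loads.get(address) {
                    // Repeat load with nothing invalidating it in between:
                    // drop it and map its result to the earlier load.
                    replacements.insert(*result, *previous);
                    continue;
                }
                block.last_loads.insert(*address, *result);
            }
            // Assumption: without alias tracking, any store or call may
            // change any reference, so forget all remembered loads.
            Instruction::Store { .. } | Instruction::Call { .. } => {
                block.last_loads.clear();
            }
        }
        kept.push(instruction.clone());
    }

    (kept, replacements)
}
```

For the `v11`/`v12` situation in the summary, two consecutive loads of `v2` would keep the first load and produce the mapping `v12 -> v11`, while a store or call between them would keep both loads.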
I have also added two unit tests to `mem2reg.rs`.