Open jfecher opened 7 months ago
I've confirmed that SSA increases linearly by 14 extra instructions each time. The extra instructions are (in order):
2x add
4x mul
4x add then mul
(The last iteration of the loop only has 2x add then mul since the last two modify array indices 1 & 2 which are unused in main and are thus eliminated by DIE)
So this looks like an Acir-gen issue.
I've confirmed this is unrelated to array copying in Acir. This makes sense since no arrays are modified.
For reference, the SSA for a run with 3 iterations looks like this:
acir fn main f0 {
b0(v0: Field, v1: Field):
// permute iteration 1
v770 = add v0, v1
v775 = add v0, Field -5512639236676924786014155380588025764096985869754559557744523208552434701087
v776 = mul v775, v775
v777 = mul v776, v776
v778 = mul v775, v777
v779 = mul v778, Field 7511745149465107256748700652201246547602992235352608707588321460060273774987
v780 = add v778, v779
v781 = mul v1, Field 1781874611967874592137274483616240894881315449294815307306613366069350853425
v782 = add v780, v781
v783 = mul v770, Field 9676220459425127104563807626505378474104527268335041816433595157913150665495
v784 = add v782, v783
v785 = mul v784, Field 9676220459425127104563807626505378474104527268335041816433595157913150665495
v786 = add v1, v785
v787 = mul v784, Field 8364259238812534287689210722577399963878179320345509803468849104367466297989
// iteration 2
v788 = add v770, v787
v789 = add v784, Field -2⁴×387493572659975021470226716906358013055803406560817195243522295635460328127
v790 = mul v789, v789
v791 = mul v790, v790
v792 = mul v789, v791
v793 = mul v792, Field 7511745149465107256748700652201246547602992235352608707588321460060273774987
v794 = add v792, v793
v795 = mul v786, Field -6684379154708237978759272567577041337887670303253204317176013706256788968730
v796 = add v794, v795
v797 = mul v788, Field 1645017323598148583308153743253948043010266295265950623794066679542803673813
v798 = add v796, v797
v799 = mul v798, Field 1645017323598148583308153743253948043010266295265950623794066679542803673813
v800 = add v786, v799
v801 = mul v798, Field -6902316737387657021175622823110739310551009794185512225013048131011375572021
// iteration 3
v802 = add v788, v801
v803 = add v798, Field 2⁴×593199705942557006050540024999671509991229882396447677974095356964325900814
v804 = mul v803, v803
v805 = mul v804, v804
v806 = mul v803, v805
v807 = mul v806, Field 7511745149465107256748700652201246547602992235352608707588321460060273774987
v808 = add v806, v807
v809 = mul v800, Field -3778477114939312735135329793763823326274743295264527892924859844466140293618
v810 = add v808, v809
v811 = mul v802, Field 8034324828084400593020431506480243533881627849088152439427470035355284392177
v812 = add v810, v811
// main
v817 = eq v812, v1
constrain v817 == u1 0
return
}
~The issue results from us not reusing intermediate variables optimally during circuit transformation:~
Doesn't seem like this is the case. Keeping this here for reference however.
// Untransformed
Compiled ACIR for main (unoptimized):
func 0
current witness index : 16
public parameters indices : [1]
return value indices : []
EXPR [ (1, _0) (-1, _2) -5512639236676924786014155380588025764096985869754559557744523208552434701087 ]
EXPR [ (1, _2, _2) (-1, _3) 0 ]
EXPR [ (1, _3, _3) (-1, _4) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _2, _4) (9676220459425127104563807626505378474104527268335041816433595157913150665495, _0) (11458095071393001696701082110121619368985842717629857123740208523982501518920, _1) (-1, _5) -2⁴×387493572659975021470226716906358013055803406560817195243522295635460328127 ]
EXPR [ (1, _5, _5) (-1, _6) 0 ]
EXPR [ (1, _6, _6) (-1, _7) 0 ]
EXPR [ (10302764019366810656324304069659347464634037313761163346014007424803634884116, _2, _4) (7511745149465107256748700652201246547602992235352608707588321460060273774988, _5, _7) (6604154496063676802804756384484347298797004655271010100320031032419788549325, _0) (4105605326965746781005342339392760270064105286305864323348731164214007389462, _1) (-1, _8) 2⁴×593199705942557006050540024999671509991229882396447677974095356964325900814 ]
EXPR [ (1, _8, _8) (-1, _9) 0 ]
EXPR [ (1, _9, _9) (-1, _10) 0 ]
EXPR [ (2056738702329958338528069809778020848023814166851283194299370842509476243522, _2, _4) (692246069124406706574478606975526033386351398548558144542515845293232943739, _5, _7) (7511745149465107256748700652201246547602992235352608707588321460060273774988, _8, _10) (-4945605217586328132698244186628229593098542664068970726496460897751429780929, _0) (6377772023842086820811309884692226915529324425293982843669511452955944766649, _1) (-1, _11) 9797453712978351739894993124526343599910864939600507506817907398049628087845 ]
EXPR [ (1, _11, _11) (-1, _12) 0 ]
EXPR [ (1, _12, _12) (-1, _13) 0 ]
EXPR [ (8244951158953924849441820526537414617456405719694233716653789536929031657217, _2, _4) (1245000739171333043151662300492364762247881201867951700385117535552263642718, _5, _7) (-8866078365483905008300628909657660427368986553774165888273494658969433806135, _8, _10) (-7511745149465107256748700652201246547602992235352608707588321460060273774988, _11, _13) (7538040645284361082572458070655519332106145662083850722888221131624634527860, _0) (4089505136743306378354241962475825377338462365227913938936236421229037646862, _1) (-1, _14) 0 ]
BRILLIG: inputs: [Single(Expression { mul_terms: [], linear_combinations: [(1, Witness(14))], q_c: 0 })]
outputs: [Simple(Witness(15))]
[CalldataCopy { destination_address: MemoryAddress(0), size: 1, offset: 0 }, JumpIfNot { condition: MemoryAddress(0), location: 3 }, Const { destination: MemoryAddress(1), bit_size: 254, value: Value { inner: 1 } }, BinaryFieldOp { destination: MemoryAddress(0), op: Div, lhs: MemoryAddress(1), rhs: MemoryAddress(0) }, Stop { return_data_offset: 0, return_data_size: 1 }]
EXPR [ (1, _14, _15) (1, _16) -1 ]
EXPR [ (1, _14, _16) 0 ]
EXPR [ (1, _16) 0 ]
// transformed
current witness index : 44
public parameters indices : [1]
return value indices : []
EXPR [ (1, _0) (-1, _2) -5512639236676924786014155380588025764096985869754559557744523208552434701087 ]
EXPR [ (1, _2, _2) (-1, _3) 0 ]
EXPR [ (1, _3, _3) (-1, _4) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _2, _4) (-1, _17) 0 ]
EXPR [ (9676220459425127104563807626505378474104527268335041816433595157913150665495, _0) (11458095071393001696701082110121619368985842717629857123740208523982501518920, _1) (-1, _18) 0 ]
EXPR [ (-1, _5) (1, _17) (1, _18) -2⁴×387493572659975021470226716906358013055803406560817195243522295635460328127 ]
EXPR [ (1, _5, _5) (-1, _6) 0 ]
EXPR [ (1, _6, _6) (-1, _7) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _5, _7) (-1, _21) 0 ]
EXPR [ (6604154496063676802804756384484347298797004655271010100320031032419788549325, _0) (4105605326965746781005342339392760270064105286305864323348731164214007389462, _1) (-1, _22) 0 ]
EXPR [ (11478095976109321424005083581332697351833154779671800717176968483932168473324, _17) (1, _21) (-1, _23) 0 ]
EXPR [ (-1, _8) (1, _22) (1, _23) 2⁴×593199705942557006050540024999671509991229882396447677974095356964325900814 ]
EXPR [ (1, _8, _8) (-1, _9) 0 ]
EXPR [ (1, _9, _9) (-1, _10) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _8, _10) (-1, _27) 0 ]
EXPR [ (-4945605217586328132698244186628229593098542664068970726496460897751429780929, _0) (6377772023842086820811309884692226915529324425293982843669511452955944766649, _1) (-1, _28) 0 ]
EXPR [ (-8171050935427428633424824527321522461511072821809794285567532217425125208927, _17) (-5514733471089785794374270767163942360219071296880559339102618199968142537142, _21) (-1, _29) 0 ]
EXPR [ (1, _27) (1, _28) (-1, _30) 0 ]
EXPR [ (-1, _11) (1, _29) (1, _30) 9797453712978351739894993124526343599910864939600507506817907398049628087845 ]
EXPR [ (1, _11, _11) (-1, _12) 0 ]
EXPR [ (1, _12, _12) (-1, _13) 0 ]
EXPR [ (-7511745149465107256748700652201246547602992235352608707588321460060273774988, _11, _13) (-1, _35) 0 ]
EXPR [ (7538040645284361082572458070655519332106145662083850722888221131624634527860, _0) (4089505136743306378354241962475825377338462365227913938936236421229037646862, _1) (-1, _36) 0 ]
EXPR [ (-6379787701423272764930233206555704103213006823652545026586546074778741051842, _17) (-6528941988757818671164751416148170116685212390218786282564565476950079182897, _21) (-1, _37) 0 ]
EXPR [ (-4913523097929112444230457750999582340681777120025624541356118512330689264044, _27) (1, _35) (-1, _38) 0 ]
EXPR [ (1, _36) (1, _37) (-1, _39) 0 ]
EXPR [ (-1, _14) (1, _38) (1, _39) 0 ]
BRILLIG: inputs: [Single(Expression { mul_terms: [], linear_combinations: [(1, Witness(14))], q_c: 0 })]
outputs: [Simple(Witness(15))]
[CalldataCopy { destination_address: MemoryAddress(0), size: 1, offset: 0 }, JumpIfNot { condition: MemoryAddress(0), location: 3 }, Const { destination: MemoryAddress(1), bit_size: 254, value: Value { inner: 1 } }, BinaryFieldOp { destination: MemoryAddress(0), op: Div, lhs: MemoryAddress(1), rhs: MemoryAddress(0) }, Stop { return_data_offset: 0, return_data_size: 1 }]
EXPR [ (1, _14, _15) (1, _16) -1 ]
EXPR [ (1, _14, _16) 0 ]
EXPR [ (1, _16) 0 ]
+---------------+----------------------+--------------+----------------------+
| Package | Expression Width | ACIR Opcodes | Backend Circuit Size |
+---------------+----------------------+--------------+----------------------+
| test_poseidon | Bounded { width: 3 } | 31 | 36 |
+---------------+----------------------+--------------+----------------------+
In the happy case of the initial state being comprised on witnesses only then the number of opcodes required for each iteration should be something like this:
let t = s0+const; // 1 opcode
let tttt = t^4; // 2 opcodes (one for each square)
let a = s0 * tttt // 1 opcode
let new_s0 = a + s1*const + s2*const // 1 opcode
let new_s1 = s1 + new_s0 * const // 1 opcode
let new_s2 = s1 + new_s0 * const // 1 opcode
We should have a constant 7 opcodes per iteration
My gut feeling is that the issue is due how we're laying down constraints lazily we're accumulating more and more quadratic terms inside s1
and s2
and we're not recognising that we have a common value of new_s0
being added into it which should be simplified. We're then doing a lot of recalculation because of this.
This can be seen as if we use the loop below (note the large number of iterations plus the range opcodes being used to enforce that the state is always represented as witnesses) I get a circuit size of 556. Commenting out these range opcodes causes the circuit to grow to 1480.
for _r in 0..50 {
s0 += C[((rf as u64 / 2)+1)*state.len() + _r as u64];
let t = s0;
let tt = t * t;
let tttt = tt * tt;
s0 *= tttt;
s0 += s0 * S[(5)*_r ];
s0 += s1 * S[(5)*_r +1];
s0 += s2 * S[(5)*_r +2];
s1 += s0 * S[5 * _r +2];
s2 += s0 * S[5*_r + 3];
s1.assert_max_bit_size(253);
s2.assert_max_bit_size(253);
}
I also get a constant difference of 11 opcodes if I change the number of loop iterations in this case. Removing 2 for the range opcodes, this doesn't quite line up with the expected 7 but :shrug:
@TomAFrench Going from 1 to 2 iterations in the original code I see a change of 6 opcodes, which lines up with Zac's original estimate.
I also just finished testing acir-gen/Ssa::into_acir
and found that that does increase by a constant 3 additional opcodes each iteration when going from 1 iteration to 2 to 3. So acir-gen seems fine here.
Sounds like we're missing a way to factor out common instances of new_s0
then? Or maybe we just can't lay down constraints lazily anymore.
Sounds like we're missing a way to factor out common instances of new_s0 then?
It's kinda yucky as it exposes knowledge about circuits to the end user, but yeah it does seem that a force_witness(expr)
would be useful for some of these algorithms.
Or maybe we just can't lay down constraints lazily anymore.
I don't think so, we're always going to need some laziness as emitting an acir opcode per SSA instruction will mean that we'll never fill up a plonk gate. I'd need to think on if we can identify variables such as s1
and s2
automatically.
I've confirmed the extra constraints are being pushed here: https://github.com/noir-lang/noir/blob/master/acvm-repo/acvm/src/compiler/transformers/mod.rs#L85-L96, 3 for the first iter, 4 for the second, then 5, etc.
@vezenovm do you want to take a look at making ACIR gen aware of the expression width?
@vezenovm do you want to take a look at making ACIR gen aware of the expression width?
Yeah happy to take this on.
Reopening this since the flag introduced by #5493 is not enabled by default
Aim
Testing the code in https://github.com/zac-williamson/noir-poseidon1-array-bug.
Expected Behavior
Expected the acir opcode count and backend circuit size to increase by a constant
6
for each additional iteration of the loop inposeidon::bn254::permute
.Bug
Actual change in acir and backend circuit size for iteration total
N
is:To Reproduce
1. 2. 3. 4.
Project Impact
None
Impact Context
No response
Workaround
None
Workaround Description
No response
Additional Context
No response
Installation Method
None
Nargo Version
No response
NoirJS Version
No response
Would you like to submit a PR for this Issue?
None
Support Needs
No response