Acir size increases exponentially as loop iterations increase, instead of linearly

jfecher commented 7 months ago

Aim

Testing the code in https://github.com/zac-williamson/noir-poseidon1-array-bug.

Expected Behavior

Expected the acir opcode count and backend circuit size to increase by a constant 6 for each additional iteration of the loop in poseidon::bn254::permute.

Bug

Actual change in acir and backend circuit size for iteration total N is:

// N | Acir size | Acir change | Backend Circuit Size | circuit change
// 1 | 10        |             | 15                   |
// 2 | 16        | 6           | 21                   | 6
// 3 | 23        | 7           | 28                   | 7
// 4 | 31        | 8           | 36                   | 8
// 5 | 40        | 9           | 45                   | 9
// 6 | 50        | 10          | 55                   | 10
// 7 | 61        | 11          | 66                   | 11
// 8 | 73        | 12          | 78                   | 12
// 9 | 86        | 13          | 91                   | 13
// 10| 100       | 14          | 105                  | 14
// 11| 115       | 15          | 120                  | 15
// 12| 131       | 16          | 136                  | 16
// 13| 148       | 17          | 153                  | 17

To Reproduce

1. 2. 3. 4.

Project Impact

None

Impact Context

No response

Workaround

None

Workaround Description

No response

Additional Context

No response

Installation Method

None

Nargo Version

No response

NoirJS Version

No response

Would you like to submit a PR for this Issue?

None

Support Needs

No response

jfecher commented 7 months ago

I've confirmed that SSA increases linearly by 14 extra instructions each time. The extra instructions are (in order):

2x add
4x mul
4x add then mul

(The last iteration of the loop only has 2x add then mul since the last two modify array indices 1 & 2 which are unused in main and are thus eliminated by DIE)

So this looks like an Acir-gen issue.

jfecher commented 7 months ago

I've confirmed this is unrelated to array copying in Acir. This makes sense since no arrays are modified.

jfecher commented 7 months ago

For reference, the SSA for a run with 3 iterations looks like this:

acir fn main f0 {
  b0(v0: Field, v1: Field):
    // permute iteration 1
    v770 = add v0, v1
    v775 = add v0, Field -5512639236676924786014155380588025764096985869754559557744523208552434701087
    v776 = mul v775, v775
    v777 = mul v776, v776
    v778 = mul v775, v777
    v779 = mul v778, Field 7511745149465107256748700652201246547602992235352608707588321460060273774987
    v780 = add v778, v779
    v781 = mul v1, Field 1781874611967874592137274483616240894881315449294815307306613366069350853425
    v782 = add v780, v781
    v783 = mul v770, Field 9676220459425127104563807626505378474104527268335041816433595157913150665495
    v784 = add v782, v783
    v785 = mul v784, Field 9676220459425127104563807626505378474104527268335041816433595157913150665495
    v786 = add v1, v785
    v787 = mul v784, Field 8364259238812534287689210722577399963878179320345509803468849104367466297989

    // iteration 2
    v788 = add v770, v787
    v789 = add v784, Field -2⁴×387493572659975021470226716906358013055803406560817195243522295635460328127
    v790 = mul v789, v789
    v791 = mul v790, v790
    v792 = mul v789, v791
    v793 = mul v792, Field 7511745149465107256748700652201246547602992235352608707588321460060273774987
    v794 = add v792, v793
    v795 = mul v786, Field -6684379154708237978759272567577041337887670303253204317176013706256788968730
    v796 = add v794, v795
    v797 = mul v788, Field 1645017323598148583308153743253948043010266295265950623794066679542803673813
    v798 = add v796, v797
    v799 = mul v798, Field 1645017323598148583308153743253948043010266295265950623794066679542803673813
    v800 = add v786, v799
    v801 = mul v798, Field -6902316737387657021175622823110739310551009794185512225013048131011375572021

    // iteration 3
    v802 = add v788, v801
    v803 = add v798, Field 2⁴×593199705942557006050540024999671509991229882396447677974095356964325900814
    v804 = mul v803, v803
    v805 = mul v804, v804
    v806 = mul v803, v805
    v807 = mul v806, Field 7511745149465107256748700652201246547602992235352608707588321460060273774987
    v808 = add v806, v807
    v809 = mul v800, Field -3778477114939312735135329793763823326274743295264527892924859844466140293618
    v810 = add v808, v809
    v811 = mul v802, Field 8034324828084400593020431506480243533881627849088152439427470035355284392177
    v812 = add v810, v811

    // main
    v817 = eq v812, v1
    constrain v817 == u1 0
    return 
}

TomAFrench commented 7 months ago

~The issue results from us not reusing intermediate variables optimally during circuit transformation:~

Doesn't seem like this is the case. Keeping this here for reference however.

// Untransformed
Compiled ACIR for main (unoptimized):
func 0
current witness index : 16
public parameters indices : [1]
return value indices : []
EXPR [ (1, _0) (-1, _2) -5512639236676924786014155380588025764096985869754559557744523208552434701087 ]
EXPR [ (1, _2, _2) (-1, _3) 0 ]
EXPR [ (1, _3, _3) (-1, _4) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _2, _4) (9676220459425127104563807626505378474104527268335041816433595157913150665495, _0) (11458095071393001696701082110121619368985842717629857123740208523982501518920, _1) (-1, _5) -2⁴×387493572659975021470226716906358013055803406560817195243522295635460328127 ]
EXPR [ (1, _5, _5) (-1, _6) 0 ]
EXPR [ (1, _6, _6) (-1, _7) 0 ]
EXPR [ (10302764019366810656324304069659347464634037313761163346014007424803634884116, _2, _4) (7511745149465107256748700652201246547602992235352608707588321460060273774988, _5, _7) (6604154496063676802804756384484347298797004655271010100320031032419788549325, _0) (4105605326965746781005342339392760270064105286305864323348731164214007389462, _1) (-1, _8) 2⁴×593199705942557006050540024999671509991229882396447677974095356964325900814 ]
EXPR [ (1, _8, _8) (-1, _9) 0 ]
EXPR [ (1, _9, _9) (-1, _10) 0 ]
EXPR [ (2056738702329958338528069809778020848023814166851283194299370842509476243522, _2, _4) (692246069124406706574478606975526033386351398548558144542515845293232943739, _5, _7) (7511745149465107256748700652201246547602992235352608707588321460060273774988, _8, _10) (-4945605217586328132698244186628229593098542664068970726496460897751429780929, _0) (6377772023842086820811309884692226915529324425293982843669511452955944766649, _1) (-1, _11) 9797453712978351739894993124526343599910864939600507506817907398049628087845 ]
EXPR [ (1, _11, _11) (-1, _12) 0 ]
EXPR [ (1, _12, _12) (-1, _13) 0 ]
EXPR [ (8244951158953924849441820526537414617456405719694233716653789536929031657217, _2, _4) (1245000739171333043151662300492364762247881201867951700385117535552263642718, _5, _7) (-8866078365483905008300628909657660427368986553774165888273494658969433806135, _8, _10) (-7511745149465107256748700652201246547602992235352608707588321460060273774988, _11, _13) (7538040645284361082572458070655519332106145662083850722888221131624634527860, _0) (4089505136743306378354241962475825377338462365227913938936236421229037646862, _1) (-1, _14) 0 ]
BRILLIG: inputs: [Single(Expression { mul_terms: [], linear_combinations: [(1, Witness(14))], q_c: 0 })]
outputs: [Simple(Witness(15))]
[CalldataCopy { destination_address: MemoryAddress(0), size: 1, offset: 0 }, JumpIfNot { condition: MemoryAddress(0), location: 3 }, Const { destination: MemoryAddress(1), bit_size: 254, value: Value { inner: 1 } }, BinaryFieldOp { destination: MemoryAddress(0), op: Div, lhs: MemoryAddress(1), rhs: MemoryAddress(0) }, Stop { return_data_offset: 0, return_data_size: 1 }]

EXPR [ (1, _14, _15) (1, _16) -1 ]
EXPR [ (1, _14, _16) 0 ]
EXPR [ (1, _16) 0 ]

// transformed

current witness index : 44
public parameters indices : [1]
return value indices : []
EXPR [ (1, _0) (-1, _2) -5512639236676924786014155380588025764096985869754559557744523208552434701087 ]
EXPR [ (1, _2, _2) (-1, _3) 0 ]
EXPR [ (1, _3, _3) (-1, _4) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _2, _4) (-1, _17) 0 ]
EXPR [ (9676220459425127104563807626505378474104527268335041816433595157913150665495, _0) (11458095071393001696701082110121619368985842717629857123740208523982501518920, _1) (-1, _18) 0 ]
EXPR [ (-1, _5) (1, _17) (1, _18) -2⁴×387493572659975021470226716906358013055803406560817195243522295635460328127 ]
EXPR [ (1, _5, _5) (-1, _6) 0 ]
EXPR [ (1, _6, _6) (-1, _7) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _5, _7) (-1, _21) 0 ]
EXPR [ (6604154496063676802804756384484347298797004655271010100320031032419788549325, _0) (4105605326965746781005342339392760270064105286305864323348731164214007389462, _1) (-1, _22) 0 ]
EXPR [ (11478095976109321424005083581332697351833154779671800717176968483932168473324, _17) (1, _21) (-1, _23) 0 ]
EXPR [ (-1, _8) (1, _22) (1, _23) 2⁴×593199705942557006050540024999671509991229882396447677974095356964325900814 ]
EXPR [ (1, _8, _8) (-1, _9) 0 ]
EXPR [ (1, _9, _9) (-1, _10) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _8, _10) (-1, _27) 0 ]
EXPR [ (-4945605217586328132698244186628229593098542664068970726496460897751429780929, _0) (6377772023842086820811309884692226915529324425293982843669511452955944766649, _1) (-1, _28) 0 ]
EXPR [ (-8171050935427428633424824527321522461511072821809794285567532217425125208927, _17) (-5514733471089785794374270767163942360219071296880559339102618199968142537142, _21) (-1, _29) 0 ]
EXPR [ (1, _27) (1, _28) (-1, _30) 0 ]
EXPR [ (-1, _11) (1, _29) (1, _30) 9797453712978351739894993124526343599910864939600507506817907398049628087845 ]
EXPR [ (1, _11, _11) (-1, _12) 0 ]
EXPR [ (1, _12, _12) (-1, _13) 0 ]
EXPR [ (-7511745149465107256748700652201246547602992235352608707588321460060273774988, _11, _13) (-1, _35) 0 ]
EXPR [ (7538040645284361082572458070655519332106145662083850722888221131624634527860, _0) (4089505136743306378354241962475825377338462365227913938936236421229037646862, _1) (-1, _36) 0 ]
EXPR [ (-6379787701423272764930233206555704103213006823652545026586546074778741051842, _17) (-6528941988757818671164751416148170116685212390218786282564565476950079182897, _21) (-1, _37) 0 ]
EXPR [ (-4913523097929112444230457750999582340681777120025624541356118512330689264044, _27) (1, _35) (-1, _38) 0 ]
EXPR [ (1, _36) (1, _37) (-1, _39) 0 ]
EXPR [ (-1, _14) (1, _38) (1, _39) 0 ]
BRILLIG: inputs: [Single(Expression { mul_terms: [], linear_combinations: [(1, Witness(14))], q_c: 0 })]
outputs: [Simple(Witness(15))]
[CalldataCopy { destination_address: MemoryAddress(0), size: 1, offset: 0 }, JumpIfNot { condition: MemoryAddress(0), location: 3 }, Const { destination: MemoryAddress(1), bit_size: 254, value: Value { inner: 1 } }, BinaryFieldOp { destination: MemoryAddress(0), op: Div, lhs: MemoryAddress(1), rhs: MemoryAddress(0) }, Stop { return_data_offset: 0, return_data_size: 1 }]

EXPR [ (1, _14, _15) (1, _16) -1 ]
EXPR [ (1, _14, _16) 0 ]
EXPR [ (1, _16) 0 ]

+---------------+----------------------+--------------+----------------------+
| Package       | Expression Width     | ACIR Opcodes | Backend Circuit Size |
+---------------+----------------------+--------------+----------------------+
| test_poseidon | Bounded { width: 3 } | 31           | 36                   |
+---------------+----------------------+--------------+----------------------+

TomAFrench commented 7 months ago

In the happy case of the initial state being comprised on witnesses only then the number of opcodes required for each iteration should be something like this:

let t = s0+const; // 1 opcode
let tttt = t^4; // 2 opcodes (one for each square)
let a = s0 * tttt // 1 opcode
let new_s0 = a + s1*const + s2*const // 1 opcode
let new_s1 = s1 + new_s0 * const // 1 opcode
let new_s2 = s1 + new_s0 * const // 1 opcode

We should have a constant 7 opcodes per iteration

My gut feeling is that the issue is due how we're laying down constraints lazily we're accumulating more and more quadratic terms inside s1 and s2 and we're not recognising that we have a common value of new_s0 being added into it which should be simplified. We're then doing a lot of recalculation because of this.

This can be seen as if we use the loop below (note the large number of iterations plus the range opcodes being used to enforce that the state is always represented as witnesses) I get a circuit size of 556. Commenting out these range opcodes causes the circuit to grow to 1480.

    for _r in 0..50 {
        s0 += C[((rf as u64 / 2)+1)*state.len() + _r as u64];
        let t = s0;
        let tt = t * t;
        let tttt = tt * tt;
        s0 *= tttt;
        s0 += s0 * S[(5)*_r ];
        s0 += s1 * S[(5)*_r +1];
        s0 += s2 * S[(5)*_r +2];
        s1 += s0 * S[5 * _r +2];
        s2 += s0 * S[5*_r  + 3];
        s1.assert_max_bit_size(253);
        s2.assert_max_bit_size(253);
    }

I also get a constant difference of 11 opcodes if I change the number of loop iterations in this case. Removing 2 for the range opcodes, this doesn't quite line up with the expected 7 but :shrug:

jfecher commented 7 months ago

@TomAFrench Going from 1 to 2 iterations in the original code I see a change of 6 opcodes, which lines up with Zac's original estimate.

I also just finished testing acir-gen/Ssa::into_acir and found that that does increase by a constant 3 additional opcodes each iteration when going from 1 iteration to 2 to 3. So acir-gen seems fine here.

Sounds like we're missing a way to factor out common instances of new_s0 then? Or maybe we just can't lay down constraints lazily anymore.

TomAFrench commented 7 months ago

Sounds like we're missing a way to factor out common instances of new_s0 then?

It's kinda yucky as it exposes knowledge about circuits to the end user, but yeah it does seem that a force_witness(expr) would be useful for some of these algorithms.

Or maybe we just can't lay down constraints lazily anymore.

I don't think so, we're always going to need some laziness as emitting an acir opcode per SSA instruction will mean that we'll never fill up a plonk gate. I'd need to think on if we can identify variables such as s1 and s2 automatically.

jfecher commented 7 months ago

I've confirmed the extra constraints are being pushed here: https://github.com/noir-lang/noir/blob/master/acvm-repo/acvm/src/compiler/transformers/mod.rs#L85-L96, 3 for the first iter, 4 for the second, then 5, etc.

TomAFrench commented 5 months ago

@vezenovm do you want to take a look at making ACIR gen aware of the expression width?

vezenovm commented 5 months ago

@vezenovm do you want to take a look at making ACIR gen aware of the expression width?

Yeah happy to take this on.

jfecher commented 3 months ago

Reopening this since the flag introduced by #5493 is not enabled by default

noir-lang / noir