taikoxyz / zkevm-circuits

DEPRECATED in favor of https://github.com/taikoxyz/raiko! Taiko's fork of the PSE's ZK-EVM
Other
160 stars 125 forks source link

blake2f precompile #51

Open Brechtpd opened 1 year ago

Brechtpd commented 1 year ago

Checklist:

Description:

Implementation probably similar to sha256, so a good idea to first look at that implementation and use it as a base: https://github.com/privacy-scaling-explorations/zkevm-circuits/pull/756 Presentation: https://docs.google.com/presentation/d/1eHVzLPRIEziCJ_oDuJG0yZ-5klNj_daqioC0UaGDaiw/edit#slide=id.p

Brechtpd commented 1 year ago

Some thoughts on a possible approach.

Data layout

State

For the layout of the state relative offsets can be used to store the 4 state inputs used in G. There's 2 ways the data is stored, we just have to alternate between the two for each round. See below what relative offsets are necessary. There's also the choice to store the necessary data in a single row or a single column, though the row approach seems to make more sense to me for now.

#### The data stored in a row:

0, 4,  8, 12
1, 5,  9, 13
2, 6, 10, 14
3, 7, 11, 15

     ^
     |

0, 5, 10, 15
1, 6, 11, 12
2, 7,  8, 13
3, 4,  9, 14

---------------

-4, -3, -2, -1
-4, -3, -2, -5
-4, -3, -6, -5
-4, -8, -6, -5

===============

0, 5, 10, 15
1, 6, 11, 12
2, 7,  8, 13
3, 4,  9, 14

     ^
     |

0, 4,  8, 12
1, 5,  9, 13
2, 6, 10, 14
3, 7, 11, 15

-------------

-4, -1, -2, -3
-4, -5, -2, -3
-4, -5, -6, -3
-4, -5, -6, -7

#### The data stored in a column:

 0,  1,  2,  3
 4,  5,  6,  7
 8,  9, 10, 11
12, 13, 14, 15

     ^
     |

 0,  1,  2,  3
 5,  6,  7,  4
10, 11,  8,  9
15, 12, 13, 14

---------------

( 0,-4), ( 0,-4), ( 0,-4), ( 0,-4)
( 1,-4), ( 1,-4), ( 1,-4), (1,-4)
( 2,-4), ( 2,-4), (2,-4), (2,-4)
( 3,-4), ( 3,-4), (3,-4), (3,-4)

=============================

 0,  1,  2,  3
 5,  6,  7,  4
10, 11,  8,  9
15, 12, 13, 14

     ^
     |

 0,  1,  2,  3
 4,  5,  6,  7
 8,  9, 10, 11
12, 13, 14, 15

---------------

( 0,-4), ( 0,-4), ( 0,-4), ( 0,-4)
( 3,-4), ( 3,-4), ( 3,-4), ( 3,-4)
( 2,-4), ( 2,-4), ( 2,-4), ( 2,-4)
( 1,-4), ( 1,-4), ( 1,-4), ( 1,-4)

Which rotation to use is decided by a selector in a fixed column (because it's always the same for every round). Alternatively we can also use the lookup approach explained below for m to load in the correct state word on each row, but I think that may be bit less efficient (though could be more flexible, but that's not that important if the number of columns is reasonable).

Message

Two words are required per G and are stored on the row. Which word to use is decided by Sigma and differs per G and per round. The amount of rounds is not fixed.

In the first round we just load in the data, 2 words per row. In the other rounds we have to make sure that this permutated data is the same as the data in the first row. If the number of round was fixed this would be easy, we could just use copy constraints to hardcode the Sigma matrix into the circuit. But the amount of rounds is an input so this approach won't work. Instead we can make use of lookups. In the first round the message values are loaded into a lookup table as (hash_idx, m_idx, value) (note that it's easier to just generate this lookup table on the fly, so this data is not stored in a separate set of columns).. This will allow us to lookup the correct m value in all other rounds. We also need to lookup the correct index into m for the current round we're in. This we can do by using a fixed table column where we load in SIGMA for all possible rounds. Then to get the correct m_idx for a specific round for a specific row we first lookup (round_idx, row_idx, m_idx_a, m_idx_b) in this SIGMA table.

So we have to keep track of some identifiers while hashing:

G

v[a] := (v[a] + v[b] + x) mod 2**w
v[d] := (v[d] ^ v[a]) >>> R1

v[c] := (v[c] + v[d])     mod 2**w
v[b] := (v[b] ^ v[c]) >>> R2

v[a] := (v[a] + v[b] + y) mod 2**w
v[d] := (v[d] ^ v[a]) >>> R3

v[c] := (v[c] + v[d])     mod 2**w
v[b] := (v[b] ^ v[c]) >>> R4

With these calculations I actually think it makes more sense to use the lookup approach and not a bit approach. There's also this extra intermediate result that's used after the mod operations that makes it not ideal. There's also not a lot of reuse that can happen by using bits because the values cannot be reused a lot more when we use bits. The rotation is also mostly byte based + only a single rotation needed so the splitting into parts can easily be done efficiently. So I would just split up the 64-bit words in parts that make the most sense and minimizes the amount of lookups and columns needed to store those parts.

Layout

Main layout would look something like this I think:

| hash_idx | round_idx | row_idx | v_a | v_b | v_c | v_d | x | y | misc helper columns....

Gas considerations

A round only costs 1 gas, which is pretty insane. The circuit may need to be flexible enough to change the layout so that only a single round/row is used. I don't think this will need that many columns, but this is the absolute worst case scenario so shouldn't be the only layout supported.

Notes

v[a] := (v[a] + v[b] + x) mod 2**w

v[a]  64 bits:
chunks of 8 bits -> 8 parts + 1overflow part 

~ 20 chunks

(hash_idx, m_idx, value)

if (round == 0) {
meta.lookup_any(name, |meta| {
    [
         (0.expr(), hash_idx.expr()),
         (0.expr(), m_idx.expr()),
         (0.expr(), value.expr()),
    ]
 });
} else {
meta.lookup_any(name, |meta| {
    [
         (hash_idx.expr(), 0.expr()),
         (m_idx.expr(), 0.expr()),
         (value.expr(), 0.expr()),
    ]
 });
}

meta.lookup_any(name, |meta| {
    [
         (hash_idx.expr(), (round_idx == 0) * hash_idx.expr()),
         (m_idx.expr(), (round_idx == 0) * m_idx.expr()),
         (value.expr(), (round_idx == 0) * value.expr()),
    ]
 });

meta.lookup_any(name, |meta| {
    [
         (hash_idx.expr(), q_enable() * hash_idx.expr()),
         (0,                    q_enable() * round_idx.expr()),
         (m_idx.expr(), q_enable() * m_idx.expr()),
         (value.expr(), q_enable() * value.expr()),
    ]
 });