taikoxyz / zkevm-circuits

DEPRECATED in favor of https://github.com/taikoxyz/raiko! Taiko's fork of the PSE's ZK-EVM
Other
160 stars 125 forks source link

Deflate Circuit Spec #54

Open johntaiko opened 1 year ago

johntaiko commented 1 year ago

status: WIP

Motivation

reference issue: #42

Currently we use l1 contracts's calldata to store l2 block, but calldata has length limit and charges gas by byte. If we can reduce the size of calldata, we can save gas and store more transactions. According to the definition of l1 contracts, the transactions of the l2 block are only stored and do not participate in the calculation. And transactions account for a very high proportion of the block data, we known that: block = header + body(transactions). Therefore, we can compress the l2 block and store it in calldata.

image

(https://arxiv.org/pdf/2108.05513.pdf)

At the same time, it does not increase the computing overhead of l1 contracts.

Calldata gas cost

  1. zero value per byte: 4 gas
  2. other value per byte: 16 gas

Background

Based on the work of optimism(the best thanks to optimism's efforts), we can draw some conclusions:

First, zstd algorithm with dictionary has the best compression ratio.

zstd with dictionary Individual Transaction Bathes Transactions
Zero Byte 40% More less
Compression Rate 48.31% 37.31%
Estimated Fee Savings 30.99% 46.70%

Zstd

Zstd can be considered an optimization of the DEFLATE algorithm, which is a combination of Lz77 and Huffman:

deflate

(data flow)

Because the compression ratio based on static dictionaries is the best, and the huffman dictionary can be compressed into row and column representations, which is very suitable for zkevm lookup table, so we can think that DEFLATE is very suitable for use in zkevm. Before explaining how to implement the DEFLATE circuit in zkevm, we introduce the two algorithms.

How does LZ77 work?

Study by case: compress the data "ABABCBABABCAD"

Lz77-1

The same byte group is represented by (instance, length) pair, so the longer the same byte group, the better the compression ratio.

But we need to distinguish between ordinary bytes and (instance, length) pairs, so we need to add extension bits after ordinary byte to represent (instance, length) pair.

Lz77-2

So in Lz77, the smallest size of character is not 8 bits, it is a dynamic size depends on the following (instance, length) pair. Next we need huffman algorithm to record the size information for decoding.

How does Huffman work?

Huffman replaces the characters that appear more frequently with fewer bits.

Study by case: compress the slice "2,4,7,2,4,2,8,2,4"

Number Count Bit Count
2 4 1
4 3 2
7 1 3
8 1 3

Since our goal is to find the minimum number of bits, a bit can represent 0 or 1, we can turn the problem into: the shortest path of a binary tree

Huffman-1

(left arm is 0, right arm is 1)

So we can get the Huffman table:

Code Value
0 2
10 4
110 7
111 8

Finally, "2,4,7,2,4,2,8,2,4" will be compressed to "0101100100111010", from 9 bytes to 16 bits(2 bytes), the compression ratio is 22%. We find that the codes have different prefixes, so when we decode the data, we can lookup the table form shortest prefix to longest prefix one by one. Yeah, we get the character boundaries.

LZ77 and Huffman work together

The character in Lz77 will be presented from 0 to 32768. A Lz77 character equals 1 byte + (instance, length) pair. But 32768 is too large for huffman table, we need a secondary table to map 32768 to a smaller number.

Lz77+Huffman

(the largest code is 29)

So when we want encode the number 22, we find 22 is in code 8, and 8's code is 111. Let's see the code 8's table:

Number Bits
17 000
18 001
19 010
20 011
21 100
22 101
23 110
24 111

So 22 will be encoded to 111101.

DEFLATE circuit

What changes?

Trade-off

Because deflate decoding has dynamic "byte" length decided by huffman table, we need check the table bit by bit. It's hard to store the compressed data and implement the chip in halo2. So we just use a fixed size of code like 4 bits, so we can always use 4 bits to find the decoded value. Suggestion from Brecht:

A well known trick to decode huffman codes is to just use a fixed input table. For example, let's say there are code words 01 -> value_a and 0001 -> value_b. To decode these words you would create a lookup table like this:

code length value
0100 2 value_a
0101 2 value_a
0110 2 value_a
0111 2 value_a
0001 4 value_b

Then while decoding the data in the circuit, you always use 4 bit inputs to find the decoded value. However, when value_a is decoded you only jump ahead 2 bits in the input data stream (vs 4 bits when decoding value_b). So that's where the length field in the table also comes in so for each decoded value you also know how many bits to go forward.

It will lose the compression ratio, but it's easy to implement at the beginning.

Design

Circuit Layout

circuit-1

Data Flow

circuit-2

  1. Assign the public input into DeflateTable
  2. DeflateCircuit will decode the data from DeflateTable
  3. DeflateCircuit will assgin the TxTable from decode data
Brechtpd commented 1 year ago

Great overview!

It will lose the compression ratio, but it's easy to implement at the beginning.

The lookup trick does not change the compression ratio! I probably explained it insufficiently, the codewords are only all padded to the max bit length so that the lookup never fails. So if we have the codeword 01 and we want to make a lookup table for 4 input bits, we have to pad 01 with all possible bit values for the last 2 bits (because any data could be after the code word), so that means adding 00, 01, 10 and 11 to our codeword in the lookup input. Using the same example, let's say we have to decode 0101000101. We always read the max amount of bits needed by the code words, in this case that would be 4.

Step 1:

second_code: second table codes

Not sure I understand what the second table code means here?

  • Assign the public input into DeflateTable
  • DeflateCircuit will decode the data from DeflateTable
  • DeflateCircuit will assgin the TxTable from decode data

I think the decompression will either be part of the PublicInput circuit or maybe will be a circuit before it (so public input -> DecompressionCircuit -> PublicInput circuit). We can see, for know we can handle it as a separate circuit that just has as input the compressed data.