Deflate Circuit Spec - Githubissues

status: WIP

Motivation

reference issue: #42

Currently we use l1 contracts's calldata to store l2 block, but calldata has length limit and charges gas by byte. If we can reduce the size of calldata, we can save gas and store more transactions. According to the definition of l1 contracts, the transactions of the l2 block are only stored and do not participate in the calculation. And transactions account for a very high proportion of the block data, we known that: block = header + body(transactions). Therefore, we can compress the l2 block and store it in calldata.

(https://arxiv.org/pdf/2108.05513.pdf)

At the same time, it does not increase the computing overhead of l1 contracts.

Calldata gas cost

zero value per byte: 4 gas
other value per byte: 16 gas

Background

Based on the work of optimism(the best thanks to optimism's efforts), we can draw some conclusions:

First, zstd algorithm with dictionary has the best compression ratio.

zstd with dictionary	Individual Transaction	Bathes Transactions
Zero Byte	40%	More less
Compression Rate	48.31%	37.31%
Estimated Fee Savings	30.99%	46.70%

Individual Transactions have about 40% zero value bytes in data, which is friendly for compression, but not friendly for gas savings, because zero value byte is cheap.
Actually, Contract Invoked Transactions are much more than Individual Transactions and they are likely to be the same contract.
- Popular contracts will be called more, like: uniswap
- Popular dependency library, like: openzeppelin
- So, both of them have the similar data, which is friendly for compression.

Zstd

Zstd can be considered an optimization of the DEFLATE algorithm, which is a combination of Lz77 and Huffman:

deflate

(data flow)

Because the compression ratio based on static dictionaries is the best, and the huffman dictionary can be compressed into row and column representations, which is very suitable for zkevm lookup table, so we can think that DEFLATE is very suitable for use in zkevm. Before explaining how to implement the DEFLATE circuit in zkevm, we introduce the two algorithms.

How does LZ77 work?

Study by case: compress the data "ABABCBABABCAD"

Lz77-1

The same byte group is represented by (instance, length) pair, so the longer the same byte group, the better the compression ratio.

But we need to distinguish between ordinary bytes and (instance, length) pairs, so we need to add extension bits after ordinary byte to represent (instance, length) pair.

Lz77-2

So in Lz77, the smallest size of character is not 8 bits, it is a dynamic size depends on the following (instance, length) pair. Next we need huffman algorithm to record the size information for decoding.

How does Huffman work?

Huffman replaces the characters that appear more frequently with fewer bits.

Study by case: compress the slice "2,4,7,2,4,2,8,2,4"

Number	Count	Bit Count
2	4	1
4	3	2
7	1	3
8	1	3

Since our goal is to find the minimum number of bits, a bit can represent 0 or 1, we can turn the problem into: the shortest path of a binary tree

Huffman-1

(left arm is 0, right arm is 1)

So we can get the Huffman table:

Code	Value
0	2
10	4
110	7
111	8

Finally, "2,4,7,2,4,2,8,2,4" will be compressed to "0101100100111010", from 9 bytes to 16 bits(2 bytes), the compression ratio is 22%. We find that the codes have different prefixes, so when we decode the data, we can lookup the table form shortest prefix to longest prefix one by one. Yeah, we get the character boundaries.

LZ77 and Huffman work together

The character in Lz77 will be presented from 0 to 32768. A Lz77 character equals 1 byte + (instance, length) pair. But 32768 is too large for huffman table, we need a secondary table to map 32768 to a smaller number.

Lz77+Huffman

(the largest code is 29)

So when we want encode the number 22, we find 22 is in code 8, and 8's code is 111. Let's see the code 8's table:

Number	Bits
17	000
18	001
19	010
20	011
21	100
22	101
23	110
24	111

So 22 will be encoded to 111101.

DEFLATE circuit

What changes?

Public Input: because we use the compressed data of transactions. The encoded data by deflate is instead of the transactions.
Tx Table: will be created after the deflate circuit decode the compressed data.

Trade-off

Because deflate decoding has dynamic "byte" length decided by huffman table, we need check the table bit by bit. It's hard to store the compressed data and implement the chip in halo2. So we just use a fixed size of code like 4 bits, so we can always use 4 bits to find the decoded value. Suggestion from Brecht:

A well known trick to decode huffman codes is to just use a fixed input table. For example, let's say there are code words 01 -> value_a and 0001 -> value_b. To decode these words you would create a lookup table like this:

code length value

0100 2 value_a

0101 2 value_a

0110 2 value_a

0111 2 value_a

0001 4 value_b

Then while decoding the data in the circuit, you always use 4 bit inputs to find the decoded value. However, when value_a is decoded you only jump ahead 2 bits in the input data stream (vs 4 bits when decoding value_b). So that's where the length field in the table also comes in so for each decoded value you also know how many bits to go forward.

code	length	value
0100	2	value_a
0101	2	value_a
0110	2	value_a
0111	2	value_a
0001	4	value_b

It will lose the compression ratio, but it's easy to implement at the beginning.

Design

Circuit Layout

circuit-1

code: first table codes
length: second code bits size
second_code: second table codes
value: actual value

Data Flow

circuit-2

Assign the public input into DeflateTable
DeflateCircuit will decode the data from DeflateTable
DeflateCircuit will assgin the TxTable from decode data

taikoxyz / zkevm-circuits

Deflate Circuit Spec #54