hero78119 opened 3 weeks ago
Here I want to describe how to prove that a set $\{ \mathrm{addr}_i \}$ has no duplicated elements (i.e. every element has multiplicity one) by applying "Protocol 1" proposed in Sec. 4 of OLB24 in the multivariate setting (i.e. using sumcheck). This can be used in the memory init / finalize circuit.
OLB24
The prover encodes the set $\{ a_i : 0 \le i < N \}$
as a univariate polynomial $z(X) = \prod_i (X - a_i)$. It's easy to see that having no duplicates is equivalent to the formal derivative $z'(X)$ being coprime to $z(X)$ in the polynomial ring $F[X]$. By Bézout's identity, there must exist two polynomials $s(X), t(X)$, of degree less than $N-1$ and $N$ respectively, s.t.
$$z(X) s(X) + z'(X) t(X) = 1. $$
By the Schwartz–Zippel lemma, it suffices to check $z(\alpha) s(\alpha) + z'(\alpha) t(\alpha) = 1$ at a random challenge $\alpha \in E$; the soundness error is negligible.
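To make the coprimality argument concrete, here is a toy sketch over the small prime field $F_{97}$ (the real protocol works over a large field/extension $E$; all names here are illustrative, not Ceno APIs). It builds $z(X)$ from distinct roots, computes $s(X), t(X)$ with the polynomial extended Euclidean algorithm, and spot-checks the Bézout identity at a random point. Polynomials are coefficient vectors, lowest degree first.

```rust
const P: u64 = 97;

fn addm(a: u64, b: u64) -> u64 { (a + b) % P }
fn subm(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mulm(a: u64, b: u64) -> u64 { a * b % P }
fn powm(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1;
    while e > 0 { if e & 1 == 1 { acc = mulm(acc, b); } b = mulm(b, b); e >>= 1; }
    acc
}
fn invm(a: u64) -> u64 { powm(a, P - 2) } // Fermat inverse

fn trim(p: &mut Vec<u64>) { while p.last() == Some(&0) { p.pop(); } }

fn pmul(a: &[u64], b: &[u64]) -> Vec<u64> {
    if a.is_empty() || b.is_empty() { return vec![]; }
    let mut r = vec![0; a.len() + b.len() - 1];
    for (i, &x) in a.iter().enumerate() {
        for (j, &y) in b.iter().enumerate() { r[i + j] = addm(r[i + j], mulm(x, y)); }
    }
    r
}

fn psub(a: &[u64], b: &[u64]) -> Vec<u64> {
    let mut r: Vec<u64> = (0..a.len().max(b.len()))
        .map(|i| subm(*a.get(i).unwrap_or(&0), *b.get(i).unwrap_or(&0)))
        .collect();
    trim(&mut r);
    r
}

// Polynomial long division: returns (quotient, remainder).
fn pdivmod(a: &[u64], b: &[u64]) -> (Vec<u64>, Vec<u64>) {
    let mut rem = a.to_vec();
    trim(&mut rem);
    let mut q = vec![0; rem.len().saturating_sub(b.len()) + 1];
    let lead_inv = invm(*b.last().unwrap());
    while rem.len() >= b.len() {
        let d = rem.len() - b.len();
        let c = mulm(*rem.last().unwrap(), lead_inv);
        q[d] = c;
        let mut shifted = vec![0; d]; // c * X^d * b
        shifted.extend(b.iter().map(|&x| mulm(x, c)));
        rem = psub(&rem, &shifted);
    }
    trim(&mut q);
    (q, rem)
}

// Extended Euclid: returns (g, s, t) with s*a + t*b = g.
fn xgcd(a: &[u64], b: &[u64]) -> (Vec<u64>, Vec<u64>, Vec<u64>) {
    let (mut r0, mut r1) = (a.to_vec(), b.to_vec());
    let (mut s0, mut s1) = (vec![1], vec![]);
    let (mut t0, mut t1) = (vec![], vec![1]);
    while !r1.is_empty() {
        let (q, r) = pdivmod(&r0, &r1);
        let s2 = psub(&s0, &pmul(&q, &s1));
        let t2 = psub(&t0, &pmul(&q, &t1));
        r0 = std::mem::replace(&mut r1, r);
        s0 = std::mem::replace(&mut s1, s2);
        t0 = std::mem::replace(&mut t1, t2);
    }
    (r0, s0, t0)
}

// Formal derivative of a coefficient vector.
fn deriv(a: &[u64]) -> Vec<u64> {
    let mut r: Vec<u64> = a.iter().enumerate().skip(1)
        .map(|(i, &c)| mulm(c, i as u64 % P)).collect();
    trim(&mut r);
    r
}

fn eval(p: &[u64], x: u64) -> u64 {
    p.iter().rev().fold(0, |acc, &c| addm(mulm(acc, x), c))
}

fn main() {
    // z(X) = (X - 2)(X - 5)(X - 11): distinct roots, so gcd(z, z') = 1.
    let mut z = vec![1];
    for &a in &[2u64, 5, 11] { z = pmul(&z, &[subm(0, a), 1]); }
    let dz = deriv(&z);
    let (g, s, t) = xgcd(&z, &dz);
    assert_eq!(g.len(), 1); // constant gcd <=> no duplicated roots
    // Normalize so z*s + z'*t = 1 exactly, then spot-check at alpha = 42.
    let ginv = invm(g[0]);
    let alpha = 42;
    let lhs = addm(
        mulm(eval(&z, alpha), mulm(eval(&s, alpha), ginv)),
        mulm(eval(&dz, alpha), mulm(eval(&t, alpha), ginv)),
    );
    assert_eq!(lhs, 1);
    println!("z(a)s(a) + z'(a)t(a) = {}", lhs);
}
```

If a root were repeated, `xgcd` would return a non-constant `g`, and no such $s, t$ would exist.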
We can encode
{ a_i }
as a multilinear polynomial $Z(\vec{X}) = \sum_{\vec{i}} \textrm{eq}(\vec{X}, \vec{i}) \cdot a_i$, where $\vec{i}$ is the binary decomposition of $i$. The problem is then reduced to
how to evaluate $s(\alpha)$ and $t(\alpha)$.
Note that $s(\alpha) = s_0 + s_1 \alpha + s_2 \alpha^2 + \cdots = s_{\mathrm{even}}(\alpha^2) + \alpha \cdot s_{\mathrm{odd}}(\alpha^2)$, where $s_{\mathrm{even}}$ and $s_{\mathrm{odd}}$ collect the even- and odd-indexed coefficients of $s$.
We also encode { s_i }
as a multilinear polynomial $S(\vec{X}) = \sum_{\vec{i}} \textrm{eq}(\vec{X}, \vec{i}) \cdot s_i$. Assuming $N = 2^k$ is a power of two, let's show how to compute $s(\alpha)$ using a layered circuit, which can then be proved by our tower prover. We call this the poly-evaluate tower prover.
Notation: $B_j$ is the boolean hypercube with $j$ variables.
Let's take $k = 3$ as an example.
```
v3[0,0,0] = s0, v3[0,0,1] = s4, ...

v2[0,0] = v3[0,0,0] + v3[0,0,1]*alpha^4 = s0 + s4*alpha^4
v2[1,0] = v3[1,0,0] + v3[1,0,1]*alpha^4 = s1 + s5*alpha^4
v2[0,1] = v3[0,1,0] + v3[0,1,1]*alpha^4 = s2 + s6*alpha^4
v2[1,1] = v3[1,1,0] + v3[1,1,1]*alpha^4 = s3 + s7*alpha^4

v1[0] = v2[0,0] + v2[0,1]*alpha^2 = s0 + s4*alpha^4 + s2*alpha^2 + s6*alpha^6
v1[1] = v2[1,0] + v2[1,1]*alpha^2 = s1 + s5*alpha^4 + s3*alpha^2 + s7*alpha^6

s(alpha) = v1[0] + v1[1]*alpha
         = s0 + s1*alpha + s2*alpha^2 + s3*alpha^3 + s4*alpha^4 + s5*alpha^5 + s6*alpha^6 + s7*alpha^7
```
We can use the tower prover to reduce the correctness of $s(\alpha)$ to an evaluation of $S(\vec{X})$ at a random point $\vec{r} \in E^k$. We can do the same for $t(\alpha)$.
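The folding above generalizes to any $k$. Here is a small self-contained sketch (over the toy field $F_{97}$; names are illustrative, not the actual tower prover) of the layer-by-layer evaluation, cross-checked against plain Horner evaluation:

```rust
const P: u64 = 97;

fn addm(a: u64, b: u64) -> u64 { (a + b) % P }
fn mulm(a: u64, b: u64) -> u64 { a * b % P }
fn powm(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1;
    while e > 0 { if e & 1 == 1 { acc = mulm(acc, b); } b = mulm(b, b); e >>= 1; }
    acc
}

/// Evaluate s(alpha) = sum_i s_i * alpha^i for 2^k coefficients by folding
/// one variable per layer: v[j] <- v[j] + v[j + half] * alpha^half.
fn eval_layered(coeffs: &[u64], alpha: u64) -> u64 {
    assert!(coeffs.len().is_power_of_two());
    let mut v = coeffs.to_vec();
    while v.len() > 1 {
        let half = v.len() / 2;
        let w = powm(alpha, half as u64); // alpha^4, then alpha^2, then alpha for k = 3
        for j in 0..half {
            v[j] = addm(v[j], mulm(v[j + half], w));
        }
        v.truncate(half);
    }
    v[0]
}

fn main() {
    // The k = 3 example above: coefficients s0..s7.
    let s: Vec<u64> = (1..=8).collect();
    let alpha = 42;
    // Cross-check against plain Horner evaluation.
    let horner = s.iter().rev().fold(0, |acc, &c| addm(mulm(acc, alpha), c));
    assert_eq!(eval_layered(&s, alpha), horner);
    println!("s(alpha) = {}", eval_layered(&s, alpha));
}
```

Each `while` iteration corresponds to one layer of the circuit (v3 → v2 → v1 → s(alpha) in the trace above), so the prover's work per layer halves.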
Therefore, Protocol 1 in the multivariate setting works as follows: the prover commits to the multilinear encodings $S$ and $T$ of the coefficients of $s(X)$ and $t(X)$, the verifier samples a random $\alpha \in E$, and the check $z(\alpha) s(\alpha) + z'(\alpha) t(\alpha) = 1$ is reduced, via the poly-evaluate tower prover, to evaluations of the committed multilinear polynomials at random points.
@kunxian-xia this idea works! But it introduces prover overhead to interpolate and derive the coefficient form of s(X) and t(X).
But I just reviewed our original question again and suddenly an idea flashed into my mind: the reason we abandoned proving that the address vector is well-formed as [0, 1, 2, 3, 4, 5, ..., 2^b - 1] is that we hadn't figured out how to convert it into a sumcheck PIOP. But... what if we just pre-commit a few address vectors:

- 16: [0, 1, ..., 2^16 - 1]
- 17: [0, 1, ..., 2^17 - 1]
- ...
- 28: [0, 1, ..., 2^28 - 1]

and allow the prover to supply the index as auxiliary input and choose which pre-committed poly to use?
A few benefits of this design:
Wdyt :) ?
I want to explain why the memory entries accessed by a guest program have holes (equivalently, why the addresses are sparse).
The memory allocator used in RISC0 / SP1 looks like this:
```rust
// https://github.com/risc0/risc0/blob/main/risc0/zkvm/platform/src/heap/bump.rs
use crate::syscall::sys_alloc_aligned;
use core::alloc::{GlobalAlloc, Layout};

#[global_allocator]
pub static HEAP: BumpPointerAlloc = BumpPointerAlloc;

pub struct BumpPointerAlloc;

unsafe impl GlobalAlloc for BumpPointerAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        sys_alloc_aligned(layout.size(), layout.align())
    }

    unsafe fn dealloc(&self, _: *mut u8, _: Layout) {
        // this allocator never deallocates memory
    }

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        // NOTE: This is safe to avoid zeroing allocated bytes, as the bump allocator does not
        // re-use memory and the zkVM memory is zero-initialized.
        self.alloc(layout)
    }
}
```
As you can see, the above allocator is very simple and never deallocates memory. This can cause big problems that never happen with modern malloc libraries like jemalloc.
Let's assume your guest program has code snippets like this:
```rust
fn foo(n: u32) {
    {
        let mut v = vec![];
        for i in 0..n {
            v.push(i);
        }
    }
    // ... later
    {
        let mut v = vec![];
        for i in 0..n {
            v.push(i);
        }
    }
}
```
There are two issues with the above code snippet:

1. With a conventional allocator, the two `v`s may get the same memory address; with the `BumpPointerAlloc` allocator, the 2nd occurrence of `v` must get a larger memory address than the 1st, even though the scopes of the two for-loops are disjoint.
2. For growable containers like `Vec` and `HashMap`, if we don't reserve the "right" capacity before doing the insertions, the utilization ratio may be around 50%, because each time the old space is not enough, the capacity grows by a factor of two (see `Vec` in the Rust std library).

As a result, there will be holes in the memory addresses accessed by your guest program.
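A small experiment (plain Rust, no zkVM needed) illustrating the doubling growth mentioned above: whenever the backing buffer of a `Vec` is full, its capacity roughly doubles, so just after a growth step the utilization is about 50%. Exact capacity values are an implementation detail of the Rust standard library, so the code only checks the general shape.

```rust
fn main() {
    let mut v: Vec<u32> = Vec::new();
    let mut caps: Vec<usize> = Vec::new();
    for i in 0..1000u32 {
        // Record the capacity each time it changes.
        if caps.last() != Some(&v.capacity()) {
            caps.push(v.capacity());
        }
        v.push(i);
    }
    println!("observed capacities: {:?}", caps);
    // Each growth step at least doubles the previous non-zero capacity.
    for w in caps.windows(2) {
        if w[0] > 0 {
            assert!(w[1] >= 2 * w[0]);
        }
    }
}
```

Calling `Vec::with_capacity(n)` (or `reserve`) up front avoids both the reallocations and the wasted bump-allocated space.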
As I mentioned in an earlier comment:

- `max_mem_address / mem_address_used` is around 1.5 in the case of `reth`.
- `max_mem_address / mem_address_used` is around 32 in the case of `fibonacci` with input n=2^20.

Here is the distribution of gaps between addresses in the memory final read set in the case of `reth`.
Elaborating a succinct address form that the verifier can evaluate without any SNARK, leveraging a multivariate math fact (credit to @spherel 🔥): the multilinear extension of the address vector $[0, 1, \ldots, 2^n - 1]$ at a point $\vec{r}$ is simply $\sum_{i=0}^{n-1} 2^i r_i$, because the binary-decomposition map is already multilinear.

With the above fact, the verifier can evaluate `addr` directly at the challenge vector $\vec{r}$ with just simple field arithmetic, without the prover's help. The non-uniform prover design comes from the fact that the prover can witness n = ceil(log2(max_memory)) per proof, and the verifier takes it to evaluate the address column.
This method also exposes a powerful fact: we can fold a constant `offset` and `scalar` into the address column (i.e. $\mathrm{addr}_i = \mathrm{offset} + \mathrm{scalar} \cdot i$), so we can construct any consecutive memory sub-region starting at an arbitrary offset.
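A quick sanity check of this fact (toy field $F_{97}$; function names are illustrative): the multilinear extension of $\mathrm{addr}_i = \mathrm{offset} + \mathrm{scalar} \cdot i$, computed naively as $\sum_b \mathrm{addr}_b \cdot \mathrm{eq}(b, \vec{r})$, matches the closed form $\mathrm{offset} + \mathrm{scalar} \cdot \sum_i 2^i r_i$ that the verifier can compute directly.

```rust
const P: u64 = 97;

fn addm(a: u64, b: u64) -> u64 { (a + b) % P }
fn subm(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mulm(a: u64, b: u64) -> u64 { a * b % P }

/// Naive MLE of addr_i = offset + scalar * i over the n-dim hypercube:
/// sum over all b of addr_b * eq(b, r).
fn mle_addr_direct(r: &[u64], offset: u64, scalar: u64) -> u64 {
    let n = r.len();
    let mut acc = 0;
    for b in 0..(1u64 << n) {
        // eq(b, r) = prod_i (b_i * r_i + (1 - b_i) * (1 - r_i))
        let mut eqv = 1;
        for i in 0..n {
            let term = if (b >> i) & 1 == 1 { r[i] } else { subm(1, r[i]) };
            eqv = mulm(eqv, term);
        }
        acc = addm(acc, mulm(addm(offset, mulm(scalar, b % P)), eqv));
    }
    acc
}

/// Closed form the verifier can evaluate with plain field arithmetic.
fn mle_addr_closed(r: &[u64], offset: u64, scalar: u64) -> u64 {
    let sum = r.iter().enumerate()
        .fold(0, |acc, (i, &ri)| addm(acc, mulm((1u64 << i) % P, ri)));
    addm(offset, mulm(scalar, sum))
}

fn main() {
    let r = [3, 5, 7, 11]; // arbitrary "challenge" point
    assert_eq!(mle_addr_direct(&r, 0, 1), mle_addr_closed(&r, 0, 1));
    assert_eq!(mle_addr_direct(&r, 10, 4), mle_addr_closed(&r, 10, 4));
    // At a boolean point the MLE recovers the address itself: index 3 = 0b11.
    assert_eq!(mle_addr_closed(&[1, 1], 10, 4), 22); // 10 + 4 * 3
    println!("closed form: {}", mle_addr_closed(&r, 0, 1));
}
```

The verifier's cost is $O(n)$ field operations, independent of the $2^n$-sized table.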
Given two assumptions about the RISC-V zkVM access pattern:
One imperfection is that the prover still needs to pay the cost of the range [max_address, next_pow2]. I would claim this gap is negligible: the opcode proofs are likely the bottleneck, and the table proof with such a non-uniform
design already solves the issue nicely. It is like the 80/20 principle: 20% of the effort solves 80% of the problem. We can leave the remaining 80% of the effort as future work.
PR WIP #457
Per discussion here https://github.com/scroll-tech/ceno/pull/251#discussion_r1797717452
Currently the memory consistency check relies on
max_memory
as a circuit constant, which obviously opens a possible gap from the memory actually in use, since max_memory must cover the worst-case memory consumption. One solution, given our non-uniform circuit and sumcheck-based design, is to treat the memory table as a special case of a dynamic table, so that the "memory size" can be an auxiliary input from the prover and the size can be determined per proof: the prover just needs to "pay as you go". The extra cost of this design is that we need to prove the
address
column is well-formed: unique and sorted.
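Out of circuit, "unique and sorted" is the same condition as "strictly increasing". A hypothetical sketch of that check (not the in-circuit constraint; in a circuit each difference `addr[i+1] - addr[i]` would typically be shown to be at least 1 via a range check):

```rust
/// Hypothetical out-of-circuit sanity check: an address column is
/// well-formed (unique and sorted) iff it is strictly increasing.
fn addresses_well_formed(addr: &[u64]) -> bool {
    addr.windows(2).all(|w| w[0] < w[1])
}

fn main() {
    assert!(addresses_well_formed(&[0x1000, 0x1004, 0x1008]));
    assert!(!addresses_well_formed(&[0x1000, 0x1000, 0x1008])); // duplicate
    assert!(!addresses_well_formed(&[0x1008, 0x1004]));         // unsorted
    println!("ok");
}
```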