DISCLAIMER: this issue may contain inaccurate or false statements. Its sole purpose is to investigate feasibility of possible attack ideas for n-RLN. It is WIP.

Introduction

In the Shamir's Secret Sharing (SSS) polynomial of n-RLN, all (secret) coefficients are computed from the same secret a0 (and other public values) using the Poseidon hash function.

In Poseidon, each round consists of an non-linear SBOX followed by a linear mix which can be modeled in a simplified manner as

$$x \rightarrow k{m,i}\cdot(x+k{r,i})^p$$

with $p=3,5$ and $k_m, k_r$ derived from round constants (in practice few other caveats exist, but we keep its formulation simple for ease of exposition).

It follows that by iterating the round function, we obtain the sequence of transformations

$$x_0 \rightarrow x1 = k{m,1}\cdot(x0+k{r,1})^p \rightarrow x2 = k{m,2}\cdot(x1+k{r,2})^p \rightarrow \ldots \rightarrow xn = k{m,n}\cdot(x{n-1}+k{r,n})^p$$

where $x_n$ corresponds to our final hash.

These iterations can be modeled with a set of multivariate polynomials. By working over the multivariate polynomial ring defined over $\mathbb{F}_p$ with variables $x_0,\ldots,x_n$, we define the polynomials

$$I = \begin{cases} k_{m,1}\cdot(x0+k{r,1})^p - x1 \\ k{m,2}\cdot(x1+k{r,2})^p - x2 \\ \ldots \\ k{m,n}\cdot(x{n-1}+k{r,n})^p - x_n\end{cases}$$

It follows that the ideal generated by $I \cup \{x_0 - \bar{v}\}$ contains the polynomial $x_n - Hash(\bar{v})$. The intuition behind this is that by adding the polynomial $x_0 - \bar{v}$ to $I$, we're assigning the value $\bar{v}$ to $x_0$ which iteratively defines all values of the variables $x_i$ in $I$ up to $x_n$.

Attacking 2-RLN

In the following, we take as toy example 2-RLN, that is Shamir's secret shares come from a degree-2 polynomial

$$y = \bar{a}_0 + \bar{a}_1\cdot x + \bar{a}_2\cdot x^2$$

where $\bar{a}_0$ represents the identity secret and $\bar{a}_1 = Hash(\bar{a}_0)$ and $\bar{a}_2 = Hash(\bar{a}_1)$ (in practice the hash is computed together with other public values, but we omit this for now).

The relation between the values $\bar{a}_0, \bar{a}_1, \bar{a}_2$ can be then modeled with the two sets of polynomials

$$I1 = \begin{cases} k{m,1}\cdot(a0+k{r,1})^p - x{1,1} \\ k{m,2}\cdot(x{1,1}+k{r,2})^p - x{1,2} \\ \ldots \\ k{m,n}\cdot(x{1,n-1}+k{r,n})^p - a_1\end{cases}$$

$$I2 = \begin{cases} k{m,1}\cdot(a1+k{r,1})^p - x{2,1} \\ k{m,2}\cdot(x{2,1}+k{r,2})^p - x{2,2} \\ \ldots \\ k{m,n}\cdot(x{2,n-1}+k{r,n})^p - a_2\end{cases}$$

over the polynomial ring $\mathbb{F}_p[a_0,a_1,a2,x{1,1},\ldots,x{1,n},x{2,1},\ldots,x_{2,n}]$.

From what said before, we have that the ideal generated by $I$ and the polynomial $a_0 - \bar{a}_0$, will contain the polynomials $a_1 - \bar{a}_1$ and $a_2 - \bar{a}_2$, since assigning a value to $a_0$ will iteratively assign a value to $a_1$ and $a_2$ according to the relations defined by $I$.

Note that $a_0$ denotes a polynomial variable, while $\bar{a}_0$ is a value, that is an element of the underlying field $\mathbb{F}_p$.

Modeling Shares

In order to recover the secret value $\bar{a}_0$ (and $\bar{a}_1,\bar{a}_2$), in 2-RLN we need 3 shares, i.e. 3 random $(x,y)$ pairs coming from evaluating

$$y = \bar{a}_0 + \bar{a}_1\cdot x + \bar{a}_2\cdot x^2$$

We assume we only know two random shares $(\bar{x}_1,\bar{y}_1)$, $(\bar{x}_2,\bar{y}_2)$ and we model their values using the polynomial variables $a_0,a_1,a_2$ as

$$S=\begin{cases}a_0 + a_1\cdot \bar{x}_1 + a_2\cdot \bar{x}_1^2 - \bar{y}_1\\ a_0 + a_1\cdot \bar{x}_2 + a_2\cdot \bar{x}_2^2 - \bar{y}_2\end{cases}$$

If we evaluate the polynomials in $S$ is the values $\bar{a}_0,\bar{a}_1,\bar{a}_2$ that were originally used to compute the shares, all of them will be equal to 0.

In other words, $(\bar{a}_0,\bar{a}_1,\bar{a}_2)$ is a solution to $S$.

Is there enough information?

Can we recover the secret value $\bar{a}_0$ by just knowing 2 shares out of 3? The theory behind Shamir's secret sharing ensures that it would be impossible (that is, all values from $\mathbb{F}$ are likely possible for $\bar{a}_0$), but it assumes the polynomial coefficients to be generated randomly, hence they are independent from each other.

In n-RLN, instead, such coefficients all depend on $\bar{a}_0$ through a non-linear relation given by the employed hash function. In other words, the unique solution to the polynomial system $\mathcal{P} = I_1 \cup I_2 \cup S$ (i.e. assignments to all variables so that all polynomials evaluate to 0) is defined exclusively by assigning the correct value $\bar{a_0}$ to the variable $a_0$.

This means that the unique solution to the system $\mathcal{P}$, i.e. a polynomial system in $2n+3$ variables, depends on a single variable assignment, that is $a_0$.

Informally, this means that the total entropy of the solution is just $\log{p}$ bits rather than $3\cdot \log{p}$, as it would be if $\bar{a}_0, \bar{a}_1, \bar{a}_2$ were computed randomly.

While the polynomials in $I_1$ and $I_2$ constrain only the coefficients without of the SSS polynomial, without revealing any bits on their actual assignments, the share polynomials in $S$, instead, do.

Indeed, since 3 shares would suffice to recover 3 independent coefficients in text-book Shamir's Secret Sharing, this means that each share reveals $\approx \log p$ of information about the secret coefficients all together.

The intuition behind the attack is that since the solution to the system $\mathcal{P}$ has just $\log p$ bits of entropy, then 2 random shares would provide -on average- enough information (in principle $\approx 2\log p$ when coefficients are independent) in order to be able to recover $\bar{a}_0$.

The way we attempt to recover $\bar{a}_0$, is by searching a solution to $\mathcal{P}$.

Finding a solution

The standard method to compute a solution to a multivariate polynomial system is by computing Gröbner basis.

We skip the technical details of such method, but we only mention that the ideal generated by $\mathcal{P}$ has minimum-degree generators of the form

$$B = \begin{cases}a_0 - \bar{a}_0\\a_1 - \bar{a}1\\\ldots\\x{2,n}-\bar{x}_{2,n}\end{cases}$$

where $\bar{a}_0,\ldots,\bar{x}_{2,n}$ are the assignments for $a_0,\ldots,x_{2,n}$ so that all polynomials in $\mathcal{P}$ evaluate to 0.

Such set $B$ results to be a Groebner basis for $\mathcal{P}$, hence finding it will automatically provide a solution to the system, including recovering the secret value $\bar{a}_0$.

There exists few valid algorithms to compute Groebner basis (Buchberger, F4, F5, etc.), however their computational complexity is hard to estimate and there is no guarantee that even simple systems could be solved quickly.

Attacking 2-RLN: PoC implementation

With the aim to make the above ideas a bit more concrete and ease discussion, we provide a Sage PoC implementation of the Groebner basis approach over a reduced-round version of an hash function that mocks Poseidon hash.

The following script can be tested online on Sagecell. The prime $p$ employed is the order of the curve BN254.

p = 21888242871839275222246405745257275088548364400416034343698204186575808495617;
F = GF(p);
F.<a0,x1_1,x1_2,x1_3,a1,x2_1,x2_2,x2_3,a2> = PolynomialRing(F)

# Reduced round hash
# Computing a1
r1_1 = 2*(a0  +1)^5 - x1_1;
r1_2 = 3*(x1_1+2)^5 - x1_2;
r1_3 = 4*(x1_2+3)^5 - x1_3;
r1_4 = 5*(x1_3+4)^5 - a1;

# Computing a2, i.e. a2 = Hash(a1)
r2_1 = 2*(a1 + 1)^5 - x2_1;
r2_2 = 3*(x2_1+2)^5 - x2_2;
r2_3 = 4*(x2_2+3)^5 - x2_3;
r2_4 = 5*(x2_3+4)^5 - a2;

# Those are example secret values
# can be computed with
#ideal(r1_1, r1_2, r1_3, r1_4, r2_1, r2_2, r2_3, r2_4, a0-1234567890123456789012345678901234567890123456789012345678901234567890123456).groebner_basis()
a0_val = F(1234567890123456789012345678901234567890123456789012345678901234567890123456);
a1_val = F(-5688860465330553566251445615169286501027840278850879586337975469649514457926)
a2_val = F(-13127261654781763675836365051266965574386061451819211794160656918246739109430)

# Shares computation
x_val = ZZ.random_element(0,p)
y_val = a0_val + a1_val*x_val + a2_val*x_val^2;
share1 = a0 + a1*x_val + a2*x_val^2 - y_val;

x_val = ZZ.random_element(0,p)
y_val = a0_val + a1_val*x_val + a2_val*x_val^2;
share2 = a0 + a1*x_val + a2*x_val^2 - y_val;

# Recover secret values
B = ideal(r1_1, r1_2, r1_3, r1_4, r2_1, r2_2, r2_3, r2_4, share1, share2).groebner_basis()

for e in B:
    print(e)

The script will return a base $B$ for $\mathcal{P}$ consisting of degree-1 polynomials (i.e. assignments to each variable) that will solve the whole system $\mathcal{P} = \{r{1,1},r{1,2},r{1,3},r{1,4},r{2,1},r{2,2},r{2,3},r{2,4}, share_1, share_2\}$.

Food for thought (take carefully!)

On local installations of Sage, the warning "falling back to very slow implementation" is raised when computing the above Groebner basis. For such reason, the number of hashing rounds is kept low in the above PoC, but with better implementations it might be possible to target an higher number of rounds.
By quoting Buchberger's algorithm Wikipedia page:

Faugère's F4 and F5 algorithms are presently the most efficient algorithms for computing Gröbner bases, and allow to compute routinely Gröbner bases consisting of several hundreds of polynomials, having each several hundreds of terms and coefficients of several hundreds of digits.

In the case of Poseidon, we have ~64 rounds for a full hash: this corresponds to a total of $2\cdot 64 + 2$ polynomials in $2\cdot 64 + 3$ (dependent) variables of degree at most 5 and coefficients of roughly ~80 digits each.

So it might be indeed possible to attack n-RLN for any $n>1$ with just 2 shares (or slightly more, depending on $n$) by using an efficient implementation of the F4/F5 algorithm.
All the above seems to suggest that we should not lower or we should be extremely careful in lowering the number of rounds of Poseidon hash in order to reduce ZK circuit constraints and gas cost, as was originally suggested in https://github.com/waku-org/nwaku/issues/724.
In principle the attack should work also for 1-RLN, i.e. standard RLN. But in tests we were not able to fully recover $\bar{a}_0$, even in a hash reduced-round scenario. The reason probably resides in the fact that 1 share does not suffice (roughly speaking, it doesn't provide enough entropy) to feasibly find a solution to the system of equations. Nevertheless, there might be cases where the attack might work, considering also the relevance of carefully tuning the parameters of the Groebner basis finding algorithm (not done during tests).

vacp2p / research

Algebraic attack to n-RLN #155