Inconsistency in range checks

zBlock-2 / summa-solvency-Turing

Apache License 2.0

0 stars 0 forks source link

Inconsistency in range checks #14

Open Y5Yash opened 7 months ago

Y5Yash commented 7 months ago

Background The circuit checks for all levels in the tree if the sibling node's balance (and two leaf balances) is less than m = 2 ** (NBYTES * 8). In the first look, this suggests that the max balance for the immediate parent of the leaf nodes at level 1 would be 2 * m and the parent at level 2 would be 3 * m, ..., and the max balance at the root will be (NLEVEL - 1) * m.

The inconsistency Let's consider two scenarios for a tree with just four leaves. Scenario 1: The user corresponding to Leaf 1 generates a proof. Scenario 2: The user corresponding to Leaf 3 generates a proof. The following diagram represents node balance range checks. A green node represents a direct check in the circuit, and a red node represents forced ranges.

Summa_inconsistent_scenario_1 Summa_inconsistent_scenario_2

Notice that depending on which leaf is generating the proof, the inner nodes have different ranges of values allowed. For all the cases to be consistent together, the root node can't be more than 2 * m (because you can find cases in which either of the root's children are checked to have max value m).

Expected behavior Root's max balance is (NLEVEL - 1) * m as can be inferred from circuits/contracts.

teddav commented 7 months ago

We talked about this issue with @Y5Yash and it could prevent users from verifying their proofs.
As explained by @Y5Yash: if the sum, at any Node, overflows the max 2^(N_BYTES *8) then the range check will fail. I wrote a test to verify this

#[test]
    fn test_range_dos() {
        let max_per_level =
            BigUint::from(2u32).pow(8 * N_BYTES as u32) - BigUint::one() - BigUint::one();
        println!(
            "max: {max_per_level:?} 0x{}",
            max_per_level.to_str_radix(16)
        );

        let entries = vec![
            Entry::<1>::new("user1".to_string(), [max_per_level]).unwrap(),
            Entry::<1>::new("user2".to_string(), [1u32.into()]).unwrap(),
            Entry::<1>::new("user3".to_string(), [5u32.into()]).unwrap(),
            Entry::<1>::new("user4".to_string(), [9u32.into()]).unwrap(),
        ];

        let mst = MerkleSumTree::<1, N_BYTES>::from_entries(
            entries.clone(),
            vec![Cryptocurrency {
                name: String::from("eth"),
                chain: String::from("Ethereum"),
            }],
            false,
        )
        .unwrap();

        let proof1 = mst.generate_proof(0).unwrap();
        let circuit1 = MstInclusionCircuit::<2, 1, N_BYTES>::init(proof1);
        let prover1 = MockProver::run(K, &circuit1, circuit1.instances()).unwrap();
        prover1.assert_satisfied();

        let proof2 = mst.generate_proof(2).unwrap();
        let circuit2 = MstInclusionCircuit::<2, 1, N_BYTES>::init(proof2);
        let prover2 = MockProver::run(K, &circuit2, circuit2.instances()).unwrap();
        assert!(prover2.verify().is_err());
    }

Node1's balance would be: max_per_level + 1 which would overflow the prime field by 1 and would result in an error when checking the Node's balance.
Looking at the code, we see that on level 0 the range check is done on the leaf which is fine. But for further levels, we check the Node's balance which is the sum of the previous nodes and could overflow.

In the circuit generation script example, there is a function to check for overflow: is_there_risk_of_overflow. But it only checks for overflow relative to the prime field.

0xalizk commented 4 months ago

But it only checks for overflow relative to the prime field.

Isn't that sufficient of a check?

enricobottazzi commented 3 months ago

But it only checks for overflow relative to the prime field.

I believe so, also check my comments on #10

Y5Yash commented 3 months ago

The focus of the inconsistency is on the fact that calculate_max_root_balance assumes the maximum value that root can take is m * (NLEVEL - 1) whereas the subtle constraints enforce that if root is more than m * 2, then someone's proof is bound to fail.

Steps involved when we are setting the parameters:

The maximum value that the root can have is determined by the modulus. (As max_root_balance < modulus)
The maximum value that a leaf can take is calculated as max_root_balance / (NLEVEL - 1). (As max_root_balance = max_leaf_balance * (NLEVEL - 1))
Using the max_leaf_value, NBYTES is calculated. (As max_leaf_value = 2 ** (NBYTES * 8))

Given the inconsistency between the intention while setting the parameters (NBYTES , max_leaf_value , etc.) and the actual constraints the circuit is enforcing, we can run into issues demonstrated by @teddav above.

But it only checks for overflow relative to the prime field.

Overflow is not an issue here, unexpected constraints leading to unexpected failure of proofs is a possibility.

enricobottazzi commented 3 months ago

I see your point.

A green node represents a direct check in the circuit, and a red node represents forced ranges.

Do you think that modifying the direct check inside the circuit such that the range is different from any level would fix this issue? For example in your first illustration, the Node 2 would be constrained in the 2*m range.

then someone's proof is bound to fail.

Overflow is not an issue here, unexpected constraints leading to unexpected failure of proofs is a possibility

Also, can you please better define what is the unintended consequence of this issue? I think that is not a soundness bug, namely a prover cannot generate a valid proof of inclusion of a leaf that is not in the tree.

Y5Yash commented 3 months ago

Also, can you please better define what is the unintended consequence of this issue?

I'll give an example. Let's say that the number of levels (NLEVEL) is 5. Let modulus = 2^10 + d, where d is a small number. With these initial values and the specifications of the protocol (as mentioned in the bullet points in my previous comments):

We set max_root_balance = 2^10.
We set max_sibling_balance and max_leaf_value = max_root_balance / (NLEVEL - 1) = 2^10 / 4 = 2^8
We set NBYTES = log(max_leaf_value)/8 = 1

Thus, a leaf can have values upto 2^8 = 256. Let's construct a tree in which all the leafs (i.e. users) have balances equal to this upper limit.

L5:                                                 2^12
L4:                     2^11                                                     2^11
L3:        2^10                        2^10                       2^10                       2^10
L2:   2^9          2^9            2^9          2^9        .......
L1: 2^8  2^8    2^8  2^8    ........

(Notice how the root balance exceeds max_root_balance)

Now, for the first user (L1[1]), proof generation/verification happens as follows (✅=pass, ❌=fail):

L1[1] <= 2^8 and L1[2] <= 2^8 (leaf node and sibling are checked) ✅
L2[2] <= 2^8 (as mentioned here) ❌
L3[2] <= 2^8 ❌
L4[2] <= 2^8 ❌
L5[1] <= 2^10 ❌

The reason for these failures is that the constraint max_sibling_balance = max_leaf_value = max_root_balance / (NLEVEL - 1) is incorrect.

Do you think that modifying the direct check inside the circuit such that the range is different from any level would fix this issue? For example in your first illustration, the Node 2 would be constrained in the 2*m range.

I believe (under the assumptions I mention below) the correct constraints should be:

max_leaf_balance = max_root_balance/(2^(NLEVEL - 1))
max_current_level_sibling_balance = 2^(current_level-1) * max_leaf_balance (this introduces a dependency on current level while checking sibling's range)

Under these rules, our parameters in the above example would be:

max_root_balance = L5_max = 2^10
max_leaf_balance = L1_max = 2^6
L2_max = 2^7
L3_max = 2^8
L4_max = 2^9

It can be seen that with these parameters, a valid tree will be generated.

I assume that we want to constrain each user to an upper limit such that each user can independently reach that upper limit simultaneously while the root doesn't exceed the modulus. An alternative to this would be not limit any user but just ensure that the root (i.e. the sum total of all users) doesn't exceed the modulus. Even if that's the case, there remain inconsistencies that I can expand upon (with an example similar to the first comment in the thread and new rules).

Y5Yash commented 3 months ago

Expanding upon the alternate assumption where only root is to be limited:

Let max_root_balance = 2^10 and NLEVEL=5. As calculated above, max_leaf_balance = max_sibling_balance = 2^8 Let the first user have the maximum balance allowed, second user have balance = 1, and the rest of the users have 0 balance. Then the tree would look like:

L5:                              2^8+1
L4:               2^8+1                             0
L3:      2^8+1             0               0               0
L2:  2^8+1      0       0       0       0       0       0       0
L1: 2^8   1    0  0    0  0    0  0    0  0    0  0    0  0    0  0

For the first user (L1[1]), the checks (as per the current protocol) will be as follows:

L1[1] <= 2^8 and L1[2] <= 2^8 (leaf node and sibling are checked) ✅
L2[2] <= 2^8 ✅
L3[2] <= 2^8 ✅
L4[2] <= 2^8 ✅
L5[1] <= 2^10 ✅

For the third user (L1[3]), the checks will be:

L1[3] <= 2^8 and L1[4] <= 2^8 (leaf node and sibling are checked) ✅
L2[1] <= 2^8 ❌
L3[2] <= 2^8 ✅
L4[2] <= 2^8 ✅
L5[1] <= 2^10 ✅

The proof will fail for all users >= 3. It can be shown that if second user with balance =1 was permuted to become the last user, all checks would have passed. Despite the sum total of all users not exceeding the max_root_balance, depending on the permutation of user balances, the proof fails. Other cases which fail can be similarly constructed.

Under these assumption, our rules should be updated to check at each level if the sum of balances of the current and sibling node exceeds max_root_balance rather than the present heuristic of just checking the sibling node.

enricobottazzi commented 3 months ago

I follow your point of view, but let me challenge the assumption that underlies the issue that you pointed out. You interpret N_BYTES as the max value that a balance of the user can be.

Instead, when we set N_BYTES to 8 (https://github.com/summa-dev/summa-solvency/blob/master/zk_prover/src/circuits/tests.rs#L22), we are setting it to avoid any overflow in the summation of the balances. What we are implicitly saying is that the order of magnitude of the summation (=root balance) should be "around" 8 bytes.

In particular, this is how you can calculate the maximum value that the root balance should be equal to. And that script is useful to check whether that value is less than the modulus of the circuit and therefore there's no risk of overflow.

L5:                                                 2^12
L4:                     2^11                                                     2^11
L3:        2^10                        2^10                       2^10                       2^10
L2:   2^9          2^9            2^9          2^9        .......
L1: 2^8  2^8    2^8  2^8    ........

In the example you provided here, the proof generation for the first user will fail and this is the expected behaviour, because this tree state would lead to a root balance that is greater than the limit we expect (see the script).

Your suggestion is valid and it is probably a more logic way of handling the range checks, but it's an alternative approach. I don't see any issue with the current implementation.

Y5Yash commented 3 months ago

Can you also critique my comments on the alternate assumption? If you feel that's not a problem either, we should be good to close this issue.

sifnoc commented 3 months ago

Your suggestion is valid and it probably offers a more logical way to handle the range checks, but it's an alternative approach. I don't see any issue with the current implementation.

I agree with this. However, I would like to add more.

In the early stages, we assumed that the custodian, who generates the commitment proof, would also generate inclusion proofs for all users. However, we've realized that this requires substantial resources, which means that the prover will likely start generating inclusion proofs only upon users' requests.

Considering this, I think this issue is a valid concern in the current implementation. During the commitment phase, the prover completes the root hash without necessarily detecting potential failures that could occur in the subsequent phase of generating inclusion proofs. This potential oversight can become apparent if the total balance exceeds 2^{N_BYTES * 8}, the range check limit, which is precisely the scenario Y5Yash described (i.e., 2^8 + 1). Y5Yash and Teddav have raised a good point here.

Thus, we can conclude that Summa relies on the provers' diligence when generating the commitment, which is the root hash of the MST, to avoid these failures if not generating all inclusion proofs.

To assist provers, how about adding a warning when the total balance exceeds 2^{N_BYTES * 8}? For example: "There is a possibility of failing to generate inclusion proofs due to exceeding the range check limit. Please check the entries of the Merkle sum tree."

What do you think of this approach to mitigate potential faults in the commitment phase?

enricobottazzi commented 3 months ago

@Y5Yash I understand your concern and I think that both this issue and this other issue denote that our implementation of the range checks creates some confusion. I propose here a way to fix both issues trying to limit the sacrifice in terms of performance of the circuit.

The simple rule is to let the exchange set N_BYTES as the range in which the sum of the balances should exist. The range checks will be performed as it is performed right now. In addition, also the root is checked to be inside N_BYTES.

The rule must be clear to the prover, which, when generating the commitment, should be warned (as suggested by @sifnoc) that if the sum of the balances is greater than N_BYTES, the generation of any proof of inclusion will fail.

Furthermore, similar to what happens right now, the user should still perform some checks on the verifier logic as there might be combinations of N_BYTES and LEVELS that make the root balance (or, potentially, even intermediary sums) overflow.

One example of that is the following one, by considering a prime p=13 and enforcing a range check of 3 bits on the balance of the nodes in the merkle proof and the root.

Y5Yash commented 3 months ago

I have a similar suggestion with a well defined rule:

"No subtree should exceed `max_root_balance/2`"

How to implement:

At each level check if the sibling node's balance is <= max_root_balance/2
Also check if the current node's balance is <= max_root_balance/2

This is similar to the current constraint but let's the nodes gain a value higher than max_root_balance/(NLEVEL - 1). It also addresses the issue you mentioned (I understand that not checking sum overflow is deliberate) and gives a well-defined rule with less potential for confusion. The downside, however, is that at each level you need to do two range checks (with a bigger number). More importantly, the permutation concern (as demonstrated in the 2^8+1 example) is not resolved either which might make me uncomfortable (it's worth investigating if the invalidity of certain permutations gives away information about users) but can still be reasoned by asking the custodian to be careful. For a quick demonstration:

m = max_root_balance
NLEVEL = 3

A valid permutation:
L3:         m
L2:   m/2       m/2
L1: m/2   0    0   m/2

An invalid permutation:
L3:           m
L2:     m          0
L1: m/2   m/2    0   0

I believe both implementations can be benchmarked and if the time differences aren't much, this can be considered.

enricobottazzi commented 3 months ago

@alxkzmn @sifnoc what's your take on that?

alxkzmn commented 3 months ago

My initial thought was adding a range check for the user leaf balance in the MST inclusion circuit, but now I think that an assertion on the prover side would be enough. The commitment is unbound because it's not generated in a ZK circuit, so we cannot prevent a malicious prover from overflowing the summation. Therefore, the problem will manifest at the inclusion verification stage. Having a failing proof from a CEX would be enough to expose a dishonest prover. On the other hand, having an assertion in the prover code would be enough to prevent an honest prover from making an accidental mistake.

zBlock-2 / summa-solvency-Turing

Inconsistency in range checks #14

"No subtree should exceed max_root_balance/2"

"No subtree should exceed `max_root_balance/2`"