Faster remove? - Githubissues

TWal commented 1 year ago

When removing a participant, we blank its path up to the root because it might know the private key inside these nodes. We actually know precisely which nodes contain private key that are known to this participant: it is nodes for which the participant is not an umerged leaf.

From this remark, we could design an alternative Remove that produces less blank nodes:

blank all nodes on the path for which the participant is unmerged
remove the participant from the unmerged leaves list of other nodes

I think this would preserve the parent-hash invariants, but would need to check more thoroughly. (the intuition might rely on an invariant that is not checked on unmerged leaves, which is that the set of nodes for which a node is unmerged forms a connected path from the leaf going up [note that this invariant would have been true for free with #752!])

Pros:

less blank nodes are produced
removing a participant just after adding it does not change the tree, it's not that useful but it's a pretty property!

Cons:

the remove definition is more complex
the optimization isn't great if the participant is merged for most nodes (e.g., it has issued an UpdatePath in the past)

What do you think?

kkohbrok commented 1 year ago

It sounds nice and it really sounds like it's secure enough, but I'm not sure it's not a bit too late for this.

bifurcation commented 1 year ago

My personal analysis here:

If we do Remove like this, we should do Update as well. It's the same, you just don't remove from unmerged_leaves
On the one hand, this gives some better symmetry to the unmerged_leaves calculations and might make things slightly faster
On the other hand, this is pretty late for a substantive change and I don't expect the performance gain to be huge.

Overall, I'm more worried about the security analysis than the practicalities. If @TWal can get his model updated and checked in a timely manner, I would probably take this change.

(...more detailed notes below this line...)

Practically speaking, it's a small spec diff. (Note that we should cover Update here as well, which requires slightly different logic.)

@@ -3599,7 +3599,9 @@ A member of the group applies an Update message by taking the following steps:

 * Replace the sender's LeafNode with the one contained in the Update proposal

-* Blank the intermediate nodes along the path from the sender's leaf to the root
+* For each intermediate node along the path from the sender's leaf to the root:
+  If the member's leaf node is not listed in the `unmerged_leaves` field of the
+  intermediate node, then blank the intermediate node.

@@ -3621,7 +3623,10 @@ A member of the group applies a Remove message by taking the following steps:

 * Replace the leaf node L with a blank node

-* Blank the intermediate nodes along the path from L to the root
+* For each intermediate node along the path from the sender's leaf to the root:
+    * If the member's leaf node is listed in the `unmerged_leaves` field of the
+      intermediate node, then remove the entry from `unmerged_leaves`.
+    * Otherwise, blank the intermediate node.

I expect the implementation diff will be similarly small, something like:

-def blank_path(self, index):
+def blank_path_if_unmerged(self, index, remove):
   path = self.direct_path(index)
   for node in path:
-    node.set_to_blank()
+    if index in node.unmerged_leaves:
+      if remove:
+        node.unmerged_leaves.remove(index)
+      # else do nothing
+    else:
+      node.set_to_blank()

raphaelrobert commented 1 year ago

I think this would introduce a new attack.

The unmerged leaves entry effectively becomes a "do not delete" marker for parent nodes. An attacker can use this to stay in the tree longer than intended and thus violate the tree invariant. Similar to the attack Joel presented a while back, the attacker can prepare a special tree in the following way:

The adds its leaf index to the unmerged leaves list of all of its direct path parent nodes.
The attacker also updates its parent hash extension (by simulating a fake Commit) so that the new parent hash now covers the updated unmerged leaves lists.
The attacker presents this new ratchet tree to the victim.
The victim joins the group either by Welcome or by External Commit
When validating the ratchet tree, the victim does not detect the ruse because all parent hash extensions from all leaves in the tree check out.
The victim (or anyone who joined after the victim) proceeds to remove the attacker from the group.
While doing so, the victim now only blanks the attacker's leaf node and leaves the attacker's direct path intact because of the markers.
The attacker is now seemingly removed, but still knows secrets of the ratchet tree.

This can possibly be prevented by introducing additional validation for new joiners, but it seems very brittle nonetheless.

Given that in a group where updates are frequent, this wouldn't bring much of a performance benefit I would not be inclined to go forward with this. We would need much more analysis and it's a bit late for that.

TWal commented 1 year ago

Can you explain the attack more thoroughly with actual tree examples? I'm having trouble understanding it.

bifurcation commented 1 year ago

@raphaelrobert I don't think the attack works as you describe it, or if it does, we need stronger parent hash validation rules regardless of this PR. Any node that has ever sent a Commit should not be unmerged, so it should be impossible to both have a leaf sign a parent hash chain and appear as unmerged in any of its parents, as happens in your putative tree.

I agree that some analysis is needed. The only question is how quickly it can be done. What I'm wondering is whether some of @TWal's earlier work can be updated.

raphaelrobert commented 1 year ago

@bifurcation Yes, stronger validation rules are needed. Right now what you describe is not enforced, hence my qualification as an attack. I also agree that these rules could be added regardless.

TWal commented 1 year ago

I don't understand why we need stronger validation rules. When joining a group, invariants are checked on the tree, like the well-formedness of unmerged leaves or the validity of parent-hashes. A tree with these invariants is then secure: the signature of the leaf parent-hash-linked to a node covers the whole subtree at the time of signing, which forbids the old double-join attacks. You can't break TreeKEM's invariants without breaking the parent-hash invariants.

Therefore I think we just need to prove that the new Remove preserves the MLS invariants (unmerged leaves are well-formed + every node is parent-hash linked to a node below it). If that is true, I can update my formal proofs quickly I think.

bifurcation commented 1 year ago

@TWal As I understand what @raphaelrobert is proposing, the idea is that the attacker A prepares a tree of the following form, where:

[A] indicates A is unmerged
But A knows Q and R
A's leaf signs a parent hash chain (A -> Q -> R)
B need not be aware any of this is going on

In other words, it's like A added C to (A, B) in a full Commit, but with A also marked as unmerged at Q and R.

      R[A]
    __|__
   /     \
  Q[A]    _
 / \     / \
A   B   C   _

A sends this tree to C in a Welcome. If C subsequently tries to remove A, then according to this PR, you would end up with the following tree:

      R'
    __|__
   /     \
  Q       _
 / \     / \
_   B   C   _

Since the attacker A knows Q, and C encrypted the path secret for R' to Q, A still knows the group secrets.

raphaelrobert commented 1 year ago

Right now we have the following validation rules in the spec:

Verify the integrity of the ratchet tree.
Verify that the tree hash of the ratchet tree matches the tree_hash field in GroupInfo.
For each non-empty parent node, verify that it is "parent-hash valid", as described in Section 7.9.2.
For each non-empty leaf node, validate the LeafNode as described in Section 7.3.
For each non-empty parent node, verify that each entry in the node's unmerged_leaves represents a non-blank leaf node that is a descendant of the parent node.

Only the last rule concerns the unmerged leaves. That rule is not enough to prevent an attacker who is a regular member of the tree to add its leaf index to the unmerged leaves list of all the parent nodes in its direct path.

raphaelrobert commented 1 year ago

@bifurcation Thanks for the illustration, that's exactly what I had in mind.

TWal commented 1 year ago

For D to be parent-hash-linked to P (with child C), there is the following rule:

D is in the resolution of C, and the intersection of P's unmerged_leaves with the subtree under C is equal to the resolution of C with D removed.

With D = C = A and P = Q, the intersection of P's unmerged leaves with the subtree under C is [A], and the resolution of C with D removed is []. Therefore the tree will be rejected by the participant processing the Welcome.

TWal commented 1 year ago

Note that it is similar to the example 2 of #713, and that parent-hash was modified to prevent these kind of modifications.

raphaelrobert commented 1 year ago

@TWal It looks like the parent hash verification rule you mention does indeed prevent the attack. Given how unintuitive that rule is I a) missed it and b) I cannot say if it prevents the attack in all scenarios (although my intuition is it does). So it really comes down to proofs at this point.

TWal commented 1 year ago

The rule is a ninja equation designed by @bifurcation in #713. I agree that it is not easy to understand! It captures two facts:

D and P were updated at the same time, and their unmerged leaves is coherent with this fact: D.unmerged_leaves = Intersection(P.unmerged_leaves, Subtree(D))
the nodes between D and P were filtered, hence they are blank, and the co-path subtree are mostly empty (they might contain unmerged leaves of P)

From this we can prove the strong guarantees discussed in #713 and #731.

TWal commented 1 year ago

Note: if you look at the revised parent-hash link conditions in #752, it's much easier to understand why the attack doesn't work: it's the condition LastUpdateEpoch(P) = LastUpdateEpoch(D) which will be broken.

bifurcation commented 1 year ago

I thought I had an attack case, based on the following tree (* = parent hash link, B malicioiusly added the [B] entries), but the original sibling tree hash stopped it.

              T[B]
        ______|______
      */             \
      R[B]            _
    __|__           __|__
   /     \*        /     \
  Q       S       _       _
 / \*   */ \     / \     / \ 
A   B   C   D   E   F   _   _

Which points toward a simple, intuitive argument for why this proposal is OK:

Claim: Given an honestly constructed tree, a malicious group member can't modify it to add themselves to unmerged_leaves for a node whose private key they know

Proof:

If they know the private key because they set it, you can't be unmerged in it because the parent-hash chaining rules forbid it
If they know the private key because the received it from someone else, then their leaf is covered by the original sibling subtree hash. If they add themselves to unmerged_leaves, they'll be removed from the original sibling tree, and that hash will be invalid.

beurdouche commented 1 year ago

@TWal Is the F* spec in mls-proposal up-to-date enough so that I can double check how things currently work there? Or did you ninja this by hand :D ?

TWal commented 1 year ago

@beurdouche the parent-hash link invariant is defined here and the parent hash guarantees theorem is proved here, and Richard found the ninja equation here if that's what you're asking for!

@bifurcation I don't understand why you closed the issue as completed?

The proof goes as follows.

From the parent hash guarantees, we know that the signature covers all the subtree it the participant modified, at the time of their last modification (we name them the "original subtrees"). When a private key is generated for a node, the participant generating it:

send it to the leaves in the corresponding subtree
signs the corresponding subtree (via parent-hash)

Note that in every subtree signature (via parent-hash), there is the property that the private hpke key of the node is known only by the leaves of that subtree.

When you receive tree, you check every signature and compute every original subtree, and get back the property that the private hpke key of each node is known only to leaves in its original subtree.

Therefore, if you have an attack, it means you found a way to break the invariants checked when joining a group. It might be the case that the new Remove does break the invariants and we would need to strengthen them (as tried in #835), but my feeling is that it should work fine without additional invariants. I'll check in the next days and keep you up-to-date!

TWal commented 1 year ago

Thinking about an actual proof, I find that it does break the invariants:

Suppose we start with the following tree (which is valid, although we can't get it with normal operations)

      Y[C]   
    __|__    
  */     \   
  X       Z  
 / \*    / \*
A   B   C   D

Removing C makes:

      Y
    __|__    
  */     \   
  X       _  
 / \*    / \
A   B   _   D

It breaks the parent-hash link X -> Y, because the original sibling tree hash is not the same.

In practice, it's not possible to have the situation with C unmerged for Y but not for Z, but it's not an invariant checked when joining a group, so we can't rely on it… Note that this invariant is guaranteed for free with the unmerged leaves defined as in #752 :)

bifurcation commented 1 year ago

I'm sorry, I hit the wrong button when commenting!

bifurcation commented 1 year ago

It's concerning that we have trees that are valid according to the validation rules but can't be produced by normal operations. It seems like that calls for more validation.

I noticed as I was looking at this yesterday that I don't think we are validating two (equivalent) properties that are true for every real case of an unmerged leaf:

If X is unmerged at P, then it is unmerged at every node below P (P's descendants)
If X is not unmerged at P, then it is not unmerged at every node above P (P's ancestors)

More generally, I wonder if there's a connection here, that unmerged Remove is safe if and only if the tree validation rules prevent trees that could not have arisen naturally.

I would argue that we should fix the latter issue anyway. I filed #837 for this.

rohan-wire commented 1 year ago

Hi, I don't think this optimization is worth it. At least in the IM use case, I see lots of problems anytime there is a build-up of unmerged leaves. I see unmerged leaves as a transient situation that should not be encouraged nor optimized.

kkohbrok commented 1 year ago

The optimization does seem elegant, but at the very least this change should be brought to the list. In particular, I'd love to hear @MartaMularczyk's and @psyoptix' opinion on this from a provable security perspective before we consider merging this into the spec.

bifurcation commented 1 year ago

After discussion at the interim 2022-12-16, there was agreement that this change is not worth making.

mlswg / mls-protocol

Faster remove? #834