mlswg / mls-architecture

MLS architecture
Other
66 stars 26 forks source link

Write sections on invalid commits and access control #261

Closed Bren2010 closed 2 months ago

Bren2010 commented 3 months ago

The reason for this PR was discussed on the mailing list: https://mailarchive.ietf.org/arch/msg/mls/iH61yiILUASWyqt-4cbOwZsUpUQ/

In almost every section, this document describes a.) problems that need to be solved to deploy MLS, and b.) the general space of solutions and trade-offs between them. The issue with this section is that is presents problems with no guidance on solutions

ekr commented 3 months ago

I concur with Richard. I don't disagree that this section doesn't provide clear guidance, but that's because the WG did not come down on specific solutions when it designed MLS. This PR does not change that situation, and I don't think we should land it.

Bren2010 commented 3 months ago

Is there different text we could add to this document that addresses my concerns?

bifurcation commented 3 months ago

Assuming by "addressing my concerns" you mean adding detailed guidance for how to handle invalid commits, I think the answer is "no". The details of how you solve this problem are so tied up with application details that I don't really think there's a way to map out the space of solutions in a succinct enough way to fit in this document. This document's job is to describe the overall design and deployment environment for MLS, not to provide solutions for every problem. If folks wanted to start up an "implementation considerations" document where we can go into some of these things in depth, that would be fine; in fact, it has been discussed at some past WG meetings. But we don't need to weigh the architecture document down with this.

Bren2010 commented 3 months ago

Repeating what I said on the list, I don't think it makes any sense at all to publish this document (which, per the intro, "provides guidance on how to deploy MLS along with discussions of privacy/security trade-offs") without discussing this. These problems are core to every MLS deployment, and many of the solutions implementors have chosen have subtle and/or unexpected effects on the security & privacy properties of their deployment

ekr commented 3 months ago

At this point, I think we just disagree, so I think we need the chairs to figure out how to resolve this.

@seanturner @grittygrease

rohanmahy commented 3 months ago

Repeating what I said on the list, I don't think it makes any sense at all to publish this document (which, per the intro, "provides guidance on how to deploy MLS along with discussions of privacy/security trade-offs") without discussing this. These problems are core to every MLS deployment, and many of the solutions implementors have chosen have subtle and/or unexpected effects on the security & privacy properties of their deployment

The architecture document publishes guidance on things we had consensus to provide guidance on. I proposed guidance in a number of areas that also ended up "on the cutting room floor".

Bren2010 commented 3 months ago

I don't know why your past proposals were rejected, but this text addresses substantial issues with the way MLS has been used in practice:

kkohbrok commented 3 months ago

Regarding the concern about PCS-negation: The consequences of "resync" operations to PCS are well known and documented at the end of Section 12.4.3.2 of the specification. If an application wants to mitigate the resulting risk it can follow the recommendations described in that section, or indeed regularly rotate the signature key. It might be worth referencing that text in the architecture document, though.

As far as I'm aware (and as explored in this paper by @cascremers), Applications that use Signal commonly have a similar problem, where they allow the re-establishment of sessions (to recover from loss of state). The result is the same as with a resync in MLS. The advantage of MLS is that it allows the rotation of signature keys to mitigate this problem.

Bren2010 commented 3 months ago

The text that you link to discusses signature key compromise; the "external rejoin" approach makes it impossible to recover from key schedule compromise. No signature key compromise is required

kkohbrok commented 3 months ago

Then I don't understand the compromise scenario you allude to when you say that rejoining negates PCS. I also re-read the text you propose in this PR and it doesn't make it any clearer. Could you elaborate?

Bren2010 commented 3 months ago

Note that while the GroupInfo is signed, it is not necessarily signed by the member that was compromised, so users might not have a way to revoke that key such that the GroupInfo signature would no longer verify.

kkohbrok commented 3 months ago

Why would Bob accept a GroupInfo from epoch i if he's already successfully processed a commit from epoch i+1? Is there a scenario where it would be necessary (from an operational standpoint) for Bob to accept epoch downgrades?

Bren2010 commented 3 months ago

I don't know of a reason that it would be necessary to accept epoch downgrades, but only accepting epoch upgrades doesn't really prevent this issue (and nobody has mentioned that they do this anyway). That's because every client is essentially a signing oracle for a GroupInfo with epoch n+1:

rohanmahy commented 3 months ago

I don't know of a reason that it would be necessary to accept epoch downgrades, but only accepting epoch upgrades doesn't really prevent this issue (and nobody has mentioned that they do this anyway).

Uh, if they are accepting epoch downgrades they are not in compliance with RFC 9420, AFAIK. Here is the first bullet item of Section 12.4.2 on Processing a Commit: "A member of the group applies a Commit message by taking the following steps:

You seem to be changing your story about this attack between yesterday and today. If Alice has "healed" when she sends a Commit with UpdatePath, the external_pub will have changed and all the ratchet_tree secrets above Alice. If Bob processes the "healing commit", the attacks you describe in your example attack yesterday are no longer possible in the new epoch.

Regarding Eve and Frank not getting Alice's commit in today's attack, even if the DS has a GroupInfo and external_pub, the clients Eve and Frank would need to validate the credentials of all the LeafNodes in the GroupInfo (which only the AS could forge, not the DS). There is a separate whole section in the Security Considerations section about the compromise of the AS.

Also, the application still needs a reason to accept the external commit. There may be app-level authorization. In MIMI, for example, Alice would have needed to add the DS to the roster (an active attack of Alice and a malicious DS), otherwise the clients of Eve and Frank would never authorize the presence of the DS in the GroupInfo.

Finally, there is also the epoch_authenticator or gossip if you want to make sure that different nodes have the same view of the group.

raphaelrobert commented 3 months ago

I don't think Brendan was suggesting the DS literally becomes a group member other than by compromising a client, but I agree with the last part in particular.

I wonder if the whole attack scenario can be collapsed to the following:

If that's the case, an attack where the DS isolates members and keeps them on a compromised epoch and subsequently a fork is a trivial one. That's not even specific to MLS, but specific to the fact that all communication is relayed over the DS. As Rohan points out, you need an out-of-band verification mechanism in that case.

I'd suggest writing down the exact attack, including all assumptions, in a document so that folks can give better feedback and iterations become possible.

Bren2010 commented 3 months ago

You are conflating two issues, @raphaelrobert. Yes, if an epoch is compromised while a member is offline, then when the member comes online, the DS can withhold PCS-achieving commits that would end the compromise. The only way to detect that is with out-of-band communication.

I only discussed the potential of "group is compromised while member is offline" to address Konrad's question of "why don't externally joining members just check that the epoch field of the GroupInfo is higher than before?" -- Frank and Eve were always going to be vulnerable to a malicious DS. The client that should NOT be vulnerable is Bob, because Bob already saw and processed the PCS-achieving commit.

However, Bob IS vulnerable, and did not actually achieve PCS by processing the commit (described why here). Even if Bob follows Konrad's suggestion of "check that the epoch counter in GroupInfo is higher than before", Bob still does not achieve PCS (described why here). Although, to be clear, a client performing an external join has no obligation under RFC 9420 to do Konrad's suggested check, so the existence of Frank and Eve is unecessary.

kkohbrok commented 3 months ago

Thanks for elaborating @Bren2010, I understand the attack now. Two observations:

Bren2010 commented 3 months ago

This indeed a bit of a weakness in the resync operation. Instead of relying on a strictly higher epoch, Bob would want to check that the transcript of the new group is an extension of the transcript of his last known good state.

To circle back to the top though, the question is whether we should include this text in the document. Given that these decisions impair security and that this consequence isn't immediately apparent to people, I think that's a very strong argument for including!

In your attack, the DS gets Bob to essentially roll back Alice's healing commit. But why wouldn't the DS just block Alice's commit to begin with as Raphael describes?

Like I said, in a pure RFC 9420 implementation the epoch checking doesn't happen, so Frank and Eve are unnecessary. The DS can rollback the group as far as it likes (bounded maybe only by credential expirations)

ekr commented 3 months ago

To circle back to the top though, the question is whether we should include this text in the document. Given that these decisions impair security and that this consequence isn't immediately apparent to people, I think that's a very strong argument for including!

It's an argument for including the analysis, but much of the proposed text is actually additional technical guidance. If you'd like to propose some text that only has additional analysis, I'd be glad to take a look

Bren2010 commented 3 months ago

This text has no guidance / recommendations / "should"s. It is only discussing potential solutions and their trade-offs. Not sure which text you're referring to.

ekr commented 3 months ago

Most of the section entitled "Invalid Commits" in which you discuss a bunch of high-level designs.

Bren2010 commented 3 months ago

I'm sorry, I don't understand what you're asking

rohanmahy commented 3 months ago

Hi @Bren2010 , After our phone call yesteday, I tried to summarize the attack (and related attacks), and any mitigations we discussed. Please let me know if I missed anything.

If a member of a group has a key schedule compromise, a malicious DS can prevent healing of the compromise by blocking Commits from the compromised member or preventing them from being fanned out, or by not forwarding a Remove proposal from the compromised member. (The DS maintains the group in an epoch containing a compromised member, possibly isolating the compromised member into a fork of the group).

This risk can be partially mitigated by having a policy requiring each member to periodically update its keying material or be removed, by revoking the compromised client's credentials, and/or by using a mechanism like the epoch_authenticator to make sure all members have the same view of the group. A partially compromised client might choose to reinitialize or create a new group instead of merely updating its key material.

In a variation of this attack, a malicious DS may present a GroupInfo to external joiners from a compromised epoch instead of the correct epoch. Clients are reminded to validate the LeafNodes of all members (including their credentials) of the ratchet tree in an external join. Members who are rejoining members with an external commit can also make sure that the epoch does not decrease (increases?) during a rejoin. (A malicious DS might even send an invalid commit to other members attempting to trigger a rejoin). Small very privacy sensitive groups may choose to to disable external joins.

Bren2010 commented 2 months ago

That's generally right, although the discussion above is about how the "check that the epoch counter increases on rejoin" solution isn't really effective. I'm not aware of any way we could modify the "DS chooses which commit is right" approach to prevent malicious members from bricking the group, while also not breaking MLS's security. We talked about a lot of solutions that would create various wrinkles for an attacker but nothing robust.

kkohbrok commented 2 months ago

I already proposed for Bob to instead check that the new transcript is an extension of his last known good one. @Bren2010 do you think that still doesn't solve the security downgrade caused by your attack?

Then the main question is still what we want in this document and I think here we agree that a description of the problem and an analysis of the risk is appropriate. What we don't seem to have agreement on is the inclusion of concrete, technical solutions. Or at least not which ones.

Bren2010 commented 2 months ago
seanturner commented 2 months ago

Out for a one-week consensus call; see email.