status-im / swarms

Swarm Home. New, completed and in-progress features for Status

perfect-forward-secrecy #289

Closed yenda closed 6 years ago

yenda commented 6 years ago

We still need a go developer and UX designer for this swarm.

adambabik commented 6 years ago

Does it include moving the whole chat logic to Go? Or is it a hybrid approach where X3DH and the Double Ratchet happen in status-go, but the chat protocol itself stays in Clojure?

yenda commented 6 years ago

@adambabik for the MVP @cammellos will go for the hybrid approach; meanwhile the Go dev and I can start moving the chat protocol to Go in anticipation of iteration 1.

cammellos commented 6 years ago

@adambabik ideally at the end of the swarm we have something like this:

sendOneToOneMessage(pk, payload)
sendPublicMessage(chatId, payload)
etc.

So the inner workings of the protocol will be in status-go (as discussed in PM).
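For illustration, a minimal sketch of what such an API surface could look like on the status-go side; the names, types and package layout are assumptions, not the actual interface, which is still to be defined:

```go
// Hypothetical sketch only: names and types are assumptions, not the real status-go API.
package chat

// PublicKey identifies a contact; in practice this would be their public key bytes.
type PublicKey []byte

// Service hides key exchange, ratcheting and encryption behind two calls,
// so the Clojure client never deals with the protocol internals.
type Service interface {
	// SendOneToOneMessage encrypts payload for the recipient identified by pk
	// and hands the result to the transport layer (Whisper).
	SendOneToOneMessage(pk PublicKey, payload []byte) error

	// SendPublicMessage posts payload to the public chat identified by chatID.
	SendPublicMessage(chatID string, payload []byte) error
}
```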

What I reserve for a later iteration is actually moving the persistence layer to status-go, as it is not strictly necessary to achieve what we want; of course, if we manage it, that's a plus. All OK with this approach?

adambabik commented 6 years ago

Thanks, @yenda and @cammellos. Sounds good to me :)

yenda commented 6 years ago

@adambabik do you want to be our go-dev :) ?

adambabik commented 6 years ago

@yenda sure, count me in.

oskarth commented 6 years ago

A few questions that would be useful to answer, or have a plan for how to answer:

  1. What impact does using a server (X3DH) have on reliability? What happens if a user changes mail server, or it is down?
  2. What happens if a user is changing device? Recovering account?
  3. Will this be strictly limited to 1:1 chat? Might be worth making this explicit
  4. What guarantees do we get in terms of forward secrecy? What makes it PFS as opposed to weak FS? This is in the paper, but it would be useful to make these kinds of guarantees explicit. Additionally, what are the requirements in terms of key (non-)reuse, randomness, etc.?
  5. We have an implementation in Go for DR, what about X3DH? Any candidates for this?
  6. What guarantees can we make and not make in terms of deniability?
  7. What trust do we put in the mail server? What would a compromised mail server be able to do?
  8. What guarantees can we make and not make in terms of darkness?
  9. What implications does separating out the initial key exchange step to be out of band (QR code or whatever) have in terms of guarantees we can make? Do we have a plan for supporting this mode as well, and how do we imagine this being represented in UX?
  10. In terms of audit, would it make sense to implement this as library/prototype first, push a spec, audit it, and then use it only once we are actually fairly certain it is secure (for our own defined guarantees)? As opposed to changing the existing code gradually.
cammellos commented 6 years ago

Part of the swarm is to find a solution to the problem and to the questions you asked, so not everything has been fleshed out yet; take the answers with a pinch of salt :).

I will outline the rough idea first, which might clarify some of the answers:

The main problem is with X3DH and the fact that we currently don't have a good solution for permanent decentralized storage.

Ideally we'd want to use Swarm for distributing bundles, but that is not really an option unless we run our own nodes or use the Ethereum testnet, which does not provide persistence (https://swarm-guide.readthedocs.io/en/latest/introduction.html); mainnet is scheduled for 2019. IPFS is an option, but it has not been explored much and we don't know the bandwidth requirements for running it on mobile; more investigation is needed.

Given the uncertainty and the unknowns of the two options above, the strategy is to work around those issues: build a system that works well in a p2p setting without decentralized (or centralized) storage, and eventually "close the gap" once we find a suitable decentralized solution.

Basically we will try to propagate the X3DH bundle in different ways, at the very minimum (a rough sketch of the contact-string option follows below):

1) Through QR contact codes
2) Through the contact string (instead of copying and pasting your address, you copy and paste a base64 bundle)
3) Through messages sent to public/1-to-1 chats
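To make option 2 concrete, here is a hedged sketch of what serialising a bundle into a contact string could look like; the Bundle fields and the JSON/base64 wire format are assumptions for illustration, not the actual status-go types:

```go
// Illustrative only: the Bundle fields and encoding are assumptions,
// not the real status-go wire format.
package bundle

import (
	"encoding/base64"
	"encoding/json"
)

// Bundle carries the public material needed to run X3DH with its owner.
type Bundle struct {
	IdentityKey  []byte `json:"identityKey"`  // long-term identity public key
	SignedPreKey []byte `json:"signedPreKey"` // X3DH signed pre-key
	Signature    []byte `json:"signature"`    // signature over the pre-key
}

// ToContactCode encodes the bundle so it can replace a plain public-key string
// or be rendered as a QR code.
func ToContactCode(b Bundle) (string, error) {
	raw, err := json.Marshal(b)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// FromContactCode reverses ToContactCode on the receiving side.
func FromContactCode(code string) (Bundle, error) {
	var b Bundle
	raw, err := base64.StdEncoding.DecodeString(code)
	if err != nil {
		return Bundle{}, err
	}
	err = json.Unmarshal(raw, &b)
	return b, err
}
```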

In addition to that, there are some options that need a bit more thinking and will not be implemented in the first iteration:

1) Use Whisper as a DHT (send an anonymous message to the network asking for the bundle of X)
2) Use ENS to propagate the bundle (say, a TXT record in DNS)
3) Swarm/IPFS

In this way, as long as the contact has been discovered or shared through the Status app, the user will have a bundle and PFS can be guaranteed.

In case you don't have a bundle (say the address was copied and pasted in Slack, and the user pasted their pk instead of their contact code), the initial message will not have forward secrecy, but it will contain a bundle, so any reply will.

This is the gap that we will focus on closing, but in the meantime, for the first iteration, I would (in iteration order):

1) Allow the user to send the message anyway (this already provides better security than we currently have: it will be encrypted using a DH exchange, so it will have one-sided forward secrecy, i.e. no forward secrecy if the private key of the recipient is compromised)
2) Find a UX solution, for example warn the user and prompt them to send a contact request instead, while still allowing them to send the message if they want to do so
3) Find other technical solutions: Swarm/IPFS, delayed sending of messages, etc.

Basically the initial implementation will have PFS if we were able to exchange the bundle, with a fallback of letting the user send a message encrypted via a DH exchange between an ephemeral key and the recipient's identity key.
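A rough sketch of what that fallback derivation could look like; the curve and the use of SHA-256 as a stand-in KDF are assumptions for illustration, not the actual status-go code:

```go
// Illustrative fallback only: with no bundle available, combine a fresh
// ephemeral key with the recipient's long-term identity key (ephemeral-static DH).
package fallback

import (
	"crypto/ecdh"
	"crypto/rand"
	"crypto/sha256"
)

// DeriveFallbackKey returns a symmetric key plus the ephemeral public key that
// must travel with the message so the recipient can derive the same key from
// their identity private key.
func DeriveFallbackKey(recipientIdentity *ecdh.PublicKey) (key [32]byte, ephemeralPub []byte, err error) {
	eph, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return key, nil, err
	}
	shared, err := eph.ECDH(recipientIdentity)
	if err != nil {
		return key, nil, err
	}
	// Forward secrecy is one-sided here: a later compromise of the recipient's
	// identity private key would expose this message, as noted above.
	key = sha256.Sum256(shared) // SHA-256 stands in for a proper KDF
	return key, eph.PublicKey().Bytes(), nil
}
```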

The next iteration will focus on adding methods for propagating the bundle.

This approach mitigates risk, as in the first iteration we focus on delivering without relying on a centralized solution (running our own Swarm nodes) or on technology that is not ready (testnet Swarm), moving on to scenarios with more unknowns in later iterations.

Again, there is still much to be discussed and this is just a rough overview.

Some base assumptions:

Whisper messages are always encrypted with the user's PK and exchanged through a shared topic (same as now). This means that any encryption implemented in this swarm will not reduce the current guarantees, in terms of either encryption or darkness.

The payload of the message will contain a "cleartext" header (only encrypted using the user pk) and an encrypted payload (using the exchanged key).

To answer your questions:

What impact does using a server (X3DH) have on reliability? What happens if a user changes mail server, or it is down?

We are not relying on mailservers to exchange X3DH bundles, and we allow users to communicate even if no bundle has been exchanged, so reliability will not be impacted. A mailserver being up or down will not have an effect specific to this feature; it will make message/bundle propagation more difficult (only online->online messages will be sent/received), but that impacts the whole system.

What happens if a user is changing device? Recovering account?

The problem we faced before with account recovery is that if a shared key is established and the message is sent using a symmetric key through Whisper, the user who has recovered the account will not be able to decrypt it. This is not the case anymore, as the receiving user will still be able to decrypt the header. If a user receives a message that they are not able to decrypt because they are missing that keyId/conversation, we can prompt the user and say: "X wants to contact you, but it seems like they are using a key from a different device, do you want to add them to your contacts?" (or something along those lines :) ). The message will contain a bundle, so X3DH can be performed if the user decides to reply/add them back to their contacts.

Will this be strictly limited to 1:1 chat? Might be worth making this explicit

Yes. Public chats are public, so no coordination is done among peers; FS can be extended to group chats, but that is not in the scope of this swarm.

What guarantees do we get in terms of forward secrecy? What makes it PFS as opposed to weak FS?

PFS if a bundle is exchanged; otherwise a successful message exchange needs to happen to have PFS on both sides, at least in the initial implementation. Key re-use can be guarded against. In terms of weak forward secrecy I would need to think about it a bit more, but as per the doc, only impersonating future sessions is possible if one of the identity keys is compromised.

We have an implementation in Go for DR, what about X3DH? Any candidates for this?

No. I have looked around but found nothing prominent. The algorithm is easy to implement (https://github.com/status-im/status-go/blob/features/x3dh/services/shhext/chat/x3dh.go); the complexity is in integrating it with the protocol and distributing the bundle, so I don't see an advantage in using an external library.
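For reference, a condensed, illustrative sketch of the initiator side of X3DH (the real code lives at the link above); signature verification of the signed pre-key and the optional one-time pre-key are omitted, and the curve plus the SHA-256 stand-in KDF are assumptions made to keep the example short:

```go
// Illustrative only: not the actual status-go implementation.
package x3dh

import (
	"crypto/ecdh"
	"crypto/rand"
	"crypto/sha256"
)

// InitiatorSharedKey derives the session key on the side that starts the
// conversation, from its own identity key and the recipient's bundle. It
// returns the shared key plus the ephemeral public key that must be sent
// along so the recipient can run the mirrored computation.
func InitiatorSharedKey(ourIdentity *ecdh.PrivateKey, theirIdentity, theirSignedPreKey *ecdh.PublicKey) (sk [32]byte, ephemeralPub []byte, err error) {
	eph, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return sk, nil, err
	}

	dh1, err := ourIdentity.ECDH(theirSignedPreKey) // DH(IK_A, SPK_B)
	if err != nil {
		return sk, nil, err
	}
	dh2, err := eph.ECDH(theirIdentity) // DH(EK_A, IK_B)
	if err != nil {
		return sk, nil, err
	}
	dh3, err := eph.ECDH(theirSignedPreKey) // DH(EK_A, SPK_B)
	if err != nil {
		return sk, nil, err
	}

	// SK = KDF(DH1 || DH2 || DH3); SHA-256 stands in for the KDF here.
	sk = sha256.Sum256(append(append(dh1, dh2...), dh3...))
	return sk, eph.PublicKey().Bytes(), nil
}
```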

What guarantees can we make and not make in terms of deniability?

If a bundle is exchanged, there is plausible deniability; otherwise not for the initial message, at the protocol level. Currently we sign all messages at the Whisper layer, so there's no deniability there. We can change that, but it's a separate problem.

What trust do we put in the mail server? What would a compromised mail server be able to do?

None, the same as now

What guarantees can we make and not make in terms of darkness?

Not relevant, as that's handled at the Whisper layer and is not in scope.

What implications does separating out the initial key exchange step to be out of band (QR code or whatever) have in terms of guarantees we can make? Do we have a plan for supporting this mode as well, and how do we imagine this being represented in UX?

Probably answered above

In terms of audit, would it make sense to implement this as library/prototype first, push a spec, audit it, and then use it only once we are actually fairly certain it is secure (for our own defined guarantees)? As opposed to changing the existing code gradually.

I would discourage that approach: most of the bugs/security flaws will be in the integration with existing code, so auditing in isolation is not really meaningful in my opinion, and I don't want to get into long-lived branches/big-bang integrations.

adambabik commented 6 years ago

Great description @cammellos!

The payload of the message will contain a "cleartext" header (only encrypted using the user pk) and an encrypted payload (using the exchanged key).

What will be included in the header?

I would discourage that approach: most of the bugs/security flaws will be in the integration with existing code, so auditing in isolation is not really meaningful in my opinion, and I don't want to get into long-lived branches/big-bang integrations.

I agree that it does not make sense at this point. However, in the future, having the whole implementation as a separate library makes sense, right? We would only need some glue code to connect the library with geth and Whisper/PSS/whatever and expose some interface to web3.js.

cammellos commented 6 years ago

@adambabik

What will be included in the header?

The ephemeral public key used for the exchange, possibly the sequence number of the message, the ID of the bundle, that kind of thing. Also, mind that I say "cleartext", but the header is still encrypted with the pk of the receiver (Whisper encryption).
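Putting that together, a hedged sketch of what the message layout could look like; field names and types are assumptions, not the agreed format:

```go
// Illustrative message layout only; names and types are assumptions.
package chat

// MessageHeader is the "cleartext" header: readable by anyone holding the
// recipient's Whisper key, but still wrapped in Whisper's asymmetric encryption.
type MessageHeader struct {
	EphemeralKey []byte // ephemeral public key used for the key exchange
	SequenceNo   uint32 // possibly: sequence number of the message in the session
	BundleID     []byte // identifies the bundle/installation the sender used
}

// Message pairs the header with the payload encrypted under the exchanged key.
type Message struct {
	Header  MessageHeader
	Payload []byte // encrypted with the exchanged (ratchet) key
}
```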

I agree that it does not make sense at this point. However, in the future, having the whole implementation as a separate library makes sense, right? We would only need some glue code to connect the library with geth and Whisper/PSS/whatever and expose some interface to web3.js.

Yes, as long as it is integrated by the time the audit happens (whether as a library or embedded does not matter to me). Ideally, as discussed before, the transport layer (and persistence) is completely separated from our protocol, and that's the direction we are moving in, so basically we can swap out the transport layer for whatever we want (PSS). As a matter of fact, it would be good to keep PSS integration as a target so that we make sure we are carving the boundaries correctly. Having it as a separate library is not a problem.

oskarth commented 6 years ago

Re this part only:

Part of the swarm is to find a solution to the problem and to the questions you asked, so not everything has been fleshed out yet; take the answers with a pinch of salt :).

I really like what Adam did here https://github.com/status-im/ideas/blob/master/ideas/092-disc-v5-research.md#product-description - maybe we can do something similar in this swarm? I.e. make (a subset of?) those questions explicit in the idea itself. As in, these are things we want to answer, and then as we learn the answers to them (or, in some cases, as answers already exist), they can be added to the idea document. That way knowledge is spread and can be referred to by anyone.

cammellos commented 6 years ago

@oskarth sure, sounds like a good idea

oskarth commented 6 years ago

Read your description, great stuff!

My main remaining questions right now are:

  1. Where do you see a formal spec coming out of this, i.e. something other people can implement, and at some point also get peer reviewed?

  2. At what point would an audit make sense, i.e. when it is useful in terms of guarantees and unlikely to change for some time?

cammellos commented 6 years ago

@oskarth , just a rough idea but:

Where do you see a formal spec coming out of this, i.e. something other people can implement, and at some point also get peer reviewed?

At what point would an audit make sense, i.e. when it is useful in terms of guarantees and unlikely to change for some time?

Provided we are happy with the implementation, there are two points where we could have an audit:

1) If we can 100% guarantee that a bundle is exchanged (say we manage to get Swarm working, for example), that is a good time.
2) If we can't, we can still have a security audit, with different guarantees (PFS if you exchanged a bundle, no PFS until a successful interaction, PFS after that).

If we audit at 2) and then move to 1), the only new bit that would need to be audited is the actual fetching of the bundle from Swarm/etc. (which is not part of the protocol, by the way, it's just another way to retrieve a bundle), as we would completely disable non-PFS first messages, so the protocol would not be changed other than removing stuff.

arnetheduck commented 6 years ago

Happy to see this getting started. Big kudos for the focus on documentation/specification - this lends itself well to designing the protocol such that it can be reimplemented in any language and by any actor - something that will be valuable for cooperating with other clients! I particularly like it because it can also then be analyzed independently of the code.

Essentially, this is a full reboot of our chat protocol - with this in mind:

  1. How about developing it side-by-side with the current protocol, using a toggle? This avoids a big bang, but also avoids spending resources on testing and bugfixing each little step along the way, as well as lessening the pressure to make security compromises while it's being developed.
  2. Why not start with a fully PFS-enabled protocol and build from there? In the first iteration, this means not allowing async messages at all. Plugging security leaks afterwards is hard - the extra cruft of non-pfs-encrypted messages will muddle the implementation and the protocol making analysis harder.
  3. What's the rationale behind protobuf? I don't see that mentioned in the .md - was there a discussion on this somewhere? Would be good to add/document the motivation somewhere. That said, big +1 for a machine-readable spec of message content structure (curious fact: signal protocol is protobuf as well - they just hand-parse it in some implementations!).
  4. Is there room here to negotiate features? PFS is the thing right now, but at some point we'll have RCF as well (really-cool-feature) - would be nice to have a controlled upgrade path at that point. This is slightly out of scope for this swarm, but I'm curious as to what the current thinking is, since the protocol is being redesigned - for a more complete picture: https://multiformats.io/. In general, there's a lot of buzz about libp2p in the ethereum world these days - it seems that if we can reuse some of what they have developed, it could help integration in the future, also with other projects - at this stage though, I'd look at this more as a nice-to-have, and perhaps it's better to focus on getting the best protocol possible done here without distractions.

Re audit, my thinking is that auditing the current chat protocol is a waste and should be canceled (@adambabik). Once this protocol has been developed to a sufficient quality, we can audit that instead, as @cammellos suggests. It can even be split into two audits at that point, one that focuses on the spec/protocol/interactions, and the other that focuses on the code implementation (these are largely orthogonal - once the spec is audited, auditing the code simply means checking that it follows the spec, which is not an easy problem, but still much easier).

cammellos commented 6 years ago

@arnetheduck thanks for the input.

How about developing it side-by-side with current protocol.

This is the intention: we will not break compatibility between consecutive releases, and we will use the already established pattern to gradually release it.

It's not a complete rewrite of the protocol, as the new layer (call it the encryption layer) is agnostic of the actual payloads exchanged at the higher (chat) or lower (Whisper) level, both of which will stay the same as now (i.e. the higher level does not know about key exchanges/encryption/ratcheting, and the encryption layer does not know about Whisper, although there are a few things we'd still need to discuss).

Why not start with a fully PFS-enabled protocol and build from there?

I don't have a strong opinion on this myself, but in any case you don't have to disable async messaging to ensure PFS. You can just follow a model such as Facebook's, where you need to accept a "friendship" (contact request) before you can send messages. At that point you can use any key exchange, as long as it's 2 messages; X3DH would suit fine, OTR as well with headers. The key exchange can be asynchronous, but the initiating party will need to re-transmit, as we are not going to ack in order to avoid trading off darkness, and no communication will happen until the key exchange is successful, so we need to be careful as it can potentially have an impact on message reliability. This will probably need a broader discussion, as UX/Product etc. will all be impacted by it, but it is worth moving forward.

What's the rationale behind protobuf?

There's not really a strong push for protobuf; we just had an informal conversation and it's open to discussion. Why it is suited is that it gives us good support across different languages, JSON conversion, decent performance, and a machine-readable format. We can document the choice in ADRs. Looking at the link you sent (https://multiformats.io/), it is supported there, so we might as well use multiformats if we do decide to stick with protobuf.

Is there room here to negotiate features?

That is something we probably haven't thought much about at this stage; thanks for the suggestion, more input would be very valuable.

arnetheduck commented 6 years ago

It's not a complete rewrite of the protocol, as the new layer (call it the encryption layer)

well, the current protocol also does key nego - just not very well. also, if protobuf (or something else) is introduced, seems like it makes a lot of sense to rewrite the messaging as well using the same schema language - again, to promote potential interop (even if encryption comes in a separate layer)

disable async messaging to ensure PFS.

oops, yeah, you're right of course, I meant the initial messaging before contact setup specifically - the one that happens before ephemeral key exchange. in the .md, it says we should allow the user to choose to violate pfs etc, which I think is a bad idea.

libp2p

in case you want to dig in: https://github.com/libp2p/libp2p/issues/33

arnetheduck commented 6 years ago

break compatibility between consecutive releases, and we will use the already established pattern

can I read about this pattern somewhere?

cammellos commented 6 years ago

the current protocol also does key nego

Currently there's no key negotiation; it has been disabled for beta, and we only use asymmetric encryption.

if protobuf (or something else) is introduced, seems like it makes a lot of sense to rewrite the messaging as well

Yes, I agree, and this is something we discussed before. To keep the changes to a minimum, though, and focus on delivering PFS, we will leave that for a later stage, as it is not strictly necessary. We can, however, easily push out some protobufs describing the current protocol at the very minimum; of course, if there's time, we can also move that layer to protobuf instead of transit.

can I read about this pattern somewhere?

I don't think there's anything written, as anyone is free to use the pattern they like; the constraint is that we don't break compatibility between consecutive releases, at least during beta (0.21 needs to be able to talk to 0.22, 0.22 to 0.23), which gives users a month to upgrade, considering we release every fortnight.

The pattern we followed so far is:

x - 1 reads/publishes the old message format

x is able to read the old and new message formats, publishes in the old format (compatible with x - 1)

x + 1 is able to read the old and new message formats, publishes in the new format (compatible with x, incompatible with < x)

x + 2 reads/publishes the new message format (compatible with x + 1, incompatible with < x + 1)

Any stage can be prolonged to give users more time to upgrade
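To illustrate the stages, a hedged sketch of this pattern expressed as a version-gated codec decision; the release numbering, names and Format type are assumptions for illustration, not actual status-go code:

```go
// Illustrative only: a toy model of the release-compatibility pattern above.
package protocol

type Format int

const (
	OldFormat Format = iota
	NewFormat
)

// newFormatReadableFrom is the release "x" above: the first release that can
// read the new format while still publishing the old one.
const newFormatReadableFrom = 22

// publishFormat returns the format a given release publishes in.
func publishFormat(release int) Format {
	if release >= newFormatReadableFrom+1 { // x + 1 and later publish the new format
		return NewFormat
	}
	return OldFormat // x - 1 and x keep publishing the old format
}

// canRead reports whether a given release understands a format.
func canRead(release int, f Format) bool {
	switch f {
	case NewFormat:
		return release >= newFormatReadableFrom // x, x + 1, x + 2, ...
	default:
		return release <= newFormatReadableFrom+1 // old format dropped at x + 2
	}
}
```

With newFormatReadableFrom = 22 (standing in for release 0.22), releases 21 through 24 map onto the four stages above.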

whyrusleeping commented 6 years ago

Hey, If there are any ipfs or libp2p questions, feel free to let us know :)

oskarth commented 6 years ago

@cammellos Can we update this PR with the things discussed here and get it merged? It'd be really useful if we didn't lose all the good discussions/questions, plus changes in participants etc., in a PR thread. This way people can easily see what's up at http://ideas.status.im/

pedropombeiro commented 6 years ago

I've started capturing the discussions/changes that happened in this PR in the document (see e4794784d59bf443f0c8110474b58a9581fd2eb2). Please let me know if there's anything we want to add/modify; I'll try to finish it tomorrow and then get started with the documentation itself.