mumble-voip / mumble

Mumble is an open-source, low-latency, high quality voice chat software.
https://www.mumble.info

Add end-to-end crypto? #1813

Open mkrautz opened 9 years ago

mkrautz commented 9 years ago

[Originally from https://github.com/mumble-voip/mumble-iphoneos/issues/87]

@ioerror says:

A mumble server is able to wiretap every mumble client. It would be nice if the content of audio (and chat text per channel) was not available to a passive collection process. It should be possible for every client to do at least a pairwise ZRTP, if not a group ZRTP (pairwise, each pair?) as I think Silent Circle does for group calls.

SilentCircle has this group calling feature but it isn't Free Software. I've heard (but not used) that Jitsi has this with ZRTP and it is Free Software.

This would be an amazing feature and it would mean that a server would essentially just be a relay for encrypted data with minimal metadata (eg: user name, ip address, channel joined).

@Tea23 says:

As a pretty heavy Mumble user I'd just like to express some support for this idea. End-to-end crypto in Mumble would bring it up to speed with OTR, adding vital private voice comms to the landscape which at the moment are pretty scarce and certainly none of them have the feature set of Mumble.

At the moment verifying identities w/ Mumble is pretty difficult since the certificate handling is pretty shoddy, and if there really is no end-to-end encryption then its usefulness as a safe tool for private communication is severely limited.

Mumble's probably in the best position of any VoIP stack to provide an effective privacy suite.

mkrautz commented 9 years ago

I think piggybacking on ZRTP could work nicely.

My main concerns would be bandwidth requirements for group conversations, where, if we pairwise ZRTP between all parties, it could quickly grow into something unreasonable. But, if it's how Silent Circle does it, and it's expected that E2E VoIP group chat sends voice streams to each individual in the conversation, then I suppose it's not too bad...
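To put rough numbers on that concern, here is a minimal sketch of how upstream bandwidth scales for a single talking client. It assumes that with pairwise encryption every speaker has to send one separately encrypted copy of the stream per listener, and that without it a single copy goes to the server. The function names and the 40 kbit/s per-stream figure are illustrative assumptions, not Mumble defaults.

```python
def upstream_streams(participants: int, pairwise: bool) -> int:
    """Number of voice streams one talking client has to upload."""
    if participants < 2:
        return 0
    # Pairwise keys: one encrypted copy per listener.
    # Server relay with a single key: one copy, fanned out by the server.
    return participants - 1 if pairwise else 1


def upstream_kbps(participants: int, pairwise: bool, stream_kbps: float = 40.0) -> float:
    """Upstream bandwidth for one talker; stream_kbps is an assumed per-stream rate."""
    return upstream_streams(participants, pairwise) * stream_kbps


for n in (2, 5, 10, 25):
    relay = upstream_kbps(n, pairwise=False)
    pairs = upstream_kbps(n, pairwise=True)
    print(f"{n:>2} participants: relay {relay:.0f} kbit/s, pairwise {pairs:.0f} kbit/s")
```

The per-talker cost grows linearly with the number of listeners in the pairwise case, which is what makes large channels the hard case.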

I'm curious how you expect this to work from a user's and UI perspective. Would E2E mode be something you go into explicitly (I'm assuming it has to be, because of SAS auth), and your current channel is then in E2E mode? Also, what happens when new people join?

ZRTP is obviously more suited for a phone-like structure, where you explicitly make a call to a person or group of people.

So maybe entering E2E mode is equivalent to selecting people currently connected to the server, and launching a separate "E2E" channel for those people. Then it'd be completely separate from the channel-based communication that Mumble currently uses, and more like a "call". For example, it could open the E2E-enabled conversation in a new tab, so that it is obvious to users that it's a separate "call".

But then again, it also feels weird to have this be something you opt into explicitly...

How to implement in a usable way into the current scheme Mumble uses seems to be one of the bigger hurdles to overcome. I'm curious if you guys have any ideas in mind already...

hacst commented 9 years ago

I really dislike having to introduce "calls" semantics into what is by nature a group-chat software just so we can fit ZRTP's underlying assumptions. Here's how I see it:

User-to-User:

Groups:

General:

In any case implementing such a system is a lot of work. Making sure it is safe and provides all the guarantees we think it does is even more work. Such features will also have impact on client complexity, backwards compatibility as well as what server capabilities we can provide in the future (e.g. what about positional audio).

Imho: There's already specialized software you can use if you want e2e encrypted communication, whose design can focus on that sole purpose. Bolting it on to Mumble - even though not quite a Frankenstein - might really be a bit out of scope for us. However interesting it is to think about this stuff...

Zorlin commented 6 years ago

This is a very important feature for some of my team chats, +1

ezi0o commented 5 years ago

In 2018 I think this is a major concern and should be implemented.

damajor commented 5 years ago

Over time it became necessary to get this feature. A trade-off may be using more bandwidth but nowadays this is a minor issue.

ranomier commented 5 years ago

Just to add something to the List to consider.

Signal messenger added a new protocol called RingRTC.

I think it's not yet open source, but I'm sure it will be soon.

MayeulC commented 5 years ago

I have been toying with the idea of making mumble into a prototype Matrix client. Since rooms can contain arbitrary data (and are just a persistent, distributed graph database), they could be made to store the channel tree. Channels could be shared between multiple rooms, etc.

Ephemeral Matrix rooms are already created (that I know of) upon placing a call, so channels could just be hidden as well, and made to contain the necessary metadata. I think it could be worthwhile to pursue that goal, although it could be more suitable to open an issue for the feature instead of discussing it here.

The idea would be to piggy-back on the E2E ecosystem that Matrix already creates (including verification, extensible profiles, support for multiple devices -- soon with cross-signing), while adding everything that's nice about voice on Mumble. I bet a proof-of-concept wouldn't be that hard, since the Matrix Client-server protocol is quite simple.

I have been thinking about this for a while, but haven't found enough time to start implementing this. If you want, I could elaborate a bit more.

Mikaela commented 5 years ago

I have been toying with the idea of making mumble into a prototype Matrix client.

I think it could easily be a step in the wrong direction, assuming you mean Matrix.org. While 1:1 Riot calls are end-to-end-encrypted, conference calls depend on Jitsi Meet.

Matrix.org is currently also bad for privacy. While Mumble currently doesn't store chat logs, Synapse, the reference homeserver implementation for Matrix, stores everything forever, including deleted messages (https://github.com/matrix-org/matrix-doc/issues/447). Multiple other privacy issues are listed at https://github.com/privacytoolsIO/privacytools.io/issues/1049.

I don't know what you would do to Murmur, but I have understood Synapse to be very heavy especially if you have bigger rooms, while Murmur seems to run anywhere and there is µMurmur for even more limited systems.

MayeulC commented 5 years ago

assuming you mean Matrix.org

That's indeed the protocol documented on matrix.org I am talking about (though not matrix.org's Matrix server instance, to disambiguate).

I think it could easily be a step into wrong direction,

This is indeed something that has to be cleared up. And the privacy points you mentioned are indeed concerning. Synapse should get better at this itself, but this is a lesser concern if the participants are on the same homeserver, more so if E2E is enabled.


I am not sure we really should be discussing the merits of Matrix here, as I wouldn't want to completely derail the thread. I am not advocating for completely replacing Murmur here, but to be able to connect to Matrix servers with Matrix identifiers, and optionally use that as a backend for what Murmur is currently used for. We do not have to use jitsi. There is no requirement of interoperability with the call functionality, I rather see it as being orthogonal. However, a complete p2p implementation would be a requirement if we want to avoid changing synapse (the server's TURN and STUN config can be reused), while not depending on Murmur.

Actually, I was more thinking of a proof-of-concept where a Matrix room would just contain a Mumble server address, with an authentication mechanism for connecting (publish the public key in the room, for instance).

As I see it, it could be something completely bolted on the Matrix protocol, with little modification (for starters, at least) to either Mumble or Murmur.


Benefits:

Drawbacks:


Maybe the proper way to build what I describe here is a "hard" (could always be merged back) fork, after all, if all of this doesn't fall in the scope for Mumble. But it could then also be argued the same about E2E ... I'm just trying to collect some feedback here, and sharing my thoughts :)

nyovaya commented 5 years ago

I just wanted to add that it needs to be possible for clients to verify their conference partners' fingerprints, in case the server is giving out wrong public keys to the user.
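As a rough illustration of that check, a client could pin a fingerprint it obtained out-of-band and compare it against whatever key material the server hands out. This is only a sketch: the use of SHA-256, the colon-separated format, and the function names are assumptions made for the example, not a description of how Mumble hashes certificates.

```python
import hashlib
import hmac


def fingerprint(cert_der: bytes) -> str:
    """Colon-separated SHA-256 fingerprint of the raw certificate bytes."""
    digest = hashlib.sha256(cert_der).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_pinned(cert_der: bytes, pinned: str) -> bool:
    """Compare the certificate the server relayed with a fingerprint verified out-of-band."""
    return hmac.compare_digest(fingerprint(cert_der), pinned.lower())


# Hypothetical data: the pinned value would come from a trusted side channel.
pinned = fingerprint(b"alice-certificate-bytes")
print(matches_pinned(b"alice-certificate-bytes", pinned))    # genuine certificate
print(matches_pinned(b"mallory-certificate-bytes", pinned))  # substituted by the server
```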

JJRcop commented 5 years ago

This is stating the obvious but just in case newcomers forget, mumble is already encrypted, so it's not a problem at all if you completely trust your host or your host is one of the parties of conversation.

https://wiki.mumble.info/wiki/FAQ/English#Is_Mumble_encrypted.3F

Please don't misinterpret my comment as saying this idea is already in mumble, just stating for the record it already has encryption, but not E2E encryption, which absolutely has its place and should be considered.

On another note

I think the extra bandwidth could be offset by server owners getting a separate config option for the bandwidth of E2E calls, so they can set it lower than normal.

toby63 commented 4 years ago

I would like to add a concept for this (also considering the discussion about chat logs matrix-org/matrix-spec-proposals#2560). Some of these ideas were of course already mentioned (especially by @hacst (see comment 2)).

This would be a per channel solution, because mumble is a channel- and server-based software.

We create a group-key (details below) that is used for encryption for all participants in the channel.

So the client encrypts voice & chat with the group-key and sends it to the server, the server then sends it to the other clients and they decrypt it.

For chat logs (in case of server-side chat logs), the server will add the encrypted chat messages into a file (specific for that group). Access to the chatlog could be managed with the already implemented client/user certificate (only members of the group get access).

group-key creation: One of the most important factors is that the server is not involved in the creation. I think the easiest solution is if one of the users (maybe the first user in the channel) creates the key (automatically).

Now the only remaining problem is the distribution of the key, because that would normally be sent via the server, so a potential man-in-the-middle attack by the server is possible. We need two solutions for that:

  1. encrypt the group-key for transportation (so it is not readable by the server).
  2. verify the certificate of the sender (and receivers) (for man-in-the-middle-protection).

For 1 (encryption) we have the following solution: We use (maybe separate/additional) user certificates (let's call them: friends-certificates) to encrypt the group-key for transmission. (Note: The reason for additional certificates is that we may want to separate user-certificates (which would only be known between server and user) from friends-certificates (which would be known between friends, but sent via the server).)

For 2 (verification of sender/receivers) we have these solutions:

  1. friends- or contact-trust: The mumble client could create another certificate (let's call it: trust-friends-certificate), which the user then sends to someone else via a different communication channel than mumble (e.g. via email). This person can then import the certificate. The idea would be to make it as easy as possible, so the certificate could simply be a long string of characters that the user can copy into a special text field in mumble to import it. The group-key (or the friends-certificate mentioned above) is then signed with this trust-friends-certificate, so the other user(s) can trust it. This is also a long-term solution, because friends will be able to reuse this trust-system as long as they don't change their identity etc.

  2. implement other information-directory-services or key-servers etc.: The idea is of course to have a third-party that we can trust instead of the mumble server.

  3. show a (rather short) code in the client, that users could compare by hand. Rather insecure. The idea was e.g. that a user reads this to other users in the voice chat and that they compare it then.
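For option 3, a minimal sketch of how such a short code could be derived: both clients hash the two public keys they believe are in use and read out a few digits. The function name and the six-digit default are assumptions for illustration. As noted above, this is weak on its own; ZRTP, for instance, adds a hash commitment before keys are revealed so that an attacker cannot search for keys that produce a matching code, and that step is left out here.

```python
import hashlib


def short_code(key_a: bytes, key_b: bytes, digits: int = 6) -> str:
    """Short comparison code derived from both parties' public keys."""
    # Sort so both sides compute the same value regardless of who is "a" or "b".
    first, second = sorted((key_a, key_b))
    # Length-prefix the first key so the concatenation is unambiguous.
    material = len(first).to_bytes(4, "big") + first + second
    value = int.from_bytes(hashlib.sha256(material).digest(), "big")
    return str(value % 10 ** digits).zfill(digits)


# Hypothetical key bytes; each user reads their code aloud and compares.
print(short_code(b"alice-public-key", b"bob-public-key"))
print(short_code(b"bob-public-key", b"alice-public-key"))
```

If the server swapped in its own key for either side, the two clients would hash different inputs and the spoken codes would, with high probability, not match.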

Details:

yanmaani commented 3 years ago

I understand that doing this "nicely", with proper, seamless, key exchange would be a lot of work. However, I think there would be a lot of value in doing it with key exchange out of band:

  1. I join a room with my mates, Bob and Alice.
  2. Out-of-band, e.g. in a Matrix room, we agree to use the encryption key dab427091518b7fa7ee9a18c408cd7ff068bb16b0bbfc39596fe4e4b0e7967f2.
  3. I press the button "enable encryption" and enter the key.
  4. My client now encrypts all outgoing voice packets with the key. To whoever doesn't have my key, I am just sending them garbage, so I am muted.
  5. Everyone I'm trying to receive voice from who isn't encrypting with my key also gets discarded.
  6. If Alice and Bob entered the same key, we can talk.
  7. We have PFS because we're negotiating a new session key each time.

This could be slightly improved at very little cost by having some UI like "Yanmaani would like to enable encryption; key ID = a22943da1aba66c40dc1c6dcc8a29b3d" (where that's a truncated/salted hash of the session key), and only actually beginning to transmit encrypted when all the other room members have turned on the encryption properly.
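A minimal sketch of that key ID and the "wait until everyone is ready" check, assuming the pre-shared key is a random 256-bit value like the one in the example (with a weak, guessable key, publishing any hash of it would let the server brute-force it). The salt value, the 16-byte truncation, and the function names are assumptions for illustration.

```python
import hashlib
import hmac


def key_id(session_key: bytes, salt: bytes, length: int = 16) -> str:
    """Truncated, salted hash of the key; identifies the key without revealing it."""
    return hmac.new(salt, session_key, hashlib.sha256).digest()[:length].hex()


def everyone_ready(my_key_id: str, announced: dict) -> bool:
    """True once every room member has announced the same key ID as ours."""
    return bool(announced) and all(kid == my_key_id for kid in announced.values())


key = bytes.fromhex("dab427091518b7fa7ee9a18c408cd7ff068bb16b0bbfc39596fe4e4b0e7967f2")
mine = key_id(key, salt=b"room:example")  # hypothetical per-room salt
print("key ID =", mine)

# Hypothetical announcements received from the other room members.
print(everyone_ready(mine, {"Alice": mine, "Bob": mine}))
print(everyone_ready(mine, {"Alice": mine, "Bob": "00" * 16}))
```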

Lots of other small UX improvements you could make like that, without actually implementing any "big" protocol.

And then if this is implemented and works, someone could maybe start looking at key exchange more properly, or going the Unix way and having that done by a separate, external daemon. But for me, just having this simple though ugly system of pre-sharing a key would be extremely useful for my personal needs.

ghost commented 1 year ago

For this feature, the flow as I understand it would be:

  1. ClientA records audio
  2. ClientA encrypts audio
  3. ClientA sends audio
  4. Server receives audio
  5. Server routes audio to ClientB
  6. ClientB receives audio
  7. ClientB decrypts audio
  8. ClientB plays audio

Items 2 and 7 are going to create a delay and it will largely be dependent on the algorithm, bit level, and speed of the two PCs. In a room with multiple people you would most likely have to multicast to attempt to reduce that delay. The overhead would be noticeable to those in the gaming community that use Mumble for real time communication.

The similar scenario with chat will be negligible by comparison. Since there is no storage of the audio it's probably just better to encrypt in transit (which Mumble already does).

Krzmbrzl commented 1 year ago

I think this is addressed by https://security.stackexchange.com/a/127331 In short: It's not necessary to encrypt the same message for every recipient separately to get E2E.
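A rough sketch of that idea: the audio frame is encrypted once under a group key, and only the small group key is wrapped separately for each recipient. The cipher here is a toy stand-in built from the Python standard library (SHAKE-256 keystream plus HMAC-SHA256), not a vetted AEAD, and the pairwise secrets are simply assumed to exist already. All names are hypothetical.

```python
import hashlib
import hmac
import secrets


def _subkeys(key: bytes):
    """Derive separate encryption and MAC keys from one secret."""
    return hashlib.sha256(b"enc" + key).digest(), hashlib.sha256(b"mac" + key).digest()


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Toy encrypt-then-MAC: nonce || ciphertext || tag."""
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)
    stream = hashlib.shake_256(enc_key + nonce).digest(len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unseal(key: bytes, packet: bytes):
    """Return the plaintext, or None if the packet was not sealed with this key."""
    enc_key, mac_key = _subkeys(key)
    nonce, body, tag = packet[:16], packet[16:-32], packet[-32:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    stream = hashlib.shake_256(enc_key + nonce).digest(len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


# Assumption: the sender already shares a verified secret with each recipient.
pairwise = {"bob": secrets.token_bytes(32), "carol": secrets.token_bytes(32)}
group_key = secrets.token_bytes(32)

# Sent once per recipient, and only when the group key changes.
wrapped = {name: seal(secret, group_key) for name, secret in pairwise.items()}

# Sent once per audio frame; the server relays the same bytes to everyone.
frame = seal(group_key, b"opus frame bytes")

bob_group_key = unseal(pairwise["bob"], wrapped["bob"])
print(unseal(bob_group_key, frame))
print(unseal(secrets.token_bytes(32), frame))  # the server has no key: None
```

The per-recipient cost is paid only for the key, so the voice path stays at one upload per frame.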

MayeulC commented 1 year ago

Moreover, using a stream cipher like ChaCha20 (as used by WireGuard) after negotiating a symmetric encryption key, the performance cost is very small: one just needs to generate the cipher stream, which can be accelerated with SSE, and XOR the cipher stream with the data stream. Assuming cipher stream generation is faster than the sampling rate, this would add 1 instruction (XOR) of latency per byte, more or less (one might decide to wait for enough data to fill a packet and perform a vectored XOR on that before sending the packet, which would effectively produce less added latency on a per-byte basis, but could produce more total latency). We're talking about nanoseconds in any case.
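To make the "generate keystream, then XOR" step concrete, here is a small pure-Python ChaCha20 following the block function described in RFC 8439. It is written for readability, not speed, and it is only the bare stream cipher: a real client would use a vetted library and an authenticated mode such as ChaCha20-Poly1305, since XOR alone gives no integrity. Using a packet sequence number as the nonce is an assumption for the example; whatever is used must never repeat under the same key.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(value: int, count: int) -> int:
    return ((value << count) & MASK32) | (value >> (32 - count))


def _quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 7)


def chacha20_block(key: bytes, counter: int, nonce: bytes) -> bytes:
    """One 64-byte keystream block from a 32-byte key and a 12-byte nonce."""
    initial = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]
    initial += struct.unpack("<8I", key)
    initial.append(counter & MASK32)
    initial += struct.unpack("<3I", nonce)
    s = list(initial)
    for _ in range(10):  # 10 double rounds = 20 rounds
        _quarter_round(s, 0, 4, 8, 12)   # column rounds
        _quarter_round(s, 1, 5, 9, 13)
        _quarter_round(s, 2, 6, 10, 14)
        _quarter_round(s, 3, 7, 11, 15)
        _quarter_round(s, 0, 5, 10, 15)  # diagonal rounds
        _quarter_round(s, 1, 6, 11, 12)
        _quarter_round(s, 2, 7, 8, 13)
        _quarter_round(s, 3, 4, 9, 14)
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(s, initial)))


def chacha20_xor(key: bytes, nonce: bytes, data: bytes, counter: int = 1) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 64):
        block = chacha20_block(key, counter + offset // 64, nonce)
        chunk = data[offset:offset + 64]
        out += bytes(x ^ y for x, y in zip(chunk, block))
    return bytes(out)


key = bytes(range(32))                # stand-in for the negotiated key
nonce = (7).to_bytes(12, "little")    # hypothetical packet sequence number
frame = b"\x01" * 120                 # stand-in for one encoded audio frame
ciphertext = chacha20_xor(key, nonce, frame)
print(chacha20_xor(key, nonce, ciphertext) == frame)
```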

Ariakenom commented 1 year ago

Audio isn't limited to a single channel in Mumble. How does it interact with linked channels and shouts? Sounds like it doesn't in anyone's thoughts so far.

Sounds to me like @hacst's comment from 8 years ago is great and just as true today. Unfortunately including the conclusion.

Krzmbrzl commented 1 year ago

Linked channels and shouts are not an issue. At least if the goal is merely to prevent the server from sniffing into conversations. We only have to treat the encryption the same as if all clients were in a single room. Thus, all clients on a server generate some sort of shared secret in a way the server doesn't know. Then the clients can all encrypt and decrypt packets in a way that the server can't. Thus, it wouldn't matter which channel a given user is in.

Ariakenom commented 1 year ago

Oh, I didn't see anyone mentioned per server instead of per channel. That is interesting.

Krzmbrzl commented 1 year ago

Not sure anyone did actually mention that, but that immediately popped to my mind in order to solve the problems you mentioned :) Imo the main point of E2E would be to prevent the server from sniffing on conversations.

But then again: a knowledgeable server admin may change the server's code to send the audio stream to a locally running client that can then decrypt the conversation. I guess true security would indeed only be provided in a call-like scenario (which could always be an additional feature - e.g. "secure channels" for which links and shouts won't work). However, the basic server-wide E2E seems sufficient to prevent the vast majority from getting ideas of sniffing into conversations that are not meant for them.

MayeulC commented 1 year ago

I 100% agree with @Krzmbrzl on the above. I see no reason to scope key handling depending on the channel the user is in. When shouting to another channel, keys to the packets could be transferred at the same time as the data itself.

The only issue I can find is that "invisible" users wouldn't be able to snoop on conversations, but that's probably a feature.

Now, indeed, the server could add any number of fake users it sees fit. That's a bit more obvious when eavesdropping, and users could theoretically make use of the "friends" mechanism to only send data to trusted users, or choose to ignore users and never give them keys. Secure channels where every participant must be verified are an option as well.

MITM by the server is a risk, of course, but that can be mitigated, especially with the "friend" feature: connect to another random server and verify fingerprints there, or out-of-band.

toby63 commented 1 year ago

I don't think undermining E2E is a good or even acceptable idea.

Krzmbrzl commented 1 year ago

Nobody is undermining anything. E2E is by definition only a technique that prevents anyone in the middle from decrypting stuff... It doesn't require separate encryption for all recipients.

Ariakenom commented 1 year ago

a knowledgeable server admin may change the server's code to send the audio stream to a locally running client that can then decrypt the conversation

I don't see a solution where the server is able to eavesdrop as useful. But server wide encryption where everyone has to be friends or where you need a secret (that is not shared with the server) to enter would work. The users need to verify each other somehow.

akhilman commented 1 year ago

I don't see a solution where the server is able to eavesdrop as useful. But server wide encryption where everyone has to be friends or where you need a secret (that is not shared with the server) to enter would work. The users need to verify each other somehow.

I'm not an advocate of encryption, but I can see a potential option:

The client can enable the sending of the session key to everyone, or only to friends, with the appropriate notification. The client can enable/disable encryption altogether.

toby63 commented 1 year ago

Nobody is undermining anything. E2E is by definition only a technique that prevents anyone in the middle from decrypting stuff... It doesn't require separate encryption for all recipients.

That is not the point, I proposed using a "(sessioned) group key" (or whatever it is called in correct terms) myself. I think this is also a standard method, but I am not an expert. But the point is that E2E should always be restricted to known and even accepted (either "admin" choice or democratic or even consensus choice (see also point 1 below)) participants (otherwise it doesn't make any sense to use it). Some kind of server-wide encryption would be open to all (new) users on the server.

Even if limited to a channel, it is problematic, as users would expect their communication to be limited to participants that are known (aka visible and maybe verified) and restricted (e.g. password-protected channel and maybe forward secrecy (see point 1 below)) and/or that take/took part in this session only. So in a very good implementation, it would be possible to:

  1. (have at least an option to) Hide past conversation from new members of a channel (new keys etc.) Note: This is not only necessary for stored chat content (#2560), because if someone eavesdropped (the encrypted content) somehow, then the keys have to stay secret, otherwise someone (who joined later) might be able to decrypt eavesdropped content later.
  2. limit conversation to participants who are active in this moment (there are multiple methods for this, like sub-channels, "start session" button etc.) - especially useful on an open channel or a channel with many members etc.
  3. limiting participants has to be secured by E2E too, so it should not be possible, that e.g. a server admin can add a user to a channel and this user would get automatic E2E access.
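For point 1, one possible approach (a sketch, not something proposed in detail in this thread) is to change the channel key whenever membership changes. On a join, the key can be stepped forward through a one-way hash, so the newcomer receives only the new key and cannot work backwards to earlier ones. On a leave, a fresh random key is needed, because the departing member could otherwise compute the hash step themselves. The class and method names are hypothetical.

```python
import hashlib
import secrets


class ChannelKeys:
    """Tracks the current channel key and rotates it on membership changes."""

    def __init__(self) -> None:
        self.epoch = 0
        self.key = secrets.token_bytes(32)

    def member_joined(self) -> bytes:
        # One-way step: existing members derive the new key locally,
        # the newcomer is handed only the result and cannot invert the hash.
        self.key = hashlib.sha256(b"channel-rekey" + self.key).digest()
        self.epoch += 1
        return self.key

    def member_left(self) -> bytes:
        # The leaver knows the old key, so hashing it would not lock them out;
        # a fresh random key has to be distributed to the remaining members.
        self.key = secrets.token_bytes(32)
        self.epoch += 1
        return self.key


keys = ChannelKeys()
before_join = keys.key
after_join = keys.member_joined()
print(keys.epoch, after_join != before_join)
after_leave = keys.member_left()
print(keys.epoch, after_leave != after_join)
```

This only covers key rotation; how the new key reaches verified members without the server being able to add someone (point 3) is a separate problem.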