Multi-way VoIP (SPEC-174)

matrix-org / matrix-spec

The Matrix protocol specification

Apache License 2.0

Multi-way VoIP (SPEC-174) #75

Open matrixbot opened 9 years ago

matrixbot commented 9 years ago

Need to specify a standard profile for multi-way VoIP/Video calls.

Options include:

Signalling for all calls is mixed into the same room (useful for multicast traffic)
Signalling between calls and conference server is in side-channel 1:1 rooms
We use EDUs of some kind for non-stateful conference signalling

Do we do SFU, MCU, full mesh, multicast, or some combination?
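As a rough way to compare the topologies named above, the sketch below counts media streams per client and at the server for an n-party call. The function name and the returned keys are hypothetical, and it assumes one stream per participant (no simulcast, no multicast).

```python
def stream_counts(topology: str, n: int) -> dict:
    """Count media streams for an n-party call (illustrative only).

    Assumes each participant publishes exactly one stream.
    """
    if n < 2:
        raise ValueError("a call needs at least two participants")
    others = n - 1
    if topology == "mesh":
        # Every client sends to, and receives from, every other client directly.
        return {"client_up": others, "client_down": others,
                "server_in": 0, "server_out": 0}
    if topology == "sfu":
        # The server forwards each incoming stream unchanged to all other clients.
        return {"client_up": 1, "client_down": others,
                "server_in": n, "server_out": n * others}
    if topology == "mcu":
        # The server decodes, mixes, and sends one mixed stream per client.
        return {"client_up": 1, "client_down": 1,
                "server_in": n, "server_out": n}
    raise ValueError(f"unknown topology: {topology}")


if __name__ == "__main__":
    for name in ("mesh", "sfu", "mcu"):
        print(name, stream_counts(name, 5))
```

Full mesh pushes the upload cost onto every client, an SFU pushes it onto server egress, and an MCU trades bandwidth for server CPU and access to the decoded media.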

Do we hook into Jitsi or Kurento or Janus or FreeSWITCH or what?

(Imported from https://matrix.org/jira/browse/SPEC-174)

(Reported by @ara4n)

matrixbot commented 9 years ago

Jira watchers: @ara4n

matrixbot commented 9 years ago

Links exported from Jira:

relates to BOTS-132

JasonLocklin commented 7 years ago

Mixing is incompatible with both E2E encryption and homeservers with very little CPU power (an overloaded HS doing mixing will introduce latency, whereas forwarding streams is not CPU intensive at all).

Opus is very good at dropping to an extremely low bitrate while a participant is silent, and this can be improved further with voice-activity detection or push-to-talk. As a result, in the typical situation where only one or two people in a conference are speaking at any given moment, mixing will reduce outgoing bandwidth from the homeserver only minimally.
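The bandwidth argument can be sketched with some back-of-the-envelope arithmetic. The bitrates below are placeholder assumptions rather than measurements, and both function names are hypothetical; the point is only to show how the comparison depends on the silent-stream bitrate.

```python
def sfu_egress_kbps(n: int, speakers: int,
                    active_kbps: float, silent_kbps: float) -> float:
    """Total server egress when every stream is forwarded to every other participant.

    Participants 0..speakers-1 are treated as speaking; the rest are silent.
    """
    total = 0.0
    for receiver in range(n):
        for sender in range(n):
            if sender == receiver:
                continue  # nobody receives their own stream back
            total += active_kbps if sender < speakers else silent_kbps
    return total


def mcu_egress_kbps(n: int, active_kbps: float) -> float:
    """Total server egress when each participant receives one mixed stream."""
    return n * active_kbps


if __name__ == "__main__":
    # Assumed figures: 8 participants, 1 speaker, 32 kbps active, 2 kbps silent.
    print("forwarding:", sfu_egress_kbps(8, 1, 32, 2))
    print("mixing:    ", mcu_egress_kbps(8, 32))
```

Under these assumed figures the two totals are of the same order; the gap widens as the silent bitrate rises or as more people speak at once.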

Having clients expect multi-stream VoIP also leaves open the possibility of fancy p2p mesh routing in the future.

CR0CKER commented 7 years ago

Considering that neither Signal nor WhatsApp supports VoIP conferences, and that Jitsi et al. remain vulnerable to MITM attacks because keys cannot be verified, it would be incredibly helpful to get encrypted conference calls via Matrix off the ground. Now that E2E encryption is working, if we can use that already verified channel to take care of the key exchange for WebRTC, it would make MITM attacks a lot more difficult.
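One way to picture this idea: WebRTC's DTLS certificate fingerprint appears in the SDP as an `a=fingerprint:` line, so a client could compare the fingerprint it received over the verified E2E channel with the one in the SDP it is about to use. This is a minimal sketch under that assumption; the function names are hypothetical, and how the expected fingerprint would actually travel over Matrix is not specified here.

```python
import hmac
from typing import Optional


def sdp_fingerprint(sdp: str) -> Optional[str]:
    """Return the first 'a=fingerprint:' value in an SDP blob, lowercased."""
    prefix = "a=fingerprint:"
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Value looks like: "<hash-func> <colon-separated hex bytes>"
            return line[len(prefix):].strip().lower()
    return None


def fingerprint_matches(sdp: str, expected: str) -> bool:
    """Check the SDP fingerprint against one obtained over a trusted channel."""
    found = sdp_fingerprint(sdp)
    if found is None:
        return False  # no fingerprint present: treat as unverified
    # Constant-time comparison; not strictly required for public fingerprints.
    return hmac.compare_digest(found.encode(), expected.strip().lower().encode())


if __name__ == "__main__":
    sample_sdp = "v=0\r\na=fingerprint:sha-256 AB:CD:EF:01\r\n"  # shortened, not a real hash
    print(fingerprint_matches(sample_sdp, "sha-256 ab:cd:ef:01"))
```

If the fingerprints differ, the media path was negotiated with a certificate the verified peer did not announce, which is the signature of a MITM on the signalling.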

If we use mixing to reduce bandwidth, though, a trusted server is of course crucial. If server resources are needed for decentralized mixing, can we delegate that to certain servers the same way we delegate identity management to certain servers in Matrix?

Can we make it optional whether someone wants mixing for lower bandwidth or no mixing for better security?

JasonLocklin commented 7 years ago

If you are using a trusted server @CR0CKER, the existing TLS encryption is sufficient to prevent MITM attacks. There is no value in e2e implementations that would require a trusted server.

In the end, mixing is an anachronism from a time when bandwidth was expensive and codecs were bad. It was the easiest way for the devs to get started quickly, but ultimately it's not compatible with e2e encrypted systems. If VoIP bandwidth on servers becomes a problem, the solution isn't mixing (which won't make much difference with Opus); it's tweaking the codec settings, turning on voice-activity detection, and routing audio p2p where possible using ICE. So it's not just a matter of turning on e2e encryption for conferences. A lot of work will be needed to change the underlying conference system first; once 1:1 audio streams are figured out, e2e isn't such a problem.

ara4n commented 3 years ago

MSC2359 (#2359) looks to solve this.