WebRTC Architecture - Githubissues

zhu-ting commented 4 years ago

WebRTC extends the client-server semantics by introducing a peer-to-peer communication paradigm between browsers. The most general WebRTC architectural model draws its inspiration from the so-called SIP (Session Initiation Protocol) Trapezoid.

In the WebRTC Trapezoid model, both browsers are running a web application, each downloaded from a different web server. Signalling messages are used to set up and terminate communications. They are transported over HTTP or WebSocket via the web servers, which can modify, translate or manage them as needed.

It is worth noting that the signalling between browser and server is not standardised in WebRTC, as it is considered to be part of the application. As to the data path, a PeerConnection allows media to flow directly between browsers without any intervening servers. The two web servers can communicate using a standard signalling protocol such as SIP or Jingle (XEP-0166). Otherwise, they can use a proprietary signalling protocol.
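Because the browser-to-server signalling format is left to the application, each application has to define its own message envelope. The following is a minimal sketch of one possible JSON envelope. The type SignalEnvelope, its field names, and the functions encodeSignal() and decodeSignal() are all assumptions made for illustration; none of them is part of WebRTC.

```typescript
// Hypothetical application-level signalling envelope. WebRTC does not define
// this format; every name below is an assumption made for illustration.
type SignalKind = "offer" | "answer" | "candidate" | "bye";

interface SignalEnvelope {
  kind: SignalKind;
  from: string;    // sender identifier chosen by the application
  to: string;      // recipient identifier chosen by the application
  payload: string; // e.g. an SDP blob or a serialized ICE candidate
}

const SIGNAL_KINDS: SignalKind[] = ["offer", "answer", "candidate", "bye"];

// Serialize an envelope into the text frame sent over HTTP or WebSocket.
function encodeSignal(msg: SignalEnvelope): string {
  return JSON.stringify(msg);
}

// Parse and validate a received frame. Returns null for malformed input so
// that the receiver can drop the frame instead of throwing.
function decodeSignal(frame: string): SignalEnvelope | null {
  let raw: unknown;
  try {
    raw = JSON.parse(frame);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const fields = raw as Record<string, unknown>;
  const kind = fields["kind"];
  const sender = fields["from"];
  const recipient = fields["to"];
  const payload = fields["payload"];
  if (typeof kind !== "string" || SIGNAL_KINDS.indexOf(kind as SignalKind) === -1) return null;
  if (typeof sender !== "string" || typeof recipient !== "string") return null;
  if (typeof payload !== "string") return null;
  return { kind: kind as SignalKind, from: sender, to: recipient, payload: payload };
}
```

A web server sitting between the two browsers could decode each frame, inspect or rewrite it, and re-encode it before forwarding, which is the "modify, translate or manage" role described above.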

The most common WebRTC scenario is likely to be the one where both browsers are running the same web application, downloaded from the same web page. In this case the Trapezoid becomes a Triangle.

zhu-ting commented 4 years ago

WebRTC in the Browser

A WebRTC web application interacts with web browsers through the standardized WebRTC API, which lets it exploit and control the browser's real-time functions.

The WebRTC web application also interacts with the browser, using both WebRTC and other standardized APIs, both proactively (e.g., to query browser capabilities) and reactively (e.g., to receive browser-generated notifications).

The WebRTC API must therefore provide a wide set of functions, such as connection management (in a peer-to-peer fashion), negotiation, selection and control of encoding/decoding capabilities, media control, and firewall and NAT traversal.

Designing the WebRTC API is a challenging task. It envisages a continuous real-time flow of data streamed across the network to allow direct communication between two browsers, with no further intermediaries along the path. This represents a revolutionary approach to web-based communication.

Signalling

The general idea behind the design of WebRTC has been to fully specify how to control the media plane, while leaving the signalling plane as much as possible to the application layer. The rationale is that different applications may prefer to use different standardised signalling protocols (e.g., SIP or the Extensible Messaging and Presence Protocol [XMPP]) or even something custom.

The session description is the most important information that needs to be exchanged. It specifies the transport (and Interactive Connectivity Establishment [ICE]) information, as well as the media type, format, and all associated media configuration parameters needed to establish the media path.
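To make the contents of a session description more concrete, here is a minimal sketch that pulls the media type, port, transport protocol, formats and ICE details out of an SDP text. The function summarizeSdp() and the MediaSummary shape are hypothetical helpers written for this note. The parser only looks at "m=", "a=ice-ufrag:" and "a=candidate:" lines and ignores everything else, so it is far from a complete SDP parser.

```typescript
// Per-media summary extracted from an SDP session description.
interface MediaSummary {
  kind: string;            // "audio", "video", "application", ...
  port: number;
  protocol: string;        // e.g. "UDP/TLS/RTP/SAVPF"
  formats: string[];       // payload type numbers or format names
  iceUfrag: string | null; // ICE username fragment, if present
  candidates: string[];    // raw ICE candidate lines without the "a=" prefix
}

function summarizeSdp(sdp: string): MediaSummary[] {
  const sections: MediaSummary[] = [];
  let sessionUfrag: string | null = null;  // session-level value, inherited by media sections
  let current: MediaSummary | null = null; // media section currently being filled in

  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.indexOf("m=") === 0) {
      // m=<media> <port> <proto> <fmt> ...
      const parts = line.slice(2).split(" ");
      current = {
        kind: parts[0] ?? "",
        port: parseInt(parts[1] ?? "0", 10),
        protocol: parts[2] ?? "",
        formats: parts.slice(3),
        iceUfrag: sessionUfrag,
        candidates: [],
      };
      sections.push(current);
    } else if (line.indexOf("a=ice-ufrag:") === 0) {
      const ufrag = line.slice("a=ice-ufrag:".length);
      if (current === null) sessionUfrag = ufrag;
      else current.iceUfrag = ufrag;
    } else if (line.indexOf("a=candidate:") === 0 && current !== null) {
      current.candidates.push(line.slice(2));
    }
  }
  return sections;
}
```

Each returned entry corresponds to one "m=" section, which is where the media type, format and transport parameters mentioned above live.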

The original idea of exchanging session description information in the form of a Session Description Protocol (SDP) "blob" presented several shortcomings, some of which turned out to be really hard to address. The IETF therefore standardised the JavaScript Session Establishment Protocol (JSEP, RFC 8829). JSEP provides the interface needed by an application to deal with the negotiated local and remote session descriptions (with the negotiation carried out through whatever signalling mechanism is desired), together with a standardised way of interacting with the ICE state machine.

The JSEP approach delegates entirely to the application the responsibility for driving the signalling state machine: the application must call the right APIs at the right times, and convert the session description and related ICE information into the defined messages of its chosen signalling protocol, instead of simply forwarding to the remote side the messages emitted from the browser.
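The "right APIs at the right times" on the calling side can be sketched as follows. The interfaces OfferingPeer and SignallingChannel are simplified stand-ins written for this note: OfferingPeer is a structural subset modelled on the browser's RTCPeerConnection (whose real types are looser, so a cast may be needed when passing a real instance), and SignallingChannel, startCall() and completeCall() are hypothetical names. The JSON message shape is also an assumption.

```typescript
// Simplified session description; the browser's real type has more variants.
interface SessionDescriptionLike {
  type: "offer" | "answer";
  sdp: string;
}

// Structural subset modelled on RTCPeerConnection (assumption: only the
// promise-based methods used below are listed).
interface OfferingPeer {
  createOffer(): Promise<SessionDescriptionLike>;
  setLocalDescription(desc: SessionDescriptionLike): Promise<void>;
  setRemoteDescription(desc: SessionDescriptionLike): Promise<void>;
}

// Hypothetical application-owned channel, e.g. a thin wrapper over WebSocket.
interface SignallingChannel {
  send(message: string): void;
}

// Step 1 (caller): create the offer, install it locally, then translate it
// into the application's own signalling message and send it.
async function startCall(pc: OfferingPeer, channel: SignallingChannel): Promise<void> {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  channel.send(JSON.stringify({ kind: "offer", sdp: offer.sdp }));
}

// Step 2 (caller): when the application receives the remote answer over its
// signalling channel, hand it to the peer connection.
async function completeCall(pc: OfferingPeer, answerSdp: string): Promise<void> {
  await pc.setRemoteDescription({ type: "answer", sdp: answerSdp });
}
```

The ordering is the point of the sketch: the browser produces the descriptions, but the application decides when each call happens and how the result is wrapped for transport.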

WebRTC API

The API is being designed around three main concepts: MediaStream, PeerConnection and DataChannel.

MediaStream

A MediaStream is an abstract representation of an actual stream of audio and/or video data. It serves as a handle for managing actions on the media stream, such as displaying the stream's content, recording it, or sending it to a remote peer. A MediaStream may be extended to represent a stream that either comes from (remote stream) or is sent to (local stream) a remote node.

A LocalMediaStream represents a media stream from a local media-capture device (e.g., a webcam or microphone). To create and use a local stream, the web application must request access from the user through the getUserMedia() function. The application specifies the type of media (audio, video, or both) to which it requires access. The device selector in the browser interface serves as the mechanism for granting or denying access. Once the application is done, it may revoke its own access by calling the stop() function on the LocalMediaStream. In current browsers, LocalMediaStream has been folded into MediaStream: the call is exposed as navigator.mediaDevices.getUserMedia(), and access is released by calling stop() on each MediaStreamTrack.
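The request, use and release cycle can be sketched like this, using the promise-based form found in current browsers, where navigator.mediaDevices.getUserMedia() resolves to a stream and capture is released by stopping each track. MediaDevicesLike, StreamLike and TrackLike are simplified stand-ins declared here so that the example does not depend on browser typings, and captureAndRelease() is a hypothetical helper name.

```typescript
// Simplified stand-ins for MediaStreamTrack, MediaStream and MediaDevices.
interface TrackLike {
  kind: string; // "audio" or "video"
  stop(): void;
}
interface StreamLike {
  getTracks(): TrackLike[];
}
interface MediaDevicesLike {
  getUserMedia(constraints: { audio: boolean; video: boolean }): Promise<StreamLike>;
}

// Ask for microphone and camera access, let the caller use the stream, then
// release the devices. Resolves to false when access is denied or no device
// is available (the browser rejects the promise in both cases).
async function captureAndRelease(
  devices: MediaDevicesLike,
  onStream: (stream: StreamLike) => void
): Promise<boolean> {
  const stream = await devices
    .getUserMedia({ audio: true, video: true })
    .catch(() => null);
  if (stream === null) return false;
  try {
    onStream(stream);
  } finally {
    // Stopping every track is what revokes the page's own access.
    for (const track of stream.getTracks()) track.stop();
  }
  return true;
}
```

In a browser, the first argument would be navigator.mediaDevices (possibly with a cast, since the real types are richer than the stand-ins above).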

Media-plane signalling is carried out of band between the peers; the Secure Real-time Transport Protocol (SRTP) is used to carry the media data together with the RTP Control Protocol (RTCP) information used to monitor transmission statistics associated with data streams. DTLS is used for SRTP key and association management.

In a multimedia communication, each medium is typically carried in a separate RTP session with its own RTCP packets. However, to overcome the issue of opening a new NAT hole for each stream used, the IETF is currently working on the possibility of reducing the number of transport layer ports consumed by RTP-based real-time applications. The idea is to combine multimedia traffic in a single RTP session.

PeerConnection

A PeerConnection allows two users to communicate directly, browser to browser. It represents an association with a remote peer, which is usually another instance of the same JavaScript application running at the remote end. Communications are coordinated via a signalling channel provided by scripting code in the page via the web server, e.g., using XMLHttpRequest or WebSocket. Once a peer connection is established, media streams (locally associated with ad hoc defined MediaStream objects) can be sent directly to the remote browser.

[ ] STUN: The Session Traversal Utilities for NAT protocol (RFC 5389) allows a host application to discover the presence of a network address translator on the network, and in such a case to obtain the allocated public IP and port tuple for the current connection. To do so, the protocol requires assistance from a configured, third-party STUN server that must reside on the public network.
[ ] TURN: The Traversal Using Relays around NAT protocol (RFC 5766) allows a host behind a NAT to obtain a public IP address and port from a relay server residing on the public Internet. Thanks to the relayed transport address, the host can then receive media from any peer that can send packets to the public Internet.
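To illustrate how a host learns its public address through STUN, the sketch below builds the 20-byte Binding Request header defined in RFC 5389 and decodes the IPv4 form of the XOR-MAPPED-ADDRESS attribute that a server returns. The function names buildBindingRequest() and decodeXorMappedAddressV4() are hypothetical. The code handles only the attribute value, not a full response, and it does not send anything over the network.

```typescript
const STUN_MAGIC_COOKIE = 0x2112a442; // fixed value from RFC 5389
const STUN_BINDING_REQUEST = 0x0001;  // message type for a Binding Request

// Build a Binding Request with no attributes: type, length, magic cookie and
// a caller-supplied 96-bit transaction ID, all in network byte order.
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) {
    throw new Error("transaction ID must be exactly 12 bytes");
  }
  const packet = new Uint8Array(20);
  const view = new DataView(packet.buffer);
  view.setUint16(0, STUN_BINDING_REQUEST);
  view.setUint16(2, 0); // message length: no attributes follow the header
  view.setUint32(4, STUN_MAGIC_COOKIE);
  packet.set(transactionId, 8);
  return packet;
}

// Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute. The port is
// XORed with the top 16 bits of the magic cookie and the address with the
// whole cookie.
function decodeXorMappedAddressV4(value: Uint8Array): { ip: string; port: number } {
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  if (value.length < 8 || view.getUint8(1) !== 0x01) {
    throw new Error("expected an IPv4 XOR-MAPPED-ADDRESS value");
  }
  const port = view.getUint16(2) ^ (STUN_MAGIC_COOKIE >>> 16);
  const octets: number[] = [];
  for (let i = 0; i < 4; i++) {
    const cookieByte = (STUN_MAGIC_COOKIE >>> (24 - 8 * i)) & 0xff;
    octets.push(view.getUint8(4 + i) ^ cookieByte);
  }
  return { ip: octets.join("."), port: port };
}
```

The decoded pair is the "public IP and port tuple" mentioned in the STUN item above, as seen by the server on the public side of the NAT.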

The PeerConnection mechanism uses the ICE protocol together with the STUN and TURN servers to let UDP-based media streams traverse NAT boxes and firewalls. ICE allows the browsers to discover enough information about the topology of the network where they are deployed to find the best exploitable communication path. Using ICE also provides a security measure, as it prevents untrusted web pages and applications from sending data to hosts that are not expecting to receive them.
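One ingredient of how ICE ranks possible paths is the candidate priority. The sketch below applies the formula and the type preference values recommended in the ICE specification (RFC 8445): direct host addresses rank above STUN-derived (server reflexive) addresses, which rank above TURN relays. The names candidatePriority() and bestFirst() are hypothetical, the default local preference of 65535 is an assumption suited to a single-interface host, and real browsers may choose other values.

```typescript
type CandidateType = "host" | "prflx" | "srflx" | "relay";

// Type preferences recommended by RFC 8445 (higher is preferred).
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,  // address taken directly from a local interface
  prflx: 110, // peer reflexive, learned during connectivity checks
  srflx: 100, // server reflexive, learned from a STUN server
  relay: 0,   // relayed through a TURN server
};

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(
  kind: CandidateType,
  localPreference: number = 65535,
  componentId: number = 1
): number {
  return 16777216 * TYPE_PREFERENCE[kind] + 256 * localPreference + (256 - componentId);
}

// Order candidate types from most to least preferred.
function bestFirst(kinds: CandidateType[]): CandidateType[] {
  return kinds.slice().sort((a, b) => candidatePriority(b) - candidatePriority(a));
}
```

Priority alone does not pick the path: ICE still has to run connectivity checks on candidate pairs, and only pairs that pass those checks can be used.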

Each signalling message is fed into the receiving PeerConnection upon arrival. The APIs send signalling messages that most applications will treat as opaque blobs, but which must be transferred securely and efficiently to the other peer by the web application via the web server.
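Feeding incoming messages into the receiving side can be sketched as a small dispatcher. ReceivingPeer is a simplified structural subset modelled on RTCPeerConnection, while IncomingSignal, its field names and handleSignal() are hypothetical and stand for whatever message format the application has chosen.

```typescript
interface DescriptionLike {
  type: "offer" | "answer";
  sdp: string;
}

// Structural subset modelled on RTCPeerConnection (assumption: only the
// methods used by the dispatcher are listed).
interface ReceivingPeer {
  setRemoteDescription(desc: DescriptionLike): Promise<void>;
  createAnswer(): Promise<DescriptionLike>;
  setLocalDescription(desc: DescriptionLike): Promise<void>;
  addIceCandidate(candidate: { candidate: string; sdpMid: string }): Promise<void>;
}

// Hypothetical application-defined messages arriving over the signalling channel.
type IncomingSignal =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string; sdpMid: string };

// Feed one received message into the peer connection. An offer is answered
// through the reply callback; answers and candidates need no reply.
async function handleSignal(
  pc: ReceivingPeer,
  msg: IncomingSignal,
  reply: (outgoing: string) => void
): Promise<void> {
  switch (msg.kind) {
    case "offer": {
      await pc.setRemoteDescription({ type: "offer", sdp: msg.sdp });
      const answer = await pc.createAnswer();
      await pc.setLocalDescription(answer);
      reply(JSON.stringify({ kind: "answer", sdp: answer.sdp }));
      break;
    }
    case "answer":
      await pc.setRemoteDescription({ type: "answer", sdp: msg.sdp });
      break;
    case "candidate":
      await pc.addIceCandidate({ candidate: msg.candidate, sdpMid: msg.sdpMid });
      break;
  }
}
```

The application never needs to interpret the SDP or candidate strings here; it only routes them, which matches the "opaque blob" treatment described above.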

DataChannel

The DataChannel API is designed to provide a generic transport service that allows web browsers to exchange arbitrary data in a bidirectional peer-to-peer fashion.

The standardisation work within the IETF has reached a general consensus on the usage of the Stream Control Transmission Protocol (SCTP) encapsulated in DTLS to handle non-media data types.

The encapsulation of SCTP over DTLS over UDP together with ICE provides a NAT traversal solution, as well as confidentiality, source authentication, and integrity-protected transfers. Moreover, this solution allows the data transport to interwork smoothly with the parallel media transport, and both can potentially also share a single transport-layer port number. SCTP has been chosen since it natively supports multiple streams with either reliable or partially reliable delivery modes. It provides the possibility of opening several independent streams within an SCTP association towards a peering SCTP endpoint.

Each stream actually represents a unidirectional logical channel providing the notion of in-sequence delivery. A message sequence can be sent either ordered or unordered. The message delivery order is preserved only for ordered messages sent on the same stream. However, the DataChannel API has been designed to be bidirectional, which means that each DataChannel is built as a bundle of an incoming and an outgoing SCTP stream.

The DataChannel setup is carried out (i.e., the SCTP association is created) when the createDataChannel() function is called for the first time on an instantiated PeerConnection object. Each subsequent call to the createDataChannel() function just creates a new DataChannel within the existing SCTP association.
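Opening several channels with different delivery modes on one connection might look like the sketch below. The option names ordered and maxRetransmits are the ones the browser's createDataChannel() accepts. ChannelFactory and ChannelLike are simplified stand-ins for RTCPeerConnection and RTCDataChannel, and DeliveryMode, optionsFor(), openDemoChannels() and the channel labels are hypothetical.

```typescript
// Subset of the options accepted by createDataChannel().
interface ChannelOptions {
  ordered?: boolean;        // preserve message order within the channel
  maxRetransmits?: number;  // cap on retransmissions (partial reliability)
}
interface ChannelLike {
  label: string;
}
// Structural stand-in for the part of RTCPeerConnection used here.
interface ChannelFactory {
  createDataChannel(label: string, options?: ChannelOptions): ChannelLike;
}

type DeliveryMode = "reliable-ordered" | "reliable-unordered" | "lossy-unordered";

// Map an application-level delivery mode onto channel options.
function optionsFor(mode: DeliveryMode): ChannelOptions {
  switch (mode) {
    case "reliable-ordered":
      return { ordered: true };
    case "reliable-unordered":
      return { ordered: false };
    case "lossy-unordered":
      // Never retransmit: late data is dropped rather than delayed.
      return { ordered: false, maxRetransmits: 0 };
  }
}

// Open three channels on the same connection. Per the text above, only the
// first call sets up the SCTP association; the others reuse it.
function openDemoChannels(pc: ChannelFactory): ChannelLike[] {
  return [
    pc.createDataChannel("chat", optionsFor("reliable-ordered")),
    pc.createDataChannel("file-chunks", optionsFor("reliable-unordered")),
    pc.createDataChannel("game-state", optionsFor("lossy-unordered")),
  ];
}
```

Choosing the mode per channel rather than per connection is what lets, for example, a chat channel stay strictly ordered while a state-update channel on the same association tolerates loss.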
