roc-streaming / roc-toolkit

Real-time audio streaming over the network.
https://roc-streaming.org
Mozilla Public License 2.0

SDP support #200

Open gavv opened 5 years ago

gavv commented 5 years ago

Implement an SDP parser and formatter. We can then use them for session negotiation (RTSP or SIP, see #34) and for announcement (SAP; not planned yet).

References: RFC 4566

gavv commented 5 years ago

Implementation

We need a new module roc_sdp with three components:

To ensure that our implementation is compatible, we also need a set of unit tests with real-world SDP samples captured from other software.

The parser can be implemented using a parser generator such as YACC or Bison; RFC 4566 provides an ABNF grammar for SDP. The generator must meet the following requirements:

gavv commented 5 years ago

Fields and attributes

Here is the subset of the SDP fields and attributes that we should be able to handle in the first implementation.

These fields should be handled:

v= (sdp version)
o= (session origin / session ID)
c= (connection type and address)
m= (media type, port, and payload ID)

These attributes should be handled:

a=rtpmap (dynamic payload ID: encoding name, sample rate, channel set)
a=recvonly (session mode / direction)
a=sendrecv (--//--)
a=sendonly (--//--)
a=inactive (--//--)
a=type (session type; defines default session mode if omitted)
a=fmtp (codec-specific parameters; we'll need it for Opus)
a=fec-source-flow (FECFRAME; see RFC 6364)
a=fec-repair-flow (--//--)
a=repair-window (--//--)

These fields should not be handled, but it would be nice to log them:

s= (session human-readable name)
i= (session or media human-readable description)
u= (session URI)
e= (session email)
p= (session phone number)

Other fields and attributes may be ignored.
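A minimal sketch of how the subset above could be handled, assuming the whole description arrives as one string with CRLF or LF line endings. The struct names (`SessionDesc`, `MediaDesc`, `RtpMap`) and `parse_sdp()` are hypothetical, not the roc_sdp API, and the sketch is hand-written rather than generated. It keeps c= values as raw strings, only logs s=, i=, u=, e= and p=, and skips a=type, a=fmtp and the FECFRAME attributes for brevity.

```cpp
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical data model for the subset above; not the roc_sdp API.
struct RtpMap {
    int payload_id = -1;
    std::string encoding;   // e.g. "L16", "opus"
    unsigned sample_rate = 0;
    unsigned channels = 1;  // RFC 4566: defaults to 1 if omitted
};

struct MediaDesc {
    std::string media;                    // e.g. "audio"
    unsigned port = 0;
    std::vector<int> payloads;            // payload IDs listed in m=
    std::vector<std::string> connections; // raw per-media c= values
    std::vector<RtpMap> rtpmaps;
    std::string direction;                // recvonly/sendrecv/sendonly/inactive
};

struct SessionDesc {
    std::string origin;                   // raw o= value
    std::vector<std::string> connections; // raw session-level c= values
    std::string direction;                // session-level direction, if any
    std::vector<MediaDesc> medias;
};

// Parses "<payload> <encoding>/<rate>[/<channels>]".
static bool parse_rtpmap(const std::string& value, RtpMap& out) {
    char enc[64] = {};
    unsigned ch = 1;
    const int n = std::sscanf(value.c_str(), "%d %63[^/]/%u/%u",
                              &out.payload_id, enc, &out.sample_rate, &ch);
    if (n < 3) return false;
    out.encoding = enc;
    out.channels = ch; // stays 1 when the channel count is absent
    return true;
}

bool parse_sdp(const std::string& text, SessionDesc& sess) {
    std::istringstream in(text);
    std::string line;
    bool seen_version = false;
    MediaDesc* media = nullptr; // current m= section; null at session level
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back(); // strip CR of CRLF
        if (line.empty()) continue;
        if (line.size() < 2 || line[1] != '=') return false;
        const char type = line[0];
        const std::string value = line.substr(2);
        switch (type) {
        case 'v':
            if (value != "0") return false; // only SDP version 0 is defined
            seen_version = true;
            break;
        case 'o':
            sess.origin = value;
            break;
        case 'c': // per-media if inside an m= section, else session-level
            (media ? media->connections : sess.connections).push_back(value);
            break;
        case 'm': {
            sess.medias.emplace_back();
            media = &sess.medias.back(); // older pointers are not used again
            std::istringstream ms(value);
            std::string proto;
            if (!(ms >> media->media >> media->port >> proto)) return false;
            int pt = 0;
            while (ms >> pt) media->payloads.push_back(pt);
            break;
        }
        case 'a': {
            const size_t colon = value.find(':');
            const std::string name = value.substr(0, colon);
            const std::string arg =
                colon == std::string::npos ? std::string() : value.substr(colon + 1);
            if (name == "rtpmap") {
                RtpMap rm;
                if (!media || !parse_rtpmap(arg, rm)) return false;
                media->rtpmaps.push_back(rm);
            } else if (name == "recvonly" || name == "sendrecv" ||
                       name == "sendonly" || name == "inactive") {
                (media ? media->direction : sess.direction) = name;
            }
            // a=type, a=fmtp and the FECFRAME attributes are omitted here.
            break;
        }
        case 's': case 'i': case 'u': case 'e': case 'p':
            std::fprintf(stderr, "sdp: %c=%s\n", type, value.c_str()); // log only
            break;
        default:
            break; // other fields are ignored
        }
    }
    return seen_version;
}
```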

gavv commented 5 years ago

We also need to add a command-line option, on both sender and receiver, to read an SDP description from a file:// URI.
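One way the file:// case could look, as a sketch only: the helper name `read_sdp_from_uri()` is hypothetical, and it accepts only absolute local paths (`file:///path` or `file://localhost/path`), a simplification of the full file URI syntax. The result would then be handed to the SDP parser.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the SDP text referenced by a file:// URI.
// Returns false for other schemes, non-absolute paths, or unreadable files.
bool read_sdp_from_uri(const std::string& uri, std::string& out) {
    const std::string prefix = "file://";
    if (uri.compare(0, prefix.size(), prefix) != 0) return false;
    std::string path = uri.substr(prefix.size());
    // "file://localhost/tmp/a.sdp" refers to the same file as "file:///tmp/a.sdp".
    if (path.compare(0, 9, "localhost") == 0) path.erase(0, 9);
    if (path.empty() || path[0] != '/') return false; // absolute local paths only
    std::ifstream f(path, std::ios::binary);
    if (!f) return false;
    std::ostringstream ss;
    ss << f.rdbuf(); // slurp the whole file
    out = ss.str();
    return true;
}
```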

gavv commented 4 years ago

We decided to use Ragel for parsing. We already started using it to parse URIs (see roc_address module).

alexandremgo commented 4 years ago

Hey! I can try to implement the new module roc_sdp. I guess I can look at roc_rtp to get a good idea of how to implement this new module?

I'll also look at roc_address for the parser

gavv commented 4 years ago

Great!

I guess I can look at roc_rtp to get a good idea of how to implement this new module?

Yes.

To add a new module, simply create a directory with .h, .cpp and .rl files and add the module name to ROC_MODULES in SConstruct.

I'll also look at roc_address for the parser

Yes. See also #282.

I also suggest that you read RFC 4566 (if you haven't read it before) and take a look at the official Ragel PDF. Feel free to ask questions if you need help. (BTW, for extensive discussions I'd suggest using the mailing list, but GitHub is also OK.)

Among other things, the SDP RFC provides an ABNF grammar. Ragel does not support ABNF, but in this specific case I think it will be straightforward to translate the ABNF into a regular grammar supported by Ragel.
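To illustrate why the translation should be easy: a single field such as c= flattens into one regular expression. The sketch below uses `std::regex` purely for illustration (the real parser would be a Ragel machine); `ConnField` and `parse_conn_field()` are hypothetical, only nettype `IN` is accepted, and the address itself is not validated. Per RFC 4566, an IPv4 multicast address is followed by `/<ttl>[/<number of addresses>]`, while an IPv6 one is followed only by `[/<number of addresses>]`.

```cpp
#include <regex>
#include <string>

// Simplified regular form of the RFC 4566 rule
//   connection-field = "c=" nettype SP addrtype SP connection-address
// applied to the value after "c=".
struct ConnField {
    std::string addrtype;  // "IP4" or "IP6"
    std::string address;   // not validated in this sketch
    std::string ttl;       // IP4 multicast only; empty if absent
    std::string num_addrs; // "/<number of addresses>"; empty if absent
};

bool parse_conn_field(const std::string& value, ConnField& out) {
    static const std::regex re("IN (IP4|IP6) ([^ /]+)(?:/([0-9]+))?(?:/([0-9]+))?");
    std::smatch m;
    if (!std::regex_match(value, m, re)) return false;
    out.addrtype = m[1].str();
    out.address = m[2].str();
    if (out.addrtype == "IP4") {
        out.ttl = m[3].str();       // first suffix is the TTL
        out.num_addrs = m[4].str(); // optional second suffix
    } else {
        if (m[4].matched) return false; // IP6 has no TTL, so at most one suffix
        out.ttl.clear();
        out.num_addrs = m[3].str();
    }
    return true;
}
```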

gavv commented 4 years ago

After implementing roc_sdp module we will want to use it. The main usage will be in RTSP, but it will take some time to implement it.

To start with something simple, we can implement a very primitive form of session negotiation, in which the server prints SDP to a file or stdout, the client reads SDP from a file or stdin, and the user is responsible for transferring the SDP from server to client.

This will be useful for debugging and probably for some specific cases when users employ their own signaling protocols. We can think about it after merging the roc_sdp module.
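For the primitive stdout/stdin scheme, the server side only needs a formatter. A sketch assuming a single audio stream over RTP/AVP with an IPv4 address, and assuming the side that prints the SDP is the sender (hence `a=sendonly`); `SenderSession` and `format_sdp()` are hypothetical names. It emits the fields RFC 4566 requires (v=, o=, s=, t=) in the order the RFC mandates, plus c=, m= and a=rtpmap, with an empty session name as the RFC recommends when there is no meaningful one.

```cpp
#include <sstream>
#include <string>

// Hypothetical parameters of a one-stream session.
struct SenderSession {
    std::string session_id; // numeric string used in o=
    std::string address;    // IPv4 address receivers should use
    unsigned port;
    int payload_id;         // dynamic payload type, e.g. 96
    std::string encoding;   // e.g. "L16"
    unsigned sample_rate;
    unsigned channels;
};

std::string format_sdp(const SenderSession& s) {
    const char* crlf = "\r\n"; // SDP lines end with CRLF
    std::ostringstream out;
    out << "v=0" << crlf
        // o=<username> <sess-id> <sess-version> <nettype> <addrtype> <address>
        << "o=- " << s.session_id << " " << s.session_id << " IN IP4 " << s.address << crlf
        << "s= " << crlf
        << "c=IN IP4 " << s.address << crlf
        << "t=0 0" << crlf // unbounded session
        << "m=audio " << s.port << " RTP/AVP " << s.payload_id << crlf
        << "a=rtpmap:" << s.payload_id << " " << s.encoding << "/" << s.sample_rate
        << "/" << s.channels << crlf
        << "a=sendonly" << crlf;
    return out.str();
}
```

The server could write the result of `format_sdp()` to stdout, and the client would feed whatever the user pastes into its parser.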

gavv commented 4 years ago

Maybe this will be also useful: https://gavv.github.io/articles/minisaplistener/

gavv commented 4 years ago

Features missing in #300 (to be done in future PRs):

gavv commented 4 years ago
   A session description MUST contain either at least one "c=" field in
   each media description or a single "c=" field at the session level.
   It MAY contain a single session-level "c=" field and additional "c="
   field(s) per media description, in which case the per-media values
   override the session-level settings for the respective media.

So we need to implement the following behavior:
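A minimal sketch of the override rule quoted above, assuming c= values have already been collected as raw strings at session level and per media description (`Media` and `resolve_connections()` are hypothetical). When a media description has several c= fields it takes the first one, and it rejects more than one session-level c=.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical: raw c= values found inside one m= section.
struct Media {
    std::vector<std::string> connections;
};

// Returns the effective c= value for each media description, or
// std::nullopt if the description violates the rules quoted above.
std::optional<std::vector<std::string>>
resolve_connections(const std::vector<std::string>& session_conns,
                    const std::vector<Media>& medias) {
    if (session_conns.size() > 1) return std::nullopt; // at most one session-level c=
    std::vector<std::string> result;
    for (const Media& m : medias) {
        if (!m.connections.empty()) {
            result.push_back(m.connections.front()); // per-media c= overrides session-level
        } else if (!session_conns.empty()) {
            result.push_back(session_conns.front()); // inherit session-level c=
        } else {
            return std::nullopt; // no c= anywhere for this media
        }
    }
    return result;
}
```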

gavv commented 4 years ago
   Multiple addresses or "c=" lines MAY be specified on a per-media
   basis only if they provide multicast addresses for different layers
   in a hierarchical or layered encoding scheme.  They MUST NOT be
   specified for a session-level "c=" field.

   The slash notation for multiple addresses described above MUST NOT be
   used for IP unicast addresses.
  1. We should check that if /<number of addresses> is present, or multiple c= fields are present, the address is multicast (SocketAddr has a method to check this). If it's not, the parsing should fail.

  2. Since we don't support hierarchical / layered encoding currently, let's parse all c= fields, but use only the first one for now.
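A sketch of both checks, assuming each c= field is already split into address type, address, and a flag telling whether `/<number of addresses>` was present. `is_multicast()` stands in for the SocketAddr method mentioned above and is implemented here with POSIX `inet_pton()`; 224.0.0.0/4 and ff00::/8 are the IPv4 and IPv6 multicast ranges. `select_media_connection()` is hypothetical and expects the c= fields of a single media description.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for the SocketAddr multicast check (hypothetical helper).
bool is_multicast(const std::string& addrtype, const std::string& addr) {
    if (addrtype == "IP4") {
        in_addr a;
        if (inet_pton(AF_INET, addr.c_str(), &a) != 1) return false;
        return (ntohl(a.s_addr) >> 28) == 0xE; // 224.0.0.0/4
    }
    if (addrtype == "IP6") {
        in6_addr a;
        if (inet_pton(AF_INET6, addr.c_str(), &a) != 1) return false;
        return a.s6_addr[0] == 0xFF; // ff00::/8
    }
    return false;
}

struct ConnField {
    std::string addrtype;
    std::string address;
    bool has_num_addrs; // true if "/<number of addresses>" was present
};

// Rule 1: multiple c= fields or "/<number of addresses>" require multicast.
// Rule 2: no layered encoding support yet, so only the first c= is used.
// Returns false if parsing should fail.
bool select_media_connection(const std::vector<ConnField>& conns, ConnField& out) {
    if (conns.empty()) return false;
    const bool several = conns.size() > 1;
    for (const ConnField& c : conns) {
        if ((several || c.has_num_addrs) && !is_multicast(c.addrtype, c.address))
            return false;
    }
    out = conns.front();
    return true;
}
```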