veovera / enhanced-rtmp

This industry-sanctioned project introduces significant enhancements to the RTMP and FLV specifications, outlining advanced features aimed at revitalizing and modernizing the RTMP solution.
https://veovera.org
Apache License 2.0

Opus Sequence Headers #30

Closed: BtbN closed this issue 4 months ago.

BtbN commented 4 months ago

I'm a bit confused about the OpusSequenceHeader. The spec says "read either identification or comment header".

I'm not sure how to interpret that. FFmpeg treats only the identification header as "extradata" and carries only that around. Is it fine to send only that?

And how is "either" to be interpreted here? If we wanted to send the comment header, we would need to send a second SequenceStart?

veovera commented 4 months ago

Nice catch; it's fine to send only the identification header. We will reword this section to make it clear.
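For reference, the identification header discussed here is the "OpusHead" packet defined in RFC 7845, section 5.1: 19 bytes for mono/stereo, longer when a channel mapping table is present. The Python sketch below parses it. The function name parse_opus_head and the dictionary keys are illustrative choices, not part of the Enhanced RTMP spec or of FFmpeg's API, and it assumes the payload is the bare header rather than an Ogg page.

```python
import struct
from typing import Any, Dict


def parse_opus_head(data: bytes) -> Dict[str, Any]:
    """Parse an Opus identification header (RFC 7845, section 5.1)."""
    if len(data) < 19 or data[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    # Little-endian fields following the 8-byte magic signature.
    version, channels, pre_skip, input_rate, gain, family = struct.unpack_from(
        "<BBHIhB", data, 8
    )
    if version >> 4 != 0:
        # A nonzero upper nibble signals an incompatible major version.
        raise ValueError("unsupported OpusHead version")
    head: Dict[str, Any] = {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,               # samples to discard at 48 kHz
        "input_sample_rate": input_rate,    # informational only
        "output_gain": gain,                # Q7.8 fixed-point dB
        "mapping_family": family,
    }
    if family != 0:
        # Non-zero families append stream count, coupled count, and a map.
        if len(data) < 21 + channels:
            raise ValueError("truncated channel mapping table")
        head["stream_count"] = data[19]
        head["coupled_count"] = data[20]
        head["channel_mapping"] = list(data[21:21 + channels])
    return head
```

With mapping family 0 (mono or stereo) there is no mapping table, which is why a plain stereo header is always 19 bytes.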

zenomt commented 4 months ago

WebCodecs, at least, doesn't require initialization data for Opus. to date i've been using codec id 15 "Device Specific" for Opus in my JavaScript (*), just sending the coded frames as they come out of the encoder, and for playback i do the normal thing of "oh, i'm receiving audio messages with a different codec, i guess i should flush, create, and initialize a new AudioDecoder".

are we saying that there needs to be an audio "sequence start" for all Enhanced FourCC codecs, whether the codec requires initialization data or not?

(*) today's project is adding Enhanced RTMP Audio to my JavaScript. i did the C++ side yesterday. :)
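The question of whether every Enhanced FourCC codec needs a SequenceStart is easier to discuss with the message layout in view. This Python sketch splits an audio message into header fields and payload. It assumes the Enhanced RTMP v2 layout in which SoundFormat 9 means "ExHeader", the low nibble of the first byte carries the AudioPacketType (0 = SequenceStart, 1 = CodedFrames, and so on), and a four-byte FourCC such as "Opus" follows. Those constants and the helper name parse_audio_message are assumptions for illustration; Multitrack and ModEx packets are not handled.

```python
from typing import Any, Dict

# Assumed Enhanced RTMP v2 constants; verify against the spec before relying on them.
SOUND_FORMAT_EX_HEADER = 9
AUDIO_PACKET_TYPES = {
    0: "SequenceStart",
    1: "CodedFrames",
    2: "SequenceEnd",
    4: "MultichannelConfig",
    5: "Multitrack",
    7: "ModEx",
}


def parse_audio_message(msg: bytes) -> Dict[str, Any]:
    """Split an RTMP audio message into header fields and payload (sketch)."""
    if not msg:
        raise ValueError("empty audio message")
    sound_format = msg[0] >> 4
    if sound_format != SOUND_FORMAT_EX_HEADER:
        # Legacy FLV audio header: the low nibble holds rate/size/type flags.
        # For legacy AAC the payload still starts with the AACPacketType byte.
        return {"enhanced": False, "sound_format": sound_format, "payload": msg[1:]}
    packet_type = AUDIO_PACKET_TYPES.get(msg[0] & 0x0F, "Reserved")
    if packet_type in ("Multitrack", "ModEx"):
        # These packet types change what follows the first byte; out of scope here.
        raise NotImplementedError(packet_type + " is not handled in this sketch")
    if len(msg) < 5:
        raise ValueError("truncated FourCC")
    return {
        "enhanced": True,
        "packet_type": packet_type,
        "fourcc": msg[1:5].decode("ascii"),
        "payload": msg[5:],
    }
```

A message sent with codec id 15 ("Device Specific"), as described above, falls into the non-enhanced branch and its payload is the raw coded frame.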

zenomt commented 4 months ago

so, Opus doesn't require a sequence header when the audio messages are in native Opus format. sequence headers are required when the audio messages are in the Ogg format.

WebCodecs uses the presence or absence of the "description" (which would be the payload of a sequence start message) in the AudioDecoderConfig to distinguish between Ogg or native Opus formats, respectively.

if i had my druthers, i'd define Enhanced RTMP to use the Opus native format that doesn't require a sequence header at all.

to allow both formats, i'd say that the SequenceStart message is REQUIRED for Opus, but that an empty (0-length) OpusSequenceHeader means coded frames are in "Opus Native Format", and a non-empty one means coded frames are in "Ogg Format".

zenomt commented 4 months ago

> today's project is adding Enhanced RTMP Audio to my JavaScript. i did the C++ side yesterday. :)

https://github.com/zenomt/rtwebsocket/commit/2bede7dfbd8373cc49b9a6d26e10ed0d1672b3a4 hopefully adds playback support for Enhanced Audio at least for codecs supported by WebCodecs in browsers (Opus, FLAC, plus enhanced mode for AAC and maybe MP3). provisional pending some sample media to test with.

for Opus i went with "if i'm switching to Opus from None or Other and there's no SequenceStart, or there is a SequenceStart but its payload is empty, then use Opus native format; and if there's a SequenceStart with a non-empty payload, then use Ogg format". this is backward-compatible with what my JS senders are sending now, but is brittle if SequenceStart is ever used.
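The selection rule described above can be sketched as a small state machine. OpusFormatSelector, its method names, and the NATIVE/OGG labels are hypothetical; FourCC strings other than "Opus" stand in for the "Other" case, and the sketch only tracks which Opus framing to expect, not decoder creation.

```python
from typing import Optional

NATIVE = "opus-native"  # bare Opus packets, no init data
OGG = "ogg"             # Ogg-framed data, init data in the SequenceStart


class OpusFormatSelector:
    """Hypothetical tracker for which Opus framing the sender is using."""

    def __init__(self) -> None:
        self.current_codec: Optional[str] = None  # nothing configured yet
        self.opus_format: Optional[str] = None

    def on_sequence_start(self, fourcc: str, payload: bytes) -> None:
        self.current_codec = fourcc
        if fourcc == "Opus":
            # Empty payload -> native format; non-empty payload -> Ogg format.
            self.opus_format = OGG if len(payload) > 0 else NATIVE
        else:
            self.opus_format = None

    def on_coded_frames(self, fourcc: str) -> Optional[str]:
        if fourcc != self.current_codec:
            # Switched codecs without a SequenceStart: assume native for Opus.
            self.current_codec = fourcc
            self.opus_format = NATIVE if fourcc == "Opus" else None
        return self.opus_format
```

As noted above, this stays compatible with senders that never emit a SequenceStart, but it depends on every sender agreeing on what a non-empty payload means.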

veovera commented 4 months ago

> if i had my druthers, i'd define Enhanced RTMP to use the Opus native format that doesn't require a sequence header at all.

Another possible route is for Opus audio data to always be in the Opus format. The Opus sequence header (Identification Header) is optional. The ID header would not be wrapped by an Ogg page. @BtbN Does FFmpeg ever use this hybrid mode? Perhaps when handling Matroska (MKV), WebM, or MP4?

BtbN commented 4 months ago

I'm not sure, I never looked at that code, and it's not immediately obvious to me.

zenomt commented 4 months ago

> Another possible route is for Opus audio data to always be in the Opus format. The Opus sequence header (Identification Header) is optional.

in this case i'm assuming you mean that an RTMP Audio SequenceStart message is optional, its payload (if any) is not needed to initialize a decoder, and (for WebCodecs at least) its payload-if-any would not be passed to the decoder (since doing so signals a WebCodecs AudioDecoder to expect coded media to be in Ogg format).

veovera commented 4 months ago

> in this case i'm assuming you mean that an RTMP Audio SequenceStart message is optional, its payload (if any) is not needed to initialize a decoder

Correct assumption for what is on the wire for RTMP. As for what to do with the payload, that would be solution-dependent.

zenomt commented 4 months ago

and "optional" here meaning that a forwarder could drop an audio SequenceStart entirely and the coded frames would still be decodable.

derrod commented 4 months ago

As far as I can tell the "ID header" is primarily used to signal the channel layout and number of channels, which in ERTMP can be handled by AudioPacketType.MultichannelConfig.

In FFmpeg the header is treated as the "extradata" for the codec and is required to correctly configure the libopus decoder for channel counts above 2 [1]. If we omit this header, the extradata would have to be reconstructed, as is done for MP4 [2].
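To make "reconstructing the extradata" concrete, here is a minimal Python sketch that assembles an OpusHead packet from individual parameters, such as those carried in an MP4 'dOps' box or signaled some other way. It illustrates the byte layout from RFC 7845 only; it is not FFmpeg's code, and build_opus_head and its parameter names are hypothetical.

```python
import struct
from typing import Optional, Sequence


def build_opus_head(
    channels: int,
    pre_skip: int,
    input_sample_rate: int = 48000,
    output_gain: int = 0,
    mapping_family: int = 0,
    stream_count: Optional[int] = None,
    coupled_count: Optional[int] = None,
    channel_mapping: Optional[Sequence[int]] = None,
) -> bytes:
    """Assemble an OpusHead packet (RFC 7845, section 5.1) from its fields."""
    if mapping_family == 0 and channels not in (1, 2):
        # Family 0 has no mapping table, so it can only describe mono or stereo.
        raise ValueError("mapping family 0 supports only 1 or 2 channels")
    head = b"OpusHead" + struct.pack(
        "<BBHIhB",
        1,                  # version
        channels,
        pre_skip,           # samples (at 48 kHz) to discard at decoder start
        input_sample_rate,  # informational only
        output_gain,        # Q7.8 fixed-point dB
        mapping_family,
    )
    if mapping_family != 0:
        if stream_count is None or coupled_count is None or channel_mapping is None:
            raise ValueError("non-zero mapping family needs counts and a mapping")
        if len(channel_mapping) != channels:
            raise ValueError("channel mapping needs one entry per channel")
        head += bytes([stream_count, coupled_count]) + bytes(channel_mapping)
    return head
```

For multichannel audio the stream count, coupled count, and mapping cannot be guessed from the channel count alone, so they have to come from somewhere.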

I don't know how WebCodecs would handle configuration for channel counts above 2.

Edit: Also see https://opus-codec.org/docs/opus_api-1.5/group__opus__multistream.html#details

Edit 2: Based on further reading of the specification, it seems that the channel map does not necessarily correspond to the channel layout and would still have to be signaled for multichannel audio. It seems to me that the easiest way to accomplish this is to simply send the OpusHead packet as the SequenceStart, but consider it optional for stereo or mono audio.
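The rule suggested in Edit 2 can be expressed as a check on the header bytes. This sketch assumes "optional" means channel mapping family 0 with one or two channels; opus_head_is_optional is a hypothetical helper name.

```python
def opus_head_is_optional(head: bytes) -> bool:
    """Return True if a receiver could do without this OpusHead (sketch)."""
    if len(head) < 19 or head[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    channels = head[9]
    mapping_family = head[18]
    # Family 0 is the plain mono/stereo case. Any other family carries a
    # stream count and channel map that a receiver cannot infer.
    return mapping_family == 0 and channels <= 2
```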

zenomt commented 4 months ago

if initialization data is ever sent, and it is necessary for some decoding situations (like >2 channels), then to avoid ambiguity (especially around stream publish/unpublish/republish) a SequenceStart should probably be required every time.
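A minimal sketch of the receiver behavior this implies, assuming a "SequenceStart is always required" rule: state is cleared on every publish, and coded frames are not decoded until a SequenceStart (possibly with an empty payload) has been seen. OpusStreamState and its methods are hypothetical names, not taken from any existing player.

```python
from typing import Optional


class OpusStreamState:
    """Hypothetical receiver-side state for one published stream."""

    def __init__(self) -> None:
        self.init_data: Optional[bytes] = None
        self.configured = False

    def on_publish(self) -> None:
        # A new publish (or republish) starts from scratch, so init data
        # from a previous publish is never reused by accident.
        self.init_data = None
        self.configured = False

    def on_sequence_start(self, payload: bytes) -> None:
        self.init_data = payload
        self.configured = True

    def on_coded_frames(self, frame: bytes) -> bool:
        # Frames arriving before a SequenceStart are dropped rather than
        # decoded with a guessed configuration. Returns True if decodable.
        return self.configured
```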

it's possible that a WebCodecs AudioDecoder can use the "channels" config property to be told how many channels there are without needing an initialization data blob. if so, and we were to use Opus native format instead of Ogg, we'd need to parse the init data in client code rather than passing it directly to the decoder.
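If the init data is parsed in client code rather than handed to the decoder, a config for the native Opus format might be derived as below. The sketch returns a plain dict that mirrors the WebCodecs AudioDecoderConfig field names (codec, sampleRate, numberOfChannels) and deliberately omits description, since its presence signals Ogg format. It assumes decoding at 48000 Hz (RFC 7845 treats the header's input sample rate as informational) and, because it is unclear whether WebCodecs can configure more than two channels without a description, it only handles mapping family 0. webcodecs_opus_config is a hypothetical name.

```python
from typing import Any, Dict


def webcodecs_opus_config(opus_head: bytes) -> Dict[str, Any]:
    """Derive a native-format decoder config from OpusHead bytes (sketch)."""
    if len(opus_head) < 19 or opus_head[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    channels = opus_head[9]
    mapping_family = opus_head[18]
    if mapping_family != 0:
        # Stream count and channel mapping have no place in this dict.
        raise ValueError("only mapping family 0 is handled in this sketch")
    # No "description" key: its presence would mean Ogg-framed data.
    return {"codec": "opus", "sampleRate": 48000, "numberOfChannels": channels}
```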

veovera commented 4 months ago

After reviewing all the feedback, engaging in extensive discussions, and further consideration, it appears that the best way to proceed involves the following steps:

These are the high-level details which will be documented in greater detail in the specification.

veovera commented 4 months ago