w3c / webrtc-extensions

A repository for experimental additions to the WebRTC API
https://w3c.github.io/webrtc-extensions/
Other
59 stars 19 forks source link

Enabling opus stereo audio without SDP munging (stereo=1) #63

Open perahgren opened 3 years ago

perahgren commented 3 years ago

I wonder whether it is possible to achieve an SDP setup that supports stereo audio sessions (via opus) properly without using SDP munging?

Examples of scenarios to achieve include Symmetric setup stereo setup: -The receiver has not flagged that it wants to receive mono -The sender wants to send stereo -The sender achieves a connection sending stereo to the receiver. -The sender achieves a connection receiving stereo from the receiver.

Asymmetric stereo/mono setup: -The receiver has not flagged that it wants to receive mono -The sender wants to send stereo -The sender achieves a connection sending stereo to the receiver. -The sender achieves a connection receiving mono from the receiver.

murillo128 commented 3 years ago

Not really sure we need to specify anything new.

The opus stereo flag is using for signaling that the endpoint supports receiving stereo:

   stereo:  specifies whether the decoder prefers receiving stereo or
      mono signals.  Possible values are 1 and 0, where 1 specifies that
      stereo signals are preferred, and 0 specifies that only mono
      signals are preferred.  Independent of the stereo parameter, every
      receiver MUST be able to receive and decode stereo signals, but
      sending stereo signals to a receiver that signaled a preference
      for mono signals may result in higher than necessary network
      utilization and encoding complexity.  If no value is specified,
      the default is 0 (mono).

So the sdp mangling we currently need is a hack and all endpoints should signal stereo=1 in the SDP.

The question is if we need an API to make the sender encode an stereo signal. This should be done automatically by the endpoint by checking the number of channels on the audio track to decide whether stereo or mono should be used.

In case the app wanted to use a mono source and upscale it to stereo or an stereo source and downscale to mono, I think that the best way would to do it via WebAudio before passing it to webrtc.

henbos commented 3 years ago

I wonder if this should be solved with...

henbos commented 3 years ago

Proposal:

dictionary RTCCodecOptions {
  unsigned short channels;
};
fippo commented 3 years ago

channels is a term with meaning when it comes to codecs. But I do not think we need an API, just ship stereo=1

murillo128 commented 3 years ago

Also, it can't be at transceiver level, as you may want to send mono and receive stereo.

I agree with @fippo that shiping stereo=1 is enough and and the decision on whether to send stereo or mono should be taken based on the track channels.

henbos commented 3 years ago

Sounds reasonable. In that case it can change on the fly depending on the track. Would it be OK to negotiate stereo if only mono is sent? Wherre is stereo=1 defined?

murillo128 commented 3 years ago

stereo=1 on the offer just means that the endpoint is able to receive stereo.

fippo commented 3 years ago

https://tools.ietf.org/html/rfc7587#section-7.1 now ofc this isn't just about being able to play stereo. Otherwise @perahgren + folks wouldn't have taken such a long time.

opus says specifically

   o  The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding
      name.  The RTP clock rate in "a=rtpmap" MUST be 48000, and the
      number of channels MUST be 2.

mostly because nobody wants to bother with negotiating the number of channels in SDP. Any opus decoder must be able to decode stereo, see https://tools.ietf.org/html/rfc7587#section-3.4 But I think the intention was that stereo=1 is the default.

:shipit:

fippo commented 3 years ago

Also, it can't be at transceiver level, as you may want to send mono and receive stereo.

In general, the handling of declarative attributes (tagging @alvestrand) seems odd.

bradisbell commented 3 years ago

I'm just jumping into this conversation, so please forgive me if this isn't the right thread... but please do consider other channel counts and channel layouts. Currently, sending something like 6 channels of audio doesn't seem to be possible with WebRTC due to browser implementations (which may be fixed in Chromium as of yesterday?). Additionally, for my use case I need to specify an arbitrary channel layout/map, so that the encoder doesn't assume I'm sending 5.1 surround. What I need is 6 discrete channels.

As changes to the implementations and specifications are made, please consider use cases such as this. Thank you!

murillo128 commented 3 years ago

standard opus rtp doesnt support multichannel (channels >2).

Chrome implements it with a non‐standard multiopus codec which, when enabled via sdp mangling, is automatically chosen based on the number of channels of the input track (6 for 5.1 or 8 for 7.1).

fippo commented 3 years ago

note: part of this problem applies to cbr as well as things like dtx One approach here would be to make the sdpFmtpLine writable. OFC that doesnt solve the problem of sdp munging at all :trollface: but at least it is a much more focused kind of munging.

murillo128 commented 3 years ago

note that cbr useinbandfec and usedtx usage on SDP only expresses that the decoder on the receiver prefers cbr, fec or dtx, but in no way controls what the encoder on any of both sides must use:

   cbr:  specifies if the decoder prefers the use of a constant bitrate
      versus a variable bitrate.  Possible values are 1 and 0, where 1
      specifies constant bitrate, and 0 specifies variable bitrate.  If
      no value is specified, the default is 0 (vbr).  When cbr is 1, the
      maximum average bitrate can still change, e.g., to adapt to
      changing network conditions.

   useinbandfec:  specifies that the decoder has the capability to take
      advantage of the Opus in-band FEC.  Possible values are 1 and 0.
      Providing 0 when FEC cannot be used on the receiving side is
      RECOMMENDED.  If no value is specified, useinbandfec is assumed to
      be 0.  This parameter is only a preference, and the receiver MUST
      be able to process packets that include FEC information, even if
      it means the FEC part is discarded.

   usedtx:  specifies if the decoder prefers the use of DTX.  Possible
      values are 1 and 0.  If no value is specified, the default is 0.

So the SDP mungling that is allowed by current implementation is just a hack due to the lack of APIs and in no way is backed up by any standard.

henbos commented 3 years ago

This is my understanding:

To me it sounds like no spec change is needed in order to achieve receiving stereo. @perahgren can you verify that WebRTC is able to receive stereo even if "stereo=1" is missing?

However it doesn't make sense to be sending stereo at the present moment since everybody is implicitly asking for mono by having "stereo=1" missing. To me it would make sense if "stereo=1" is the default in offers and decide the what to send based on the MediaStreamTrack's number of channels.

Questions:

henbos commented 3 years ago

FWIW I don't think it makes sense for the receiver to be the one who controls what the sender is sending stereo, other than in a restrictive sense. But assuming the receiver is OK with receiving stereo, it seems like it should be the sender's choice whether or not to do so. Otherwise we end up in a scenario where if the sender doesn't want to send stereo it has to ask the receiver to ask the sender to do mono... I don't want to encourage additional offer-answer dances.

alvestrand commented 3 years ago

For offer/answer, the relevant text is in section 7.1 of RFC 7587:

o The "stereo" parameter is a unidirectional receive-only parameter. When sending to a single destination, a sender MUST NOT use stereo when "stereo" is 0. Gateways or senders that are sending the same encoded audio to multiple destinations SHOULD NOT use stereo when "stereo" is 0, as this would lead to inefficient use of network resources. The "stereo" parameter does not affect interoperability.

As for what a missing "stereo=1" attribute means: Section 6.1:

stereo: specifies whether the decoder prefers receiving stereo or mono signals. Possible values are 1 and 0, where 1 specifies that stereo signals are preferred, and 0 specifies that only mono signals are preferred. Independent of the stereo parameter, every receiver MUST be able to receive and decode stereo signals, but sending stereo signals to a receiver that signaled a preference for mono signals may result in higher than necessary network utilization and encoding complexity. If no value is specified, the default is 0 (mono).

So receiving stereo is mandatory to support, but sending stereo when stereo=1 is missing is "allowed, but you really should have a strong reason to do so".

I think that once stereo is fully supported in the pipeline, we should put "stereo=1" into the default FMTP line for Opus.

perahgren commented 3 years ago

From what I know, stereo is fully supported in the pipeline. However, I'm not convinced that setting stereo=1 by default really solves this. As I see it, there are two problems

  1. To be able to produce stereo, the SDP needs to be munged (by adding stereo=1) before setLocalDescription.
  2. If the receiver does not want to receive stereo (sets stereo=0 or omits a value for stereo), the sender should heed that by re-doing the createOffer.

If I've understood this correctly, setting stereo=1 by default won't really solve any of the above, apart from not requiring any action when stereo should be sent. If mono should be sent, however, the problems still persist.

Also, if we set stereo=1 by default, won't then suddenly all stereo-capable client that are not explicitly setting stereo=0 start sending stereo audio instead of mono?

henbos commented 3 years ago

While 1) would be solved if it becomes the default (stereo without munging) then the problem of wanting mono is seemingly the same as the original problem, only having the default reversed, so now one might SDP munge to get mono instead.

I think the reason why stereo=1 is more attractive than stereo=0 is that even if you negotiate stereo=1 this only sends stereo if the MediaStreamTrack is a stereo source. So in stereo=1 you can choose whether you want stereo or mono whereas with stereo=0 you have no choice, it's mono-only (assuming we want to respect the receiver's stereo=0).

So what I wonder is: how common are stereo sources, and is it easy to get mono from a stereo source? Can you simply do getUserMedia() with channelCount:1 or do you have to use WebAudio and if so how cumbersome and performance impactful is it to start using WebAudio? I do think the default should reflect what makes sense. But there is not just the default in SDP that comes to play here but also the default of MediaStreamTracks?

henbos commented 3 years ago

Also, if we set stereo=1 by default, won't then suddenly all stereo-capable client that are not explicitly setting stereo=0 start sending stereo audio instead of mono?

Yes. Do we have any idea how common this is? (I don't...)

fippo commented 3 years ago

only having the default reversed, so now one might SDP munge to get mono instead

That is up to the signalling layer which can legitimately strip out stereo=1 and/or replace it with stereo=0

fippo commented 3 years ago

If the receiver does not want to receive stereo (sets stereo=0 or omits a value for stereo), the sender should heed that by re-doing the createOffer.

No? The receiver can declare stereo=0 independently. That is the "declarative" syntax stuff which is very confusing

jan-ivar commented 3 years ago

@henbos What was the resolution here? @fippo did you have some concerns?

jan-ivar commented 3 years ago

@fippo noted that Firefox already sets stereo=1, which I've confirmed.

fippo commented 3 years ago

I do not think (as said quite a few comments up) there is any action for the WG here when it comes to chrome shipping stereo=1. That firefox already does so without the world coming to an end should be a good indicator.

henbos commented 3 years ago

@fippo noted that Firefox already sets stereo=1, which I've confirmed.

Haha, well there we go.

And here's a modified version of @jan-ivar's fiddle that logs the channelCount: https://jsfiddle.net/wase5g6b/1/ When I run it on my MacBook Pro I get channelCount:1 in both Chrome and Firefox. I checked Safari too, but channelCount is undefined there. In any case, thee SDP is the same as Chromium.

@guidou Do you know what Chromium's default for channelCount is? I would assume that a normal microphone is 1 but I don't know what happens if you have some multichannel sound recording equipment.

I think if we write a PR to make stereo=1 the default we just need to make sure that Chromium does up and downsampling correctly, i.e. we don't want stereo=1 to cause us to encode stereo if our MediaStreamTrack has a mono source. @perahgren What would happen today?

henbos commented 3 years ago

@henbos What was the resolution here?

I think we can just merge the PR (to be written) unless @perahgren or @guidou knows about any backwards compatibility issues that this would cause for cases where we do have multiple channels.

henbos commented 3 years ago

Assuming this is ready for PR in the meantime, please correct me if I'm wrong.

juberti commented 3 years ago

I think this may cause problems for Chrome when doing setRD for Opus with stereo=1 for two reasons:

  1. the Opus encoder will be initialized to transmit stereo regardless of the # of channels, which probably results in worse efficiency and sound quality for the selected bitrate
  2. if not set, the default bitrate will be increased from 32kbps to 64kbps to try to account for the need for stereo encoding. This will result in additional bandwidth consumption even for mono sources.

See https://chromium.googlesource.com/external/webrtc/stable/talk/+/d3ecbb30dc2684653d61e8ec88a5382aecf62773/media/webrtc/webrtcvoiceengine.cc#1892 for the relevant code.

Note also that stereo=1 has no effect on setLD.

jan-ivar commented 3 years ago

When I run it on my MacBook Pro I get channelCount:1 in both Chrome and Firefox.

@henbos MBP mics are mono (see Audio Midi Setup). FWIW my Logitech BRIO reports channelCount: 2 in Firefox above.

To test Chrome, I used an in-content device picker to pick my BRIO: I get channelCount: 1 with {audio: true} in M88, but channelCount: 2 in M90. Did the default change recently?

jan-ivar commented 3 years ago

I think this may cause problems for Chrome when doing setRD for Opus with stereo=1

@juberti Wouldn't that show up in a Chrome←→Firefox p2p call today then?

juberti commented 3 years ago

Sure, but the effect will be subtle enough that you won't notice it unless you know what to look for. Still, it should be visible in WebRTC-internals.

henbos commented 3 years ago
  1. the Opus encoder will be initialized to transmit stereo regardless of the # of channels, which probably results in worse efficiency and sound quality for the selected bitrate

and

@juberti Wouldn't that show up in a Chrome←→Firefox p2p call today then?

Then it sounds like the compatibility problem already existed ever since Chrome initially shipped stereo. So we need to fix this regardless of how we decide to control stereo going forward.

To me I think it sounds like the proposed PR still makes sense as-is, but that code like https://chromium.googlesource.com/external/webrtc/stable/talk/+/d3ecbb30dc2684653d61e8ec88a5382aecf62773/media/webrtc/webrtcvoiceengine.cc#1892 need to be update so that the number of channels gets reconfigured to be the minimum of track channels and channels negotiated, and to trigger the possibility of reconfiguring the encoder on replaceTrack() if number of channels change.

Does this make sense to everybody? @juberti @perahgren

To test Chrome, I used an in-content device picker to pick my BRIO: I get channelCount: 1 with {audio: true} in M88, but channelCount: 2 in M90. Did the default change recently?

@guidou Do you know?

henbos commented 3 years ago

"The proposed PR" = https://docs.google.com/presentation/d/1-ZB5pjq0hTl4iNxGhIc3polH9wjCps2Uzqg7D4v1n14/edit#slide=id.gae53209dde_0_11 (slides 29-30)

perahgren commented 3 years ago

A concern I have on the PR is that it does not really solve the underlying problem of lack of API and the need for munging. It will indeed make setting up stereo calls more convenient in that stereo=1 is added by default which avoids the issue that there is no API to control it (and therefore avoids the munging needed).

However, the problem of lack of API, and the resulting need for munging, is then instead imposed on the clients that actually would want to set stereo=0 (or omit the stereo flag).

I may get this wrong, but I think what would be needed is to have an API for setting/clearing the stereo that avoids munging.

Having stereo=1 would be handy, but I share the concerns of @juberti and also that there may be other unforeseen effects for clients that implicitly rely on stereo=1 not being set by default.

henbos commented 3 years ago

The proposed PR is "send stereo by default if the source is stereo". If the source is mono, you won't be sending stereo (up-sampling there would be a bug).

The concern I see is for applications that don't specify channelCount when they do getUserMedia() and thus rely on the default. While most microphones are probably mono, as @jan-ivar pointed out, it looks as if the default channelCount changed from 1 to 2 in a recent version of Chrome? I don't know if that is a good or bad default, but if you actually have a stereo microphone and you are recording stereo then why not send stereo? The question I would ask is, if we don't want stereo, why are we recording stereo?

If stereo or mono should be the default seems like an interesting question. But if you really don't don't want stereo, then regardless of what we think the default should be, you can always restrict the source to mono with getUserMedia({audio:{channelCount:1}}. So when would you need to do SDP munging?

fippo commented 3 years ago

looks like I was wrong about "this should be totally safe" but good the channelcount change was caught that way. Added a note to the chrome issue/change whether that is intended.

perahgren commented 3 years ago

As I see it, the issue is on how to allow the receiving client to restrict the sending client to send stereo. From what I can see, the only way to do that is to include stereo=0 in the answer SDP. And from what I've heard the only way that can be achieved is via munging (maybe I'm missing something there though?).

henbos commented 3 years ago

As I see it, the issue is on how to allow the receiving client to restrict the sending client to send stereo. From what I can see, the only way to do that is to include stereo=0 in the answer SDP. And from what I've heard the only way that can be achieved is via munging (maybe I'm missing something there though?).

Here's my take on this: Yes, the answer SDP would have to be munged if you want to turn this off via SDP in a browser client on the receiver side. However, because the receiver has to be prepared to receive stereo regardless of what this attribute says, and it is ultimately up to the sender whether to send mono and stereo, this modification to the SDP does not actually have any effects on the receiver side. It only has an effect on the sender side where if the sender respects the receivers' wishes, it restricts it to mono.

From this I conclude that you could munge the SDP after receiver.setLocalDescription(answer) but before sender.setRemoteDescription(answer). This is not disallowed by the spec, the spec only forbids munging local SDP, not remote SDP.

So if you want to turn off stereo via SDP you still can without the SDP munging having to be illegal.

That said I would view stereo=0 as primarily interesting for the use case of "talking to a legacy endpoint or a non-browser endpoint (server)". For a real application, I would try to make sure the sender and receiver agree on stereo or mono outside of SDP. But if you want to do it in SDP, you can.

henbos commented 3 years ago

People might think that to be ugly I don't know, but it would be nice to avoid extra negotiation logic or APIs on both sides to decide what stereo attribute to use when ultimately it doesn't do anything on the receiver-side since stereo has to be receivable regardless.

henbos commented 3 years ago

@perahgren found this yesterday:

 The "stereo" parameter is a unidirectional receive-only parameter.
 When sending to a single destination, a sender MUST NOT use stereo
 when "stereo" is 0.  Gateways or senders that are sending the same
 encoded audio to multiple destinations SHOULD NOT use stereo when
 "stereo" is 0, as this would lead to inefficient use of network
 resources.

If the browser is the sender, this would change the "stereo=0" from SHOULD NOT to MUST NOT send stereo, since RTCPeerConnection only sends to a single destination. This does not affect what a sensible implementation does. If the browser is the receiver, nothing changes, we still have to be prepared to receive stereo in case we are talking to a gateway.

I'm still in favor of stereo=1 being the default because then "you send what you have" and don't need to worry about additional APIs to control SDP. It's then up to the application to agree on capture channel count, similar to how it is today up to the application to agree on capture resolution and frame rate.

However the channel count's default has been raised as a concern to move forward. If we would accidentally have two channels then we may accidentally cause regressions. We may want to discuss https://github.com/w3c/mediacapture-main/issues/775 before merging a PR here?

juberti commented 3 years ago

There are a lot of moving parts here, so let me try to summarize.

Summary of the summary: proposed PR is good, but we can't implement it yet without some refactoring in Chrome.

  1. I overall agree with @henbos 's overall suggestion for the PR from https://docs.google.com/presentation/d/1-ZB5pjq0hTl4iNxGhIc3polH9wjCps2Uzqg7D4v1n14/edit#slide=id.gae53209dde_0_16. Just like the default is to send 1080p if that's the captured video resolution, sending stereo should be the default if that's the # of captured audio channels.
  2. As @henbos also mentions, if the remote side indicates stereo=0, we should not send stereo. This protects old browsers, or conferencing servers that can only handle mono.
  3. I don't think receiving browsers should need to turn off stereo via SDP. Same as how we don't provide a hook for receiving browsers to clamp what video resolution or audio sampling rate they receive.
  4. The default channel count should probably be 1 to avoid unexpected changes to apps, and certainly be set to 2 if and only if the mic is in fact a stereo mic. This avoids wasting bits for no benefit.
  5. The code in https://chromium.googlesource.com/external/webrtc/stable/talk/+/d3ecbb30dc2684653d61e8ec88a5382aecf62773/media/webrtc/webrtcvoiceengine.cc#1892 needs to change in a few ways:
    1. I don't think this particular code knows enough about the channel count to properly configure the encoder. It probably needs to be pushed down to a lower layer. This will hurt audio quality when audio bitrate is specified until fixed, I think.
    2. The decision it makes about the default audio bitrate, for mono vs stereo, also requires knowledge of the channel count. Accordingly it should probably be pushed down to a lower layer. This will waste bits when audio bitrate is unspecified until fixed.
    3. Tracks can be replaced, so the encoder config needs to be re-evaluated whenever this happens. Ergo, it needs to happen somewhere other than setSendCodecs.
  6. It probably makes sense at some point to allow send-side control of stereo via RTCRtpEncodingParameters, similar to ptime, dtx, etc. So that is another place that would need to get plumbed down to encoder config.
henbos commented 3 years ago

Excellent summary Justin. I think the relevant pieces of information are known/knowable at the relevant points in time to avoid wasting any bits, but not known to all the layers that need to know about them - until refactoring.