perahgren opened 3 years ago
Not really sure we need to specify anything new.
The Opus stereo flag is used for signaling that the endpoint supports receiving stereo:
stereo: specifies whether the decoder prefers receiving stereo or
mono signals. Possible values are 1 and 0, where 1 specifies that
stereo signals are preferred, and 0 specifies that only mono
signals are preferred. Independent of the stereo parameter, every
receiver MUST be able to receive and decode stereo signals, but
sending stereo signals to a receiver that signaled a preference
for mono signals may result in higher than necessary network
utilization and encoding complexity. If no value is specified,
the default is 0 (mono).
So the SDP munging we currently need is a hack, and all endpoints should signal stereo=1 in the SDP.
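For reference, that just means the Opus fmtp line in every offer and answer would look something like this (payload type 111 and the other parameters are the usual Chrome defaults, shown for illustration):

    a=fmtp:111 minptime=10;useinbandfec=1;stereo=1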
The question is whether we need an API to make the sender encode a stereo signal. This should be done automatically by the endpoint by checking the number of channels on the audio track to decide whether stereo or mono should be used.
In case the app wanted to use a mono source and upmix it to stereo, or a stereo source and downmix to mono, I think the best way would be to do it via WebAudio before passing it to WebRTC.
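For example, a downmix-to-mono graph could look roughly like this (just a sketch; pc is assumed to be an existing RTCPeerConnection):

    // Route the capture through WebAudio and force a mono output track.
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const ctx = new AudioContext();
    const source = ctx.createMediaStreamSource(stream);
    const destination = ctx.createMediaStreamDestination();
    destination.channelCount = 1; // channelCountMode is "explicit", so this downmixes
    source.connect(destination);
    pc.addTrack(destination.stream.getAudioTracks()[0], destination.stream);

Upmixing mono to stereo would be the same graph with channelCount = 2.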
I wonder if this should be solved with...
Proposal:
dictionary RTCCodecOptions {
  unsigned short channels;
};
channels is a term with meaning when it comes to codecs. But I do not think we need an API, just ship stereo=1
Also, it can't be at transceiver level, as you may want to send mono and receive stereo.
I agree with @fippo that shipping stereo=1 is enough, and the decision on whether to send stereo or mono should be taken based on the track channels.
Sounds reasonable. In that case it can change on the fly depending on the track. Would it be OK to negotiate stereo if only mono is sent? Where is stereo=1 defined?
stereo=1 on the offer just means that the endpoint is able to receive stereo.
https://tools.ietf.org/html/rfc7587#section-7.1 now ofc this isn't just about being able to play stereo. Otherwise @perahgren + folks wouldn't have taken such a long time.
opus says specifically
o The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding
name. The RTP clock rate in "a=rtpmap" MUST be 48000, and the
number of channels MUST be 2.
mostly because nobody wants to bother with negotiating the number of channels in SDP. Any opus decoder must be able to decode stereo, see https://tools.ietf.org/html/rfc7587#section-3.4 But I think the intention was that stereo=1 is the default.
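In other words, the rtpmap line is the same whether mono or stereo is intended (payload type illustrative):

    a=rtpmap:111 opus/48000/2

and mono vs. stereo is expressed only via the stereo fmtp parameter.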
:shipit:
Also, it can't be at transceiver level, as you may want to send mono and receive stereo.
In general, the handling of declarative attributes (tagging @alvestrand) seems odd.
I'm just jumping into this conversation, so please forgive me if this isn't the right thread... but please do consider other channel counts and channel layouts. Currently, sending something like 6 channels of audio doesn't seem to be possible with WebRTC due to browser implementations (which may be fixed in Chromium as of yesterday?). Additionally, for my use case I need to specify an arbitrary channel layout/map, so that the encoder doesn't assume I'm sending 5.1 surround. What I need is 6 discrete channels.
As changes to the implementations and specifications are made, please consider use cases such as this. Thank you!
Standard Opus RTP doesn't support multichannel (channels > 2).
Chrome implements it with a non-standard multiopus codec which, when enabled via SDP munging, is automatically chosen based on the number of channels of the input track (6 for 5.1 or 8 for 7.1).
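For the curious, the munged lines for a 5.1 track look roughly like this in Chromium (parameters taken from the commonly circulated example; treat them as illustrative):

    a=rtpmap:111 multiopus/48000/6
    a=fmtp:111 channel_mapping=0,4,1,2,3,5;coupled_streams=2;num_streams=4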
Note that cbr, useinbandfec and usedtx usage in SDP only expresses that the decoder on the receiver prefers CBR, FEC or DTX, but in no way controls what the encoder on either side must use:
cbr: specifies if the decoder prefers the use of a constant bitrate
versus a variable bitrate. Possible values are 1 and 0, where 1
specifies constant bitrate, and 0 specifies variable bitrate. If
no value is specified, the default is 0 (vbr). When cbr is 1, the
maximum average bitrate can still change, e.g., to adapt to
changing network conditions.
useinbandfec: specifies that the decoder has the capability to take
advantage of the Opus in-band FEC. Possible values are 1 and 0.
Providing 0 when FEC cannot be used on the receiving side is
RECOMMENDED. If no value is specified, useinbandfec is assumed to
be 0. This parameter is only a preference, and the receiver MUST
be able to process packets that include FEC information, even if
it means the FEC part is discarded.
usedtx: specifies if the decoder prefers the use of DTX. Possible
values are 1 and 0. If no value is specified, the default is 0.
So the SDP munging that is allowed by the current implementation is just a hack due to the lack of APIs, and is in no way backed by any standard.
This is my understanding:
To me it sounds like no spec change is needed in order to achieve receiving stereo. @perahgren can you verify that WebRTC is able to receive stereo even if "stereo=1" is missing?
However it doesn't make sense to be sending stereo at the present moment, since everybody is implicitly asking for mono by having "stereo=1" missing. To me it would make sense if "stereo=1" were the default in offers and we decided what to send based on the MediaStreamTrack's number of channels.
Questions:
FWIW I don't think it makes sense for the receiver to be the one who controls whether the sender sends stereo, other than in a restrictive sense. But assuming the receiver is OK with receiving stereo, it seems like it should be the sender's choice whether or not to do so. Otherwise we end up in a scenario where, if the sender doesn't want to send stereo, it has to ask the receiver to ask the sender to do mono... I don't want to encourage additional offer-answer dances.
For offer/answer, the relevant text is in section 7.1 of RFC 7587:
o The "stereo" parameter is a unidirectional receive-only parameter. When sending to a single destination, a sender MUST NOT use stereo when "stereo" is 0. Gateways or senders that are sending the same encoded audio to multiple destinations SHOULD NOT use stereo when "stereo" is 0, as this would lead to inefficient use of network resources. The "stereo" parameter does not affect interoperability.
As for what a missing "stereo=1" attribute means: Section 6.1:
stereo: specifies whether the decoder prefers receiving stereo or mono signals. Possible values are 1 and 0, where 1 specifies that stereo signals are preferred, and 0 specifies that only mono signals are preferred. Independent of the stereo parameter, every receiver MUST be able to receive and decode stereo signals, but sending stereo signals to a receiver that signaled a preference for mono signals may result in higher than necessary network utilization and encoding complexity. If no value is specified, the default is 0 (mono).
So receiving stereo is mandatory to support, but sending stereo when stereo=1 is missing is "allowed, but you really should have a strong reason to do so".
I think that once stereo is fully supported in the pipeline, we should put "stereo=1" into the default FMTP line for Opus.
From what I know, stereo is fully supported in the pipeline. However, I'm not convinced that setting stereo=1 by default really solves this. As I see it, there are two problems
If I've understood this correctly, setting stereo=1 by default won't really solve any of the above, apart from not requiring any action when stereo should be sent. If mono should be sent, however, the problems still persist.
Also, if we set stereo=1 by default, won't then suddenly all stereo-capable clients that are not explicitly setting stereo=0 start sending stereo audio instead of mono?
While 1) would be solved if it becomes the default (stereo without munging), the problem of wanting mono is seemingly the same as the original problem, only having the default reversed, so now one might SDP munge to get mono instead.
I think the reason why stereo=1 is more attractive than stereo=0 is that even if you negotiate stereo=1 this only sends stereo if the MediaStreamTrack is a stereo source. So in stereo=1 you can choose whether you want stereo or mono whereas with stereo=0 you have no choice, it's mono-only (assuming we want to respect the receiver's stereo=0).
So what I wonder is: how common are stereo sources, and is it easy to get mono from a stereo source? Can you simply do getUserMedia() with channelCount:1, or do you have to use WebAudio, and if so how cumbersome and performance-impactful is it to start using WebAudio? I do think the default should reflect what makes sense. But it is not just the default in SDP that comes into play here but also the default of MediaStreamTracks?
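For reference, the constraint variant I mean is just this (assuming the browser honors the channelCount constraint):

    // Ask the capture pipeline for a mono track directly, no WebAudio involved.
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: { channelCount: 1 }
    });
    console.log(stream.getAudioTracks()[0].getSettings().channelCount); // 1 if honored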
Also, if we set stereo=1 by default, won't then suddenly all stereo-capable clients that are not explicitly setting stereo=0 start sending stereo audio instead of mono?
Yes. Do we have any idea how common this is? (I don't...)
only having the default reversed, so now one might SDP munge to get mono instead
That is up to the signalling layer which can legitimately strip out stereo=1 and/or replace it with stereo=0
If the receiver does not want to receive stereo (sets stereo=0 or omits a value for stereo), the sender should heed that by re-doing the createOffer.
No? The receiver can declare stereo=0 independently. That is the "declarative" syntax stuff which is very confusing
@henbos What was the resolution here? @fippo did you have some concerns?
@fippo noted that Firefox already sets stereo=1, which I've confirmed.
I do not think (as said quite a few comments up) there is any action for the WG here when it comes to chrome shipping stereo=1. That firefox already does so without the world coming to an end should be a good indicator.
@fippo noted that Firefox already sets stereo=1, which I've confirmed.
Haha, well there we go.
And here's a modified version of @jan-ivar's fiddle that logs the channelCount: https://jsfiddle.net/wase5g6b/1/ When I run it on my MacBook Pro I get channelCount:1 in both Chrome and Firefox. I checked Safari too, but channelCount is undefined there. In any case, the SDP is the same as Chromium's.
@guidou Do you know what Chromium's default for channelCount is? I would assume that a normal microphone is 1 but I don't know what happens if you have some multichannel sound recording equipment.
I think if we write a PR to make stereo=1 the default we just need to make sure that Chromium does up- and downmixing correctly, i.e. we don't want stereo=1 to cause us to encode stereo if our MediaStreamTrack has a mono source. @perahgren What would happen today?
@henbos What was the resolution here?
I think we can just merge the PR (to be written) unless @perahgren or @guidou knows about any backwards compatibility issues that this would cause for cases where we do have multiple channels.
Assuming this is ready for PR in the meantime, please correct me if I'm wrong.
I think this may cause problems for Chrome when doing setRD for Opus with stereo=1 for two reasons:
See https://chromium.googlesource.com/external/webrtc/stable/talk/+/d3ecbb30dc2684653d61e8ec88a5382aecf62773/media/webrtc/webrtcvoiceengine.cc#1892 for the relevant code.
Note also that stereo=1 has no effect on setLD.
When I run it on my MacBook Pro I get channelCount:1 in both Chrome and Firefox.
@henbos MBP mics are mono (see Audio Midi Setup). FWIW my Logitech BRIO reports channelCount: 2 in Firefox above. To test Chrome, I used an in-content device picker to pick my BRIO: I get channelCount: 1 with {audio: true} in M88, but channelCount: 2 in M90. Did the default change recently?
I think this may cause problems for Chrome when doing setRD for Opus with stereo=1
@juberti Wouldn't that show up in a Chrome←→Firefox p2p call today then?
Sure, but the effect will be subtle enough that you won't notice it unless you know what to look for. Still, it should be visible in WebRTC-internals.
- the Opus encoder will be initialized to transmit stereo regardless of the # of channels, which probably results in worse efficiency and sound quality for the selected bitrate
and
@juberti Wouldn't that show up in a Chrome←→Firefox p2p call today then?
Then it sounds like the compatibility problem already existed ever since Chrome initially shipped stereo. So we need to fix this regardless of how we decide to control stereo going forward.
To me I think it sounds like the proposed PR still makes sense as-is, but that code like https://chromium.googlesource.com/external/webrtc/stable/talk/+/d3ecbb30dc2684653d61e8ec88a5382aecf62773/media/webrtc/webrtcvoiceengine.cc#1892 needs to be updated so that the number of channels gets reconfigured to be the minimum of the track's channels and the negotiated channels, and to trigger the possibility of reconfiguring the encoder on replaceTrack() if the number of channels changes.
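To spell out the rule I have in mind (illustrative JavaScript pseudocode, not the actual Chromium code):

    // Hypothetical: derive the encoder channel count from the track and negotiation.
    function encoderChannels(trackChannels, remoteStereoParam) {
      const negotiated = remoteStereoParam === 1 ? 2 : 1; // stereo=1 permits 2 channels
      return Math.min(trackChannels, negotiated);
    }
    // mono track   + stereo=1 -> 1 channel (no pointless upmixing)
    // stereo track + stereo=0 -> 1 channel (respect the receiver's preference)
    // stereo track + stereo=1 -> 2 channels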
Does this make sense to everybody? @juberti @perahgren
To test Chrome, I used an in-content device picker to pick my BRIO: I get channelCount: 1 with {audio: true} in M88, but channelCount: 2 in M90. Did the default change recently?
@guidou Do you know?
"The proposed PR" = https://docs.google.com/presentation/d/1-ZB5pjq0hTl4iNxGhIc3polH9wjCps2Uzqg7D4v1n14/edit#slide=id.gae53209dde_0_11 (slides 29-30)
A concern I have with the PR is that it does not really solve the underlying problem of the lack of an API and the need for munging. It will indeed make setting up stereo calls more convenient, in that stereo=1 is added by default, which avoids the issue that there is no API to control it (and therefore avoids the munging otherwise needed).
However, the problem of the lack of an API, and the resulting need for munging, is then instead imposed on the clients that actually want to set stereo=0 (or omit the stereo flag).
I may get this wrong, but I think what would be needed is to have an API for setting/clearing the stereo flag that avoids munging.
Having stereo=1 would be handy, but I share the concerns of @juberti and also that there may be other unforeseen effects for clients that implicitly rely on stereo=1 not being set by default.
The proposed PR is "send stereo by default if the source is stereo". If the source is mono, you won't be sending stereo (upmixing there would be a bug).
The concern I see is for applications that don't specify channelCount when they do getUserMedia() and thus rely on the default. While most microphones are probably mono, as @jan-ivar pointed out, it looks as if the default channelCount changed from 1 to 2 in a recent version of Chrome? I don't know if that is a good or bad default, but if you actually have a stereo microphone and you are recording stereo then why not send stereo? The question I would ask is, if we don't want stereo, why are we recording stereo?
If stereo or mono should be the default seems like an interesting question. But if you really don't want stereo, then regardless of what we think the default should be, you can always restrict the source to mono with getUserMedia({audio: {channelCount: 1}}). So when would you need to do SDP munging?
looks like I was wrong about "this should be totally safe", but it's good the channelCount change was caught that way. Added a note to the Chrome issue/change asking whether that is intended.
As I see it, the issue is on how to allow the receiving client to restrict the sending client to send stereo. From what I can see, the only way to do that is to include stereo=0 in the answer SDP. And from what I've heard the only way that can be achieved is via munging (maybe I'm missing something there though?).
Here's my take on this: Yes, the answer SDP would have to be munged if you want to turn this off via SDP in a browser client on the receiver side. However, because the receiver has to be prepared to receive stereo regardless of what this attribute says, and it is ultimately up to the sender whether to send mono or stereo, this modification to the SDP does not actually have any effect on the receiver side. It only has an effect on the sender side, where, if the sender respects the receiver's wishes, it restricts it to mono.
From this I conclude that you could munge the SDP after receiver.setLocalDescription(answer) but before sender.setRemoteDescription(answer). This is not disallowed by the spec; the spec only forbids munging local SDP, not remote SDP.
So if you want to turn off stereo via SDP you still can without the SDP munging having to be illegal.
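A minimal sketch of what I mean, on the sender side (the regex handling and the signaling helper are illustrative, not a robust SDP parser):

    // Rewrite the *remote* answer so the Opus fmtp line carries stereo=0,
    // restricting what we ourselves send to mono.
    function setStereoZero(sdp) {
      const m = sdp.match(/a=rtpmap:(\d+) opus\/48000\/2/);
      if (!m) return sdp;
      const fmtp = new RegExp('a=fmtp:' + m[1] + ' [^\\r\\n]*');
      return sdp.replace(fmtp, line =>
        /stereo=\d/.test(line) ? line.replace(/stereo=\d/, 'stereo=0')
                               : line + ';stereo=0');
    }

    const answer = await signaling.receiveAnswer(); // assumed signaling helper
    await pc.setRemoteDescription({ type: 'answer', sdp: setStereoZero(answer.sdp) });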
That said I would view stereo=0 as primarily interesting for the use case of "talking to a legacy endpoint or a non-browser endpoint (server)". For a real application, I would try to make sure the sender and receiver agree on stereo or mono outside of SDP. But if you want to do it in SDP, you can.
People might find that ugly, I don't know, but it would be nice to avoid extra negotiation logic or APIs on both sides to decide which stereo attribute to use, when ultimately it doesn't do anything on the receiver side since stereo has to be receivable regardless.
@perahgren found this yesterday:
The "stereo" parameter is a unidirectional receive-only parameter. When sending to a single destination, a sender MUST NOT use stereo when "stereo" is 0. Gateways or senders that are sending the same encoded audio to multiple destinations SHOULD NOT use stereo when "stereo" is 0, as this would lead to inefficient use of network resources.
If the browser is the sender, this changes "stereo=0" from SHOULD NOT to MUST NOT send stereo, since RTCPeerConnection only sends to a single destination. This does not affect what a sensible implementation does. If the browser is the receiver, nothing changes: we still have to be prepared to receive stereo in case we are talking to a gateway.
I'm still in favor of stereo=1 being the default because then "you send what you have" and don't need to worry about additional APIs to control SDP. It's then up to the application to agree on capture channel count, similar to how it is today up to the application to agree on capture resolution and frame rate.
However the channel count's default has been raised as a concern for moving forward. If we accidentally get two channels by default, we may cause regressions. We may want to discuss https://github.com/w3c/mediacapture-main/issues/775 before merging a PR here?
There are a lot of moving parts here, so let me try to summarize.
Summary of the summary: proposed PR is good, but we can't implement it yet without some refactoring in Chrome.
Excellent summary Justin. I think the relevant pieces of information are known/knowable at the relevant points in time to avoid wasting any bits, but not known to all the layers that need to know about them - until refactoring.
I wonder whether it is possible to achieve an SDP setup that supports stereo audio sessions (via opus) properly without using SDP munging?
Examples of scenarios to achieve include:

Symmetric stereo setup:
- The receiver has not flagged that it wants to receive mono
- The sender wants to send stereo
- The sender achieves a connection sending stereo to the receiver.
- The sender achieves a connection receiving stereo from the receiver.

Asymmetric stereo/mono setup:
- The receiver has not flagged that it wants to receive mono
- The sender wants to send stereo
- The sender achieves a connection sending stereo to the receiver.
- The sender achieves a connection receiving mono from the receiver.
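For the asymmetric case, the directional (receive-only) nature of the parameter means the two sides simply carry different fmtp lines, e.g. (payload type and other parameters illustrative):

    The sender's SDP (prefers to receive mono):  a=fmtp:111 minptime=10;useinbandfec=1;stereo=0
    The receiver's SDP (accepts stereo):         a=fmtp:111 minptime=10;useinbandfec=1;stereo=1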