w3c / webrtc-pc

WebRTC 1.0 API
https://w3c.github.io/webrtc-pc/

Opus mono/stereo and remote track channelCount #3010

Open henbos opened 1 month ago

henbos commented 1 month ago

Opus is a codec that is capable of both mono and stereo.

In SDP, you can specify "stereo=X" to say what your receiver preference is, but importantly:

Today in Chrome, you get mono (track.getSettings().channelCount == 1) by default, but if "stereo=1" is used you get stereo (channelCount == 2). Importantly, the decoder will downmix or upmix to match the desired output (e.g. if stereo is used you will get a stereo track even if the input on the wire is mono).
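For readers unfamiliar with the workaround being discussed, "SDP munging" here means editing the SDP text by hand before it is applied. A minimal sketch of what that could look like for `stereo=1`, assuming CRLF line endings and an Opus `a=fmtp` line that already exists; `preferOpusStereo` is a hypothetical helper name, not part of any WebRTC API:

```typescript
// Hypothetical helper (not part of any WebRTC API): returns a copy of `sdp`
// in which every existing Opus fmtp line carries "stereo=1".
// Assumes CRLF-separated lines; an Opus payload type without an fmtp line
// is left untouched by this sketch.
function preferOpusStereo(sdp: string): string {
  const lines = sdp.split("\r\n");

  // Find the payload types mapped to Opus, e.g. "a=rtpmap:111 opus/48000/2".
  const opusPayloadTypes = new Set<string>();
  for (const line of lines) {
    const payloadType = /^a=rtpmap:(\d+) opus\/48000\/2$/i.exec(line)?.[1];
    if (payloadType !== undefined) opusPayloadTypes.add(payloadType);
  }

  const rewritten = lines.map((line) => {
    const match = /^a=fmtp:(\d+) (.*)$/.exec(line);
    const payloadType = match?.[1];
    const rawParams = match?.[2];
    if (payloadType === undefined || rawParams === undefined) return line;
    if (!opusPayloadTypes.has(payloadType)) return line;

    // Drop any existing stereo=X so the result has exactly one stereo parameter.
    const params = rawParams
      .split(";")
      .map((p) => p.trim())
      .filter((p) => p.length > 0 && !p.startsWith("stereo="));
    params.push("stereo=1");
    return `a=fmtp:${payloadType} ${params.join(";")}`;
  });

  return rewritten.join("\r\n");
}
```

Having to rewrite the description like this is exactly what the issue would like to make unnecessary.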

We would like to support stereo without SDP munging. The question is what is the expected behavior?

  1. Opus is a stereo codec, so always output stereo regardless of signal on the wire.
  2. Adjust the output of the decoder to match what is on the wire (mono = mono, stereo = stereo). This could change on the fly.

While I originally thought 2) was more intuitive, I'm told that Firefox does 1). It also turns out that the audio team prefers to do 1) to avoid some complexity and to reduce the risk of audio glitches due to reconfiguring the decoder in the event that the signal switches back and forth on the fly (use case: virtual audio ssrc and speaker switching between mono and stereo signals, but this is also achievable with replaceTrack in a WPT).

Proposal A: Mandate "always stereo" for stereo codecs.
Proposal B: Mandate dynamically switching between mono and stereo.
Proposal C: Mandate nothing; it is up to the user agent.
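To make the difference between Proposals A and B concrete, here is a toy model (not libwebrtc code) of the decoder's output channel count for a sequence of packets. The names `ChannelPolicy`, `outputChannelCounts` and `countReconfigurations` are invented for this illustration, and counting output-format changes is only a rough stand-in for the reconfiguration risk mentioned above:

```typescript
// Toy model only: these names are hypothetical and exist just for this sketch.
type ChannelPolicy = "always-stereo" | "follow-wire";

// Output channel count per packet, given the channel count seen on the wire.
function outputChannelCounts(
  policy: ChannelPolicy,
  wireChannels: ReadonlyArray<1 | 2>,
): number[] {
  return wireChannels.map((c) => (policy === "always-stereo" ? 2 : c));
}

// Number of times the output format changes between consecutive packets.
function countReconfigurations(outputs: ReadonlyArray<number>): number {
  let changes = 0;
  for (let i = 1; i < outputs.length; i++) {
    if (outputs[i] !== outputs[i - 1]) changes++;
  }
  return changes;
}

// A sender that flips between mono and stereo, e.g. via replaceTrack.
const wire: ReadonlyArray<1 | 2> = [1, 2, 2, 1, 2];
const proposalA = outputChannelCounts("always-stereo", wire); // [2, 2, 2, 2, 2]
const proposalB = outputChannelCounts("follow-wire", wire); // [1, 2, 2, 1, 2]
const changesA = countReconfigurations(proposalA); // 0
const changesB = countReconfigurations(proposalB); // 3
```

Proposal C is not modeled because it mandates nothing; either behavior would be conformant.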

henbos commented 1 month ago

@jan-ivar @youennf @alessiob

fippo commented 1 month ago

see also https://github.com/w3c/webrtc-extensions/issues/63

henbos commented 1 month ago

Ah that old issue... it does seem to be in favor of defaulting to stereo=1 but allowing mono via channelCount:1 constraints.

However, it does not capture the nuance between Proposal A and Proposal B here. For example, even if we encode mono or stereo depending on the local track's channelCount, there's a separate question: when we decode this signal, should the remote track's channelCount be 1 or 2? According to the audio experts, there's "no harm" in having channelCount be 2 on the remote track regardless of whether the signal being decoded was mono or stereo. The benefit of defaulting to 2 is less complexity and less risk of glitches due to reconfiguring the decoder.

henbos commented 4 weeks ago

Action items:

henbos commented 4 weeks ago

Filed #3011 for channelCount in settings

dontcallmedom-bot commented 2 days ago

This issue had an associated resolution in WebRTC November 19 2024 meeting – 19 November 2024 (Remote channelCount + Stereo Opus #3010 #3011):

RESOLUTION: proceed with adding .channelCount on remote tracks

henbos commented 2 days ago

RESOLUTION: proceed with adding .channelCount on remote tracks

In addition to that, we also discussed "what if opus is mono on the wire but the opus decoder is stereo capable, should the output of the decoder be mono or stereo?" I proposed that user agents MAY say channelCount:2 on the remote track in this case (even though channelCount:1 would more accurately reflect the wire). Quoting from the notes of the discussion:

Jan-Ivar: re proposal 2, "MAY" is not great

Henrik: I suspect implementations would align with what libwebrtc does

Jan-Ivar: let's confirm it and standardize that … if the libwebrtc behavior isn't satisfactory, we can revisit that

I suspect that libwebrtc opus stereo will produce 2 channels (even if wire is 1 channel), so it sounded to me like there was a "go-ahead" to return channelCount:2 here, assuming our libwebrtc understanding is correct, with the caveat that we reserve the right to revisit this in the future if this behavior isn't satisfactory.

henbos commented 2 days ago

I.e., if it turns out everybody does channelCount:2 here, we can make the spec say "MUST", and otherwise revisit. In the meantime, let's just document that remoteTrack.getSettings().channelCount reflects the number of channels of the last decoded audio frame, and add WPTs for what we think would happen.
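A test along those lines would mostly come down to reading the setting off the remote track and comparing it with an expectation. A minimal sketch, assuming the user agent exposes `channelCount` in the remote track's settings; `TrackLike` and `describeChannelCount` are hypothetical names, and the structural type is only there so the snippet runs outside a browser (a real `MediaStreamTrack` has a compatible `getSettings()`):

```typescript
// Hypothetical structural type: just the part of MediaStreamTrack this sketch uses.
interface TrackLike {
  getSettings(): { channelCount?: number };
}

// Classifies what a track currently reports; "unknown" covers user agents
// that do not expose channelCount for the track at all.
function describeChannelCount(track: TrackLike): "mono" | "stereo" | "unknown" {
  const count = track.getSettings().channelCount;
  if (count === 1) return "mono";
  if (count === 2) return "stereo";
  return "unknown";
}

// Stub standing in for a remote track whose last decoded frame had 2 channels.
const stubRemoteTrack: TrackLike = {
  getSettings: () => ({ channelCount: 2 }),
};
const reported = describeChannelCount(stubRemoteTrack); // "stereo"
```

Whether a mono-on-the-wire Opus stream should report "mono" or "stereo" here is the open question above; the sketch only shows where the answer would be observed.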

henbos commented 2 days ago

Next steps IIUC: