w3c / mediacapture-main

Media Capture and Streams specification (aka getUserMedia)
https://w3c.github.io/mediacapture-main/

what is the default channelCount #775

Open fippo opened 3 years ago

fippo commented 3 years ago

https://w3c.github.io/mediacapture-main/#def-constraint-channelCount doesn't say which is the default.

Chrome recently changed from 1 to 2, as we noticed in https://github.com/w3c/webrtc-extensions/issues/63#

Fiddle: https://jsfiddle.net/fippo/c0ax4tv1/1/ (note: might require a stereo-capable mic; MacBook mics are not)

henbos commented 3 years ago

The recent change was noted by @jan-ivar:

To test Chrome, I used an in-content device picker to pick my BRIO: I get channelCount: 1 with {audio: true} in M88, but channelCount: 2 in M90. Did the default change recently?

henbos commented 3 years ago

Per summary over at https://github.com/w3c/webrtc-extensions/issues/63#issuecomment-786118846, would it make sense to mandate default channel count or is this something that should be up to the browser?

@guidou what do you think about default channel count = 1 in Chrome?

henbos commented 3 years ago

@guidou says the default likely changed here: https://chromium-review.googlesource.com/c/chromium/src/+/2593122

youennf commented 3 years ago

I am in favour of making browsers as consistent as possible in the way tracks are initialised, past the point devices are selected. That includes channel count, width, height... It is probably painful for web developers and error prone since testing might only happen in one browser in practice.

In general, it seems good to document the default behavior, especially if it is consistent amongst browsers. If the defaults have to change in the future, it might be best to document the change and coordinate within implementations.

guidou commented 3 years ago

I think we should, in as many cases as possible, specify default values for all constrainable properties as a criterion to break ties when multiple configurations have the same fitness. This would make all implementations work in a more predictable way, especially in the common cases where few or no constraints are passed to gUM.

youennf commented 3 years ago

Agreed. In Safari, we are artificially adding ideal width/height/frame rate constraints (640/480/30) if no corresponding constraint is given.
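
A minimal sketch of the kind of defaulting described here, assuming the three values quoted in this comment. This is not Safari's or Chromium's actual code; `DEFAULT_VIDEO_IDEALS` and `withDefaultVideoIdeals` are hypothetical names used only for illustration.

```javascript
// Hypothetical defaults, taken from the values mentioned in this thread.
const DEFAULT_VIDEO_IDEALS = { width: 640, height: 480, frameRate: 30 };

// Return a copy of the video constraints with an `ideal` filled in for each
// of width/height/frameRate that the page did not constrain at all.
// Constraints the page did supply are left untouched.
function withDefaultVideoIdeals(video) {
  const given = video === true || video == null ? {} : video;
  const result = { ...given };
  for (const [name, ideal] of Object.entries(DEFAULT_VIDEO_IDEALS)) {
    if (!(name in result)) {
      result[name] = { ideal };
    }
  }
  return result;
}
```

Because the injected values are `ideal` rather than `exact`, a device that cannot deliver them would still be opened at whatever it supports that is closest.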

guidou commented 3 years ago

Chromium does something very similar to that as well for those 3 properties.

henbos commented 3 years ago

Could we do the same for channelCount without having to revert the culprit CL?

jan-ivar commented 3 years ago

This would make all implementations work in a more predictable way, especially in the common cases where few or no constraints are passed to gUM

I'm sympathetic to web compat concerns, which we experience as well. However, I'm not sure this would be better for users. Constraints were purposely designed to balance control between two stakeholders: the user and the application, perhaps best illustrated by their two extremes:

  1. The user configures the camera and settings they want in their OS or browser, perhaps with per-site exceptions.
  2. Each application configures what camera and settings to use

1 is an unconstrained track. 2 is a 100% constrained track. This is why constraints were purposely designed to not specify defaults, and why constraints are distinct from settings in the API in the first place.

User agents conspiring on a fixed set of default settings forever for all devices in the interest of web compat for apps, would wreck 1.

For example: If a user inserts a stereo microphone they just bought, why shouldn't they get stereo on every site out there? One of the advantages of the native WebRTC stack is that this can just work without requiring each app to support it.

Browsers may not be doing much with their defaults atm, but defaulting to 640x480x30 mono forever doesn't seem very forward looking. I don't think we should lock ourselves down there.

youennf commented 3 years ago

  1. The user configures the camera and settings they want in their OS or browser, perhaps with per-site exceptions.

I know users can change default devices from OS UI. Is Firefox (or any browser) providing such camera configuration UI?

If that is important, we can try to find wording that would still allow per-site default exceptions.

If a user inserts a stereo microphone they just bought, why shouldn't they get stereo on every site out there?

We could define a default rule so that, post device selection, channelCount would be set to 2 if feasible. I do not see how this use case describes the benefit of the current approach, especially if one browser would use stereo if available and another would stick to mono.

One of the advantages of the native WebRTC stack is that this can just work without requiring each app to support it.

I am not sure what 'just work' means. Some native clients do not support stereo and may break if the default suddenly changes. Some web clients will not benefit from stereo and will have a suboptimal experience (due to the bandwidth increase).

Browsers may not be doing much with their defaults atm, but defaulting to 640x480x30 mono forever doesn't seem very forward looking.

The idea is not to stick to 640x480x30 + mono forever; web specs do change every day. The idea is to document these defaults, if we can agree on them, and then to change them progressively and consistently.

I do not know what default rules Firefox is using. If it does something similar (or is willing to do something similar) to Chrome and Safari, why not document it?

jan-ivar commented 3 years ago

I know users can change default devices from OS UI. Is Firefox (or any browser) providing such camera configuration UI?

Well, we have the camera and microphone picker in the Firefox permission prompt. 🙂 But even without that, all major browsers respect OS defaults, which are often user configurable. E.g. going to Audio MIDI Setup and choosing my Logitech BRIO makes it the default microphone on macOS and Firefox, and now {audio: true} gives you channelCount 2 instead of 1.

I also could have sworn an earlier version of Audio MIDI Setup let me change the 2 ch to 1 ch, but I might be misremembering there, or it did with a different device I no longer own. I believe other OSes allow this though.

This means there is no one default channelCount. That's what we should document. If your app assumes there is, then your app is broken. This is why we have constraints and not a simpler settings API: if an app requires mono to not break, constrain it to mono. The model is: if you care about it, constrain it. Otherwise you get what you get.
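
As a rough sketch of the "if you care about it, constrain it" model: an app that cannot handle stereo can request mono explicitly instead of relying on whatever an unconstrained request returns. The helper name `audioConstraints` and its `mono` option are made up for illustration; `channelCount`, `exact`, and `getUserMedia` come from the spec.

```javascript
// Build getUserMedia constraints for an app that may or may not need mono.
// `audioConstraints` and its `mono` option are hypothetical helper names.
function audioConstraints({ mono = false } = {}) {
  if (!mono) {
    // Unconstrained: the user agent picks every setting, including channelCount.
    return { audio: true };
  }
  // `exact` makes the request reject with OverconstrainedError rather than
  // hand back a track with a different channel count.
  return { audio: { channelCount: { exact: 1 } } };
}

// In a browser:
//   const stream = await navigator.mediaDevices.getUserMedia(
//     audioConstraints({ mono: true }));
//   stream.getAudioTracks()[0].getSettings().channelCount;
```

An app that merely prefers mono could use `{ ideal: 1 }` instead, accepting that it may still get something else.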

youennf commented 3 years ago

In theory, reading the spec should be sufficient to implement it and get interoperability with other implementations. I do not think the spec is there yet; implementors have to study what other browsers are doing to actually get to that point :(

As an example, the spec does not say whether, if echoCancellation is supported, echoCancellation should be on or off. I would guess browsers turn echo cancellation on by default, and many applications are relying on it.

I generally disagree with "If your app assumes there is, then your app is broken". Good and reliable defaults are extremely important.

henbos commented 3 years ago

E.g. going to Audio MIDI Setup and choosing my Logitech BRIO makes it the default microphone on macOS and Firefox, and now {audio: true} gives you channelCount 2 instead of 1.

Very interesting. Thanks for checking.

I think the user specifying default device makes sense because you don't want to default to that old camera or microphone that is collecting dust behind your computer, you want your shiny new device that is in your face.

However, when it comes to the "technical details" of how to open a particular device, such as which resolution or number of channels to use when recording, I'm not sure I see the value in letting the user/OS override what the browser does by default. The browser and/or application should know better than a normal user, especially when the application will in most use cases be streaming the content to a VC server.

This means there is no one default channelCount. That's what we should document. If your app assumes there is, then your app is broken.

If the concern is that deciding defaults now is not forward-looking, we can always revisit what the defaults should be later. At the end of the day, if an implementation changes their defaults from one version to the next that is a change in behavior, whether or not there's a spec change behind it.

Jan-Ivar, you've said in the past that "predictability trumps usefulness", but in this case it seems that unspecified defaults are neither useful nor predictable.

Or am I missing something, what is the usefulness in not knowing what you get?

henbos commented 3 years ago

For what it's worth, Chromium's change in default channel count was accidental and there is a revert in progress. This will make implementing https://github.com/w3c/webrtc-extensions/issues/63 a smoother transition, whether or not we can agree on a standardized default channel count here.

guidou commented 3 years ago

I don't see any contradiction in the spec having defaults and the user being able to override those defaults. The spec already says "User Agents are encouraged to default to using the user's primary or system default device for kind (when possible).", so it is already encouraging a default for the deviceId property.

To define defaults, I think we should look at what browsers are currently doing and, where there is coincidence or near-coincidence, adopt those defaults in the spec. Also, defaults don't always need to be hard-coded constants. For deviceId we're using a system-defined constant.

That said, even with defaults we will probably not achieve full compatibility across browsers since properties are correlated and different implementations have different capabilities. Still, if we can significantly improve it for the most common cases, I think that would be beneficial.

Experience shows that real-world applications rely on browser defaults and that is not going to change just because we emphasize in the spec that they shouldn't. This channelCount issue in Chrome is a good example. Chrome didn't really have a universal default as channelCount is correlated with other properties. If echoCancellation, noiseSuppression or autoGainControl was enabled, channelCount was always 1 because Chrome's audio processing implementation only supported mono output. When no processing was used, the channel count was the same as the hardware configuration. When Chrome upgraded its processing implementation to support more channels, the default for stereo microphones automatically changed to 2 in all cases and that automatically caused regressions for some applications relying on output always being mono.
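
The correlation described above can be modelled in a few lines. This is only an illustration of the behavior as stated in this comment, not Chromium code; both function names and the `hardwareChannels` parameter are hypothetical.

```javascript
// Illustrative model of the behavior described above; not Chromium code.
// `hardwareChannels` is the channel count the device is configured with.
function legacyChannelCount(settings, hardwareChannels) {
  const processing =
    settings.echoCancellation ||
    settings.noiseSuppression ||
    settings.autoGainControl;
  // Old pipeline: any enabled processing forced mono output.
  return processing ? 1 : hardwareChannels;
}

// Once the processing pipeline supports more channels, the same inputs
// yield the hardware channel count whether or not processing is enabled.
function upgradedChannelCount(settings, hardwareChannels) {
  return hardwareChannels;
}
```

With a stereo microphone and echo cancellation on, the first function gives 1 and the second gives 2, which is the silent change that apps relying on mono ran into.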

guidou commented 3 years ago

So, to summarize, I agree with @youennf that "good and reliable defaults are extremely important" because experience has shown many times that when defaults change applications break, even if the default change was 100% spec compliant.

youennf commented 3 years ago

To define defaults, I think we should look at what browsers are currently doing and, where there is coincidence or near-coincidence, adopt those defaults in the spec.

I'll file an issue specifically for this. It seems channelCount differs between browsers, so we can keep this issue to see whether we can reach consensus.

jan-ivar commented 3 years ago

I don't see any contradiction in the spec having defaults and the user being able to override those defaults. ... experience has shown many times that when defaults change applications break, even if the default change was 100% spec compliant.

@guidou There's a contradiction right there: Applications that brittle won't work for users who override those defaults.

henbos commented 3 years ago

Why do users have to be able to override defaults?

henbos commented 3 years ago

(I mean, other than "which device")

jan-ivar commented 3 years ago

Why do users have to be able to override defaults? (I mean, other than "which device")

@henbos There's another contradiction: defaults may be device dependent, like channelCount in Firefox.

What's the point of constraints if the defaults are known?

jan-ivar commented 3 years ago

The browser and/or application should know better than a normal user,

Not to pile on the contradictions, but: applications should know better than the user, yet somehow don't know to constrain the settings they rely on to not break miserably?

youennf commented 3 years ago

@henbos There's another contradiction: defaults may be device dependent, like channelCount in Firefox.

Do you know what Firefox is doing here? Is Firefox deciding to use channelCount = 2 if the device allows it, like {channelCount: 2}? Or is it that Firefox is using whatever the OS default is, like {channelCount: 'default'}? These two options can be specced just as well as {channelCount: 1}.

As I said, I do not think the spec is precise enough for implementors to be able to interop with existing implementations. I believe this is one criterion for going to REC that the spec is not yet meeting.

Two questions may help me understand your position, which is still fuzzy to me: Are you ok with the spec describing where browsers do use the same defaults? Are you ok with the idea to converge on the same defaults for browsers? Or at least to flag where implementations may differ?

henbos commented 3 years ago

What's the point of constraints if the defaults are known?

The point of constraints is to 1) not have to expose every possible device configuration, and 2) not have to write your own algorithm similar to constraints processing. It is perfectly reasonable to question these decisions, but that is a separate discussion from whether or not it would be useful for constraints to be more predictable and testable.

There's another contradiction: defaults may be device dependent, like channelCount in Firefox.

Not a contradiction. You can always downsample to 1, or have a default that is device-dependent, like defaulting to the maximum of the device's capability. The default does not have to be "exact" either; it could be "give me what is closest to the default when not specified". For example, if you have a 360p camera, the "default" could still be 480p and you could open it in what is closest to 480p, which would be 360p. I assume the browsers don't crash if such a camera exists? Similarly, if we think stereo is the future, we could have the "default" be 2 channels but have mono devices open with 1 channel, because 1 is the closest value to 2 that is possible with that device.
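
The "closest to the default" idea can be sketched with the fitness distance the spec already defines for `ideal` numeric constraints. The helper `closestToDefault` and the notion of a spec-level default are hypothetical; the example assumes a non-empty list of values the device supports.

```javascript
// Fitness distance for an `ideal` numeric constraint, following the formula
// in the Media Capture and Streams spec.
function fitnessDistance(actual, ideal) {
  if (actual === ideal) return 0;
  return Math.abs(actual - ideal) / Math.max(Math.abs(actual), Math.abs(ideal));
}

// Pick the supported value closest to a (hypothetical) default.
// Assumes `supported` is a non-empty array of numbers.
function closestToDefault(supported, defaultIdeal) {
  return supported.reduce((best, value) =>
    fitnessDistance(value, defaultIdeal) < fitnessDistance(best, defaultIdeal)
      ? value
      : best
  );
}
```

Under this sketch, `closestToDefault([360], 480)` picks 360 and `closestToDefault([1], 2)` picks 1, matching the two cases in the paragraph above.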

Not to pile on the contradictions, but: applications should know better than the user, yet somehow don't know to constrain the settings they rely on to not break miserably?

Not a contradiction. This has more to do with testability. If an app developer writes code and tries it out with a couple of devices on a couple of browsers and consistently gets the same result, they might think that the behavior is well-defined and have no idea that some user in the wild is able to change this in OS settings. They might not go through the spec and specify defaults for every possible constraint available, like channel count; they'll probably fix it on a case-by-case basis if problems crop up.

Embarrassingly enough, Chrome shipped its stereo=1 hack and so, by the sound of it, has probably been upsampling to stereo when talking to Firefox. Not what was wanted, but it slipped through the cracks. This illustrates that not everything is sufficiently tested.

bradisbell commented 3 years ago

Please, do not spec a set of default constraints or channel counts. Give us whatever the underlying system gives us for default channel count, frame size, frame rate, etc. The OS knows better than the user agent does.

Perhaps a stereo audio input device switches to a mono mode when the user agent assumes mono with no constraints given. The web application doesn't know that this is undesirable as it just wants the default behavior of the audio device. The web application isn't going to request stereo, because it may end up with a wasteful upmix of mono audio devices to stereo. Setting default constraints rather than letting the underlying system determine behavior prevents us from using a sensible system-level default, perhaps even one that the user configured.

As an example, the spec does not say whether, if echoCancellation is supported, echoCancellation should be on or off. I would guess browsers turn echo cancellation on by default, and many applications are relying on it.

They do, and the effects have been very bad for audio quality. I feel strongly that audio on the web would be better off if these options were not enabled by default. If there is any doubt, you can watch any news broadcast from COVID times with remote guests behind some WebRTC-based call. Even if they use an IFB/IEM/earphone to prevent feedback, audio quality is damaged because these DSP algorithms were enabled by default and few developers know to turn them off.
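
For readers who do want the unprocessed signal, a sketch of the opt-out looks like this. The three constraint names are from the spec; the helper name `rawAudioConstraints` is hypothetical.

```javascript
// Constraints for capture where the app wants the signal untouched by the
// browser's DSP. `rawAudioConstraints` is a hypothetical helper name.
function rawAudioConstraints() {
  return {
    audio: {
      // Bare values are treated as `ideal`, so a user agent that cannot
      // disable a given stage may still apply it.
      echoCancellation: false,
      noiseSuppression: false,
      autoGainControl: false,
    },
  };
}

// In a browser:
//   const stream = await navigator.mediaDevices.getUserMedia(rawAudioConstraints());
```

Checking `getSettings()` on the resulting track shows which of these the user agent actually honored.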

I believe the base specifications should not assume use cases. (For example, echoCancellation by default assumes some sort of bi-directional audio communication where feedback could occur.) Specifying default constraints makes assumptions about the use case, as well as the hardware capabilities, user preferences, and application intent. This isn't good, in my opinion.

Ideally, the web layer is as thin as reasonable, giving us a cross-platform API that interferes as little as possible. To that end, I think the default channelCount, and other stream constraints, should not be put into the spec. If an implementer needs to set default constraints for some reason, such as the base system not supporting a default stream format from the capture device, then they should also be free to implement as they see fit.

jan-ivar commented 3 years ago

Is Firefox deciding to use channelCount = 2 if the device allows it, like {channelCount: 2}? Or is it that Firefox is using whatever the OS default is, like {channelCount: 'default'}?

@youennf I'm not aware of a cross-device "OS default". It seems to be per device on macOS and Windows (I didn't try Linux). My BRIO appears only settable to 2 (max?) channels, although sampleRate can be changed:

[image]

I believe Firefox takes the max channels offered by the specific device and caps it at 2, due to bug 1393401.
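
Expressed as code, the rule described here would be roughly the following. This is a model of the sentence above, not Firefox source; `deviceMaxChannels` is a hypothetical input.

```javascript
// Model of the rule as described in this comment: take the maximum channel
// count the specific device offers and cap it at 2. Not Firefox source.
function cappedDefaultChannelCount(deviceMaxChannels) {
  return Math.min(deviceMaxChannels, 2);
}
```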

As I said, I do not think the spec is precise enough for implementors to be able to interop with existing implementations. I believe this is one criterion for going to REC that the spec is not yet meeting.

The spec is precise, leaving user agents in charge of defining the underlying system's "platform defaults".

Platforms vary, and devices vary, so having browsers vary in same-platform + same-device situations, seems more like a healthy reminder for apps not to assume every system and device will be the same, than a bug. User agents are allowed their own interpretation of a platform and its devices (e.g. a privacy-focused browser may offer a limited and unrecognizable view). So I think I'm rejecting your definition of "interop" here being that every browser must represent the underlying system's resources the same way.

Constraints are precise and interoperable, whether applying specific settings or min/max ranges around the underlying system's defaults.

Are you ok with the spec describing where browsers do use the same defaults?

No. I share @bradisbell's concerns that we may have been short-sighted in our clamping of default platform capabilities to accommodate one particular sink (RTCPeerConnection), and that we shouldn't cement them in the spec.

Are you ok with the idea to converge on the same defaults for browsers? Or at least to flag where implementations may differ?

No, I don't see the interop problem you're solving.

jan-ivar commented 3 years ago

Why do users have to be able to override defaults? (I mean, other than "which device")

There's another contradiction: defaults may be device dependent, like channelCount in Firefox.

Not a contradiction. You can ... have a default that is device-dependent like defaulting to the maximum ... capability

@henbos Sure, that works up to 2 if we pick channelCount 2, but not 1. But what's your goal? E.g. why restrict user overrides if we're allowing channelCount to change through device switching anyway? Where's the invariant?

If an app developer writes code and tries it out with a couple of devices on a couple of browsers and consistently gets the same result, they might think that the behavior is well-defined and have no idea that some user in the wild is able to change this in OS settings.

...or they may not have tested enough devices, or the devices that go on sale next week. It's the same problem (if you default to 2).

But I'd rather hear about the user in the wild: How were they able to configure their microphone in a way that is unique enough to break this app? Did their customization have absolutely no impact on any site they visited? Or did it work on the ones that mattered to them (e.g. audio sites) and had no change on others (web conference ones)? I think that's the use case.

Chrome shipped its stereo=1 hack and so by the sound of it has probably been upsampling to stereo when talking to Firefox. Not what was wanted, but slipped through the cracks. This illustrates that everything is not sufficiently tested.

Wrong spec. My understanding of the Chrome bug is it affected all audio input sources to RTCPeerConnection, not just stereo microphones, hence clearly a bug, and not related to defaults. Tests can use constraints and don't depend on defaults AFAIK.

henbos commented 3 years ago

The web application isn't going to request stereo, because it may end up with a wasteful upmix of mono audio devices to stereo.

I don't think we should ever upsample, regardless of whether we're talking about defaults or constraints. Channel count should be a preference, capped by what the device is capable of, which ensures you never get upsampling. We don't upsample resolution, so I don't see why we would upsample channel count.

I believe the base specifications should not assume use cases. (For example, echoCancellation by default assumes some sort of bi-directional audio communication where feedback could occur.) Specifying default constraints makes assumptions about the use case, as well as the hardware capabilities, user preferences, and application intent.

Even if we don't specify defaults in the spec, they still have to be specified in implementation code. You can't get around defaults. The question isn't "defaults or no defaults?"; the question is "well-defined defaults or unspecified defaults?". In cases where there are meaningful and configurable OS defaults, we could talk about whether or not those should override the browser defaults, but I'm not sure there are meaningful OS/user choices beyond which device to pick.

One reason we might be disagreeing is me not buying the premise that there are meaningful defaults, so if we're going to pick arbitrary ones, we might as well all agree on what those arbitrary defaults are for the sake of predictability. I proposed defaulting to 1, but another option is defaulting to "maximum channels that the device is capable of".

Give us whatever the underlying system gives us for default channel count, frame size, frame rate, etc. The OS knows better than the user agent does.

On one hand I hear that the OS provides meaningful defaults...

I'm not aware of a cross-device "OS default". Seems per device on mac and Windows (didn't try linux). My BRIO appears only settable to 2 (max?) channels, although sampleRate can be changed:

... and on the other hand I hear that configurable defaults are only a subset of the capabilities, that they vary by device and platform, and that maybe you can't configure them at all because you'll only get the maximum? Which one is it?

It seems like the strongest case for not having defaults is being able to configure what the defaults are, either by the user knowing best or by the OS knowing best, but from this discussion I really can't tell if the OS or the user does.

Platforms vary, and devices vary, so having browsers vary in same-platform + same-device situations, seems more like a healthy reminder for apps not to assume every system and device will be the same, than a bug.

Devices varying is inherent to the problem we are trying to solve. OS or OS settings varying is only a problem if we don't have well-defined defaults.

Wrong spec.

It was just an example proving the point about testability.

henbos commented 3 years ago

Today you could have the same machine, same device, same OS, same OS settings and the only thing that is different is which browser you are running - and you might get different results. This hurts testability and predictability. This is what the discussion should be about. If we don't care about that, then so be it, but let's decide based on what we want to solve rather than fear of upsampling or changing our minds later about what the defaults should be.

youennf commented 3 years ago

Let's take the example of sampleRate. Web Audio says the following: If contextOptions.sampleRate is specified, set the sampleRate of this AudioContext to this value. Otherwise, use the sample rate of the default output device.

The mediacapture spec does not say anything, while exposing a similar API to Web Audio (an optional sampleRate value). As it is, this spec is not reaching the minimal amount of precision that other specs provide and that implementers need.

youennf commented 3 years ago

Another example: it seems that all browsers are doing 640x480x30. How did that come about? If it is because implementors went to see what other implementations were doing, this again shows that the spec is not precise enough.

henbos commented 3 years ago

Yep, 640x480x30 was completely arbitrary, because the spec didn't say. So if we have arbitrary defaults, why not specify them? It's not like this is some OS default either - you can't go into Windows Camera device settings and change this to 800x600x16, but even if you could, what's the point of exposing this to web apps?

jan-ivar commented 3 years ago

Give us whatever the underlying system gives us for default channel count, frame size, frame rate, etc. The OS knows better than the user agent does.

On one hand I hear that the OS provides meaningful defaults...

@henbos Yes. The OS has a default device with specific capabilities that influence what settings sites get by default:

(await navigator.mediaDevices.getUserMedia({audio: true})).getAudioTracks()[0].getSettings().channelCount // 1 or 2?

It seems irrelevant to an app and compat whether 1 or 2 is decided by:

  A. Which OS device is the user default,
  B. A user having picked a different device in Firefox's microphone picker, or
  C. A Linux user reconfiguring properties of the device driver of the device from A or B.

This is all under "user agent", and not the spec. Specs generally shy away from dictating the relationship between the user and their agent.
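
A sketch of what this implies for app code: read the settings of the track that was actually delivered rather than assuming a value. The helper name `deliveredChannelCount` is hypothetical; `getAudioTracks()` and `getSettings()` are the real MediaStream and MediaStreamTrack methods.

```javascript
// Read the channel count actually delivered, instead of assuming 1 or 2.
// Works with any object shaped like a MediaStream.
function deliveredChannelCount(stream) {
  const [track] = stream.getAudioTracks();
  if (!track) return undefined;
  // getSettings() may omit channelCount if the user agent does not report it.
  return track.getSettings().channelCount;
}

// In a browser:
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   deliveredChannelCount(stream);
```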

jan-ivar commented 3 years ago

Web Audio says the following: If contextOptions.sampleRate is specified, set the sampleRate of this AudioContext to this value. Otherwise, use the sample rate of the default output device.

@youennf Exactly, and what singular testable value is that? I see no definition of "default output device", so this is presumably defined by the user agent and its interpretation of the platform, not the spec. This seems to support the model @bradisbell wants.

The mediacapture spec does not say anything, while exposing a similar API to Web Audio (an optional sampleRate value).

It too says defaults are up to the user agent: "The User Agent MAY choose new settings for the constrainable properties of the object at any time. When it does so it MUST attempt to satisfy all current Constraints. ... the UA can use any information available to it. ... the energy usage of the camera varies ..., or whether ... will cause the device driver to apply resampling."

The spec even mentions "platform defaults" as being useful (and preferable?) results, but shies away from mandating them.

Part of the reason for that is that unlike output, there's no singular camera or microphone greater than another always. Another reason ironically is our desire to be conservative and biased toward RTCPeerConnection (I also wonder, would https://github.com/w3c/mediacapture-output/issues/87 influence web audio's sampleRate?)

youennf commented 3 years ago

Web Audio says the following: If contextOptions.sampleRate is specified, set the sampleRate of this AudioContext to this value. Otherwise, use the sample rate of the default output device.

@youennf Exactly, and what singular testable value is that?

We can probably test it, by comparing WebAudio default sampleRate with getUserMedia default sample rate. You first need to verify the input device has the same groupId as the output device.
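
A rough sketch of the comparison step in such a test, assuming the values have already been collected in a browser: `input` and `output` would be entries from `navigator.mediaDevices.enumerateDevices()`, `trackRate` would come from `track.getSettings().sampleRate`, and `contextRate` from `new AudioContext().sampleRate`. The function name and the result strings are made up for illustration.

```javascript
// Decide whether the comparison proposed above is meaningful and, if so,
// whether it passes. All inputs are plain values collected beforehand.
function compareDefaultSampleRates({ input, output, trackRate, contextRate }) {
  // Only comparable when both endpoints belong to the same physical device.
  if (!input.groupId || input.groupId !== output.groupId) {
    return "not-comparable";
  }
  return trackRate === contextRate ? "match" : "mismatch";
}
```

Keeping the comparison separate from the browser calls makes the rule itself easy to check without hardware.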

The spec even mentions "platform defaults" as being useful (and preferable?) results, but shies away from mandating them.

The spec mentions platform defaults, but does not define them. This is what we want to address here.

Also, the spec mentions platform defaults but does not say how or when they should apply. This is again bad and I hope we can address it, though just defining defaults would be good.

Part of the reason for that is that unlike output, there's no singular camera or microphone greater than another always.

We are talking constraints, not devices, here. For some constraints, the OS provides default values and we should use them, like Web Audio is doing. For other constraints, the OS does not provide default values, say echoCancellation.

AIUI, @bradisbell actually wants default values so that the browser processing is as thin as possible. This means echoCancellation=false for instance. This does not mean 'no default'.

youennf commented 3 years ago

Also, if we really think websites should be specific about every constraint that matters to them, we should make that clear in the spec examples. Spec examples are relying on defaults to give interoperable results amongst browsers.

bradisbell commented 3 years ago

@youennf AIUI, @bradisbell actually wants default values so that the browser processing is as thin as possible. This means echoCancellation=false for instance. This does not mean 'no default'.

To be specific, what I want is for the browser not to modify what I'm getting from the underlying system. If something at a lower layer turns on some echo cancellation algorithm (e.g. most any Lenovo laptop), I'll accept that because that's what the user has and has control over. By default, I don't want the browser doing anything extra. I have no problems with extra features, but I believe they should be opted into.

Using the Lenovo microphone example, if there were some way for the system to indicate to the browser that echo cancellation were on, then my web application not specifying any constraints would end up with echoCancellation=true because that's what the system is providing. (Of course, I don't think there is any standardization on this specific feature, so there's no way that could work, since the echo cancellation is something that the user agent does.)

In the case of an observable constraint such as frame width/height, I'd prefer the user agent take whatever the underlying system provides by default. That means that I don't want the browser to set a default. I want something upstream to figure out what is best, using whatever method it thinks is best.

@henbos Today you could have the same machine, same device, same OS, same OS settings and the only thing that is different is which browser you are running - and you might get different results.

Agreed on the discussion point, but let's keep in mind the various audio APIs available. Even on Windows, one browser may use WASAPI and another DirectSound. This could legitimately and acceptably result in different behavior for the application.

@henbos It's not like this is some OS default either - you can't go into Windows Camera device settings and change this to 800x600x16, but even if you could, what's the point of exposing this to web apps?

Web apps should be treated no differently than desktop apps, wherever practical. In my own usage, it would be nice to have sensible defaults for cameras and capture devices, if they were available. If the user has configured that they like a particular resolution, that resolution should be used by default everywhere.

An example use case... I ran out of USB 2.0 bandwidth regularly, and when Google Hangout or similar would request 1920x1080 from my webcam, things would work... but barely. If I could set a default of 1280x720, the webcam would work. Google Hangout had an option for lowering video resolution, but most web apps did not.

Another use case is that pro audio devices in exclusive mode will change their hardware clock rate to match what is requested. If by default browsers requested 48 kHz instead of a configured 44.1 kHz, that might knock my whole studio out of sync. Whereas if the browser just accepted what it was given, and the web application accepted what it was given, it would get the 44.1 kHz stream and all would be well.

jan-ivar commented 3 years ago

The spec mentions platform defaults, but does not define them.

Just like web audio doesn't define "default output device". It's UA specific.

We can probably test it, by comparing WebAudio default sampleRate with getUserMedia default sample rate.

This would test that the user agent is consistent with its own definition of "default device". I agree with this level of testing.
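A minimal sketch of such a self-consistency check, assuming the default capture device and the default AudioContext are expected to share a sample rate (they may legitimately differ when the default input and output devices are different hardware). The helper names `compareDefaultSampleRates` and `checkDefaultSampleRate` are made up for illustration; only `getUserMedia`, `getSettings()` and `AudioContext.sampleRate` come from the platform.

```javascript
// Pure comparison so the logic can be checked without a browser.
// `contextSampleRate` is AudioContext.sampleRate; `trackSettings` is the
// object returned by MediaStreamTrack.getSettings().
function compareDefaultSampleRates(contextSampleRate, trackSettings) {
  const captureSampleRate = trackSettings.sampleRate;
  if (typeof captureSampleRate !== "number") {
    // The user agent did not report a sampleRate setting; nothing to compare.
    return { consistent: null, contextSampleRate, captureSampleRate: undefined };
  }
  return {
    consistent: captureSampleRate === contextSampleRate,
    contextSampleRate,
    captureSampleRate,
  };
}

// Browser-only wrapper (hypothetical helper): opens the default microphone
// with no constraints and a default AudioContext, then compares the two.
async function checkDefaultSampleRate() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const context = new AudioContext();
  try {
    const [track] = stream.getAudioTracks();
    return compareDefaultSampleRates(context.sampleRate, track.getSettings());
  } finally {
    stream.getTracks().forEach((t) => t.stop());
    await context.close();
  }
}
```

This only checks that one user agent agrees with itself; it says nothing about whether two user agents report the same value.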

But that's different from mandating default device setting values (regardless of device?) must be the same across user agents, which would require a spec change.

For some constraints, the OS provides default values and we should use them, like web audio is doing. For other constraints, the OS does not provide default values, say echoCancellation.

Agreed.¹ I'd put channelCount and sampleRate in the first category.

I appreciate the nuance that there are two questions: 1) whether we need to specify more around how UAs must deduce defaults, and 2) should those defaults be hard-coded values or be derived from platform/OS.

I think this is going to be difficult, and is why the spec has stayed out of this so far, trusting user agents with defining this. I think I disagree with the notion that code is not "interoperable" unless it gives the exact same values as other browsers for the same platform + device. The "unconstrained" API feature here is literally to ask for the user agent's defaults. As long as that user agent is consistent with itself, that might be enough.

A hard part will be facing up to cases where we didn't go for the ideal (if not max) available quality, say, the conservative 640x480x30 + echoCancellation + noiseSuppression + autoGainControl. These decisions were clearly biased toward making sure the RTCPeerConnection sink didn't blow up. Also, Chrome's resizeMode: "crop-and-scale" approach basically morphs getUserMedia from a device discovery API into a resolution control-surface for RTCRtpSender.

Ironically, the mediacapture spec still also has some fairly liberal ideas around what UAs are allowed to do to change unconstrained camera settings to adapt to sinks: "A <video> element that displays media from a dynamic source can either perform scaling or it can feed back information along the media pipeline and have the source produce content more suitable for display." — I don't think anyone has implemented anything like this. But it's something defining defaults would close the door on.


1. though since an OS could provide echoCancellation, the OS default is arguably false except on the Lenovos mentioned.

youennf commented 3 years ago

One major aspect of a web spec is to describe what implementations are doing, or what they are trying to do. That is what we are trying to achieve here. All browsers are selecting defaults and deciding when to apply them. This is core to the getUserMedia algorithm.

I'd put channelCount and sampleRate in the first category.

I think there is consensus for sampleRate, probably sampleSize as well. There is apparently no consensus for channelCount yet. We should acknowledge this and seek consensus.

These decisions were clearly biased toward making sure the RTCPeerConnection sink didn't blow up.

That is true and we can always revisit this. We should also acknowledge that this use case is the major one right now, so the decision is not absurd. It is also what implementations are doing, and some websites rely on it, so we should document it. If you tweak these defaults and go to sites like webrtc.github.io/samples or appr.tc, the results will be poor.

For resizeMode, I doubt there is a meaningful OS default and we should strive for defining a good default. It is for instance hurting interoperability that getUserMedia({video: {width:100, height:100}}) would not provide the same video size on different browsers because of different defaults for resizeMode.
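Until a default is agreed on, a page can sidestep the difference by stating resizeMode itself. A small sketch, assuming the two values the spec defines ("none" and "crop-and-scale"); `withResizeMode` is a hypothetical helper, and the bare value is treated as an ideal constraint rather than a required one.

```javascript
// Returns getUserMedia constraints with an explicit resizeMode added to the
// given video constraints, so the result does not depend on the UA default.
function withResizeMode(video, resizeMode) {
  if (resizeMode !== "none" && resizeMode !== "crop-and-scale") {
    throw new RangeError(`unknown resizeMode: ${resizeMode}`);
  }
  return { video: { ...video, resizeMode } };
}

// Browser usage (not run here):
//   navigator.mediaDevices.getUserMedia(
//     withResizeMode({ width: 100, height: 100 }, "none"));
```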

I don't think anyone has implemented anything like this. But it's something defining defaults would close the door on.

Should the need/usecase arise, we can revisit this issue. Implementations do change, as well as specifications. The question might be whether we think this need will arise in the foreseeable future and if so how we would accommodate it. My guess is that we do not have such a usecase yet.

We are really digging into https://github.com/w3c/mediacapture-main/issues/777 though. This issue is focused on channelCount. I think we should first make progress on the easy one before trying to solve this one.

youennf commented 3 years ago

Using the Lenovo microphone example, if there were some way for the system to indicate to the browser that echo cancellation were on

This example seems to work well with the echoCancellation=false by default rule. If the Lenovo microphone is doing echo cancellation and the User Agent cannot disable it, getCapabilities should show that echoCancellation cannot be false for that device. The User Agent will then set echoCancellation to true because that is the only available value.

The fact that the User Agent does not know whether the microphone is doing echo cancellation or not should be treated more as a bug than as a usecase.
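The rule described above can be sketched as a small selection function. This is an illustration under assumptions, not what any user agent does: it assumes the echoCancellation capability is reported as a list of supported booleans (as returned by `getCapabilities()`), and `pickEchoCancellation` is a hypothetical name.

```javascript
// `capabilities` is the object returned by MediaStreamTrack.getCapabilities().
// Its echoCancellation member, when present, lists the values the device
// can actually be opened with.
function pickEchoCancellation(capabilities, preferred = false) {
  const supported = capabilities.echoCancellation;
  if (!Array.isArray(supported) || supported.length === 0) {
    return undefined; // capability not reported; nothing to decide
  }
  // Use the preferred default when the device allows it, otherwise fall back
  // to whatever the device supports.
  return supported.includes(preferred) ? preferred : supported[0];
}
```

For a device whose echo cancellation cannot be turned off, the capability would be `[true]` and the function returns `true` even though the preferred default is `false`.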

In the case of an observable constraint such as frame width/height, I'd prefer the user agent take whatever the underlying system provides by default.

What happens if there is no default? For instance, on MacOS, if we do not set the width/height, we will most often use what the last application set. This is an issue for two reasons:

jan-ivar commented 3 years ago

There is apparently no consensus for channelCount yet. We should acknowledge this and seek consensus.

To clarify, no consensus is needed today, as it is up to user agents. This ensures defaults can keep up with changing times.

But the OP question is useful, whether it leads to more spec language or not, because it's revealing implementer bugs.

To answer: Firefox defaults to the max a microphone can do, capped at 2. But we've found bugs (1, 2). This represents our desire to promote and support stereo for users who have stereo mics (both in audio apps and web conference calls).
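Expressed as code, the policy described above ("max the microphone can do, capped at 2") looks roughly like this. It is a sketch of the stated rule, not Firefox's implementation; `defaultChannelCount` is a hypothetical name, and the channelCount capability is assumed to be a `{min, max}` range as returned by `getCapabilities()`.

```javascript
// Picks a default channel count from a track's capabilities:
// the device maximum, limited to `cap` (2 = stereo).
function defaultChannelCount(capabilities, cap = 2) {
  const range = capabilities.channelCount;
  if (!range || typeof range.max !== "number") {
    return undefined; // capability not reported
  }
  return Math.min(range.max, cap);
}
```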

But regardless of what the default is, we need to ensure all browsers are able to send and receive stereo (at least when directed to do so), and that they're never wasting bandwidth upsampling mono to stereo. We should ensure we have WPT tests for this.

henbos commented 3 years ago

I sense people thinking that OS defaults are very important, and so that if the OS says that the default is something, then we should listen. So I'm willing to conclude from the discussion that OS defaults are more important than predictability.

So let's only look at when there are no OS defaults for a particular setting. E.g. after constraints processing and knowing which device to pick and filtering out any settings according to the constraints processing, you are left with more than a single configuration for the said device, and no OS defaults to tell you which one is better.

Example: After constraints processing you know you're going to open camera X which is capable of being opened in resolution A or B and the default does not say if A or B is better. Both A and B are spec-compliant today. What do you do?

In this case, a "tie-breaker" must be implemented. Question is: should the spec give any guidance here, or should every User Agent come up with their own solution?
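One possible shape for such a tie-breaker, purely as an illustration: score each remaining configuration by its distance from a set of documented defaults and keep the closest. The relative-distance formula mirrors the one the spec uses for ideal numeric constraints; `breakTie`, `relativeDistance` and the default values in the usage note are assumptions, not anything the spec mandates.

```javascript
// Relative distance between a candidate value and a default value.
function relativeDistance(actual, ideal) {
  if (actual === ideal) return 0;
  return Math.abs(actual - ideal) / Math.max(Math.abs(actual), Math.abs(ideal));
}

// Among configurations that all satisfy the constraints, return the one
// closest to `defaults`. On equal scores the earlier candidate wins.
function breakTie(candidates, defaults) {
  let best;
  let bestScore = Infinity;
  for (const candidate of candidates) {
    let score = 0;
    for (const [name, ideal] of Object.entries(defaults)) {
      if (typeof candidate[name] === "number") {
        score += relativeDistance(candidate[name], ideal);
      }
    }
    if (score < bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}
```

With resolutions A = 1920x1080 and B = 640x480, hypothetical defaults of 640x480 select B, while hypothetical defaults of 1280x720 select A. The point is that the outcome becomes predictable once the defaults are written down, whichever values are chosen.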

jan-ivar commented 3 years ago

Maybe we can write something like: "The default may vary by user agent, platform or hardware device"?

danielinux7 commented 1 year ago

I have an issue with echoCancellation, channelCount and sampleRate on Chrome. When I set echoCancellation to false, channelCount is set to 2 and sampleRate to 44100, even when I explicitly set their values to 1 and 48000.
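A report like this is easier to act on with the requested values and the resulting `getSettings()` output side by side. A small sketch of that comparison; `findSettingMismatches` is a hypothetical helper, and the sample values below are simply the ones quoted in the comment above.

```javascript
// Lists every requested setting whose actual value differs.
// `requested` holds plain values; `settings` is MediaStreamTrack.getSettings().
function findSettingMismatches(requested, settings) {
  const mismatches = [];
  for (const [name, value] of Object.entries(requested)) {
    if (settings[name] !== value) {
      mismatches.push({ name, requested: value, actual: settings[name] });
    }
  }
  return mismatches;
}

// Browser usage (not run here):
//   const requested = { echoCancellation: false, channelCount: 1, sampleRate: 48000 };
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: requested });
//   console.table(findSettingMismatches(requested, stream.getAudioTracks()[0].getSettings()));
```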