w3c / mediacapture-extensions

Extensions to Media Capture and Streams by the WebRTC Working Group
https://w3c.github.io/mediacapture-extensions/

Expose camera presets to web pages #12

Open youennf opened 3 years ago

youennf commented 3 years ago

Following on from https://github.com/w3c/mediacapture-main/issues/739, the current API makes it difficult for web developers to select constraints when they are tied to each other. It also makes it hard for web developers to select particular native presets, for which they could expect the best performance since user agents would limit processing such as downsampling. One possibility would be to expose native camera presets to web developers so that they can more easily generate the constraints they give to applyConstraints. A preset could be defined as a set of constraints (width, height, frame rate, pixel format...) with discrete values.

youennf commented 3 years ago

This was discussed at last interim and it was asked to investigate how the current API can cope with cases that this new API could help.

Here is a usecase that focuses on width and height.

The use case is: A web site wants to get a 4/3 video stream, ideally 800x600, ideally native resolution to maximise the quality/reduce processing.

One approach is to pass ideal constraints to getUserMedia with resizeMode set to none (ideal or exact). It works well if 800x600 is supported (which my USB camera supports).

Let's look now at what happens if 800x600 is not natively supported, and the camera supports 640x480, 1024x768, 1280x720 and 1920x1080 (my iMac Pro built-in camera presets).

If we compute the fitness distance, 640x480 will be picked. This is not too bad. But the application might have preferred 1280x720 as it is the closest higher resolution and video would be crisper than 640x480.
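The 640x480 result can be checked by hand. The following is a minimal sketch, assuming the candidates are only the four native modes listed above and that a bare (ideal) numeric constraint contributes |actual - ideal| / max(|actual|, |ideal|) to the distance. The helper names are illustrative and not part of any API.

```javascript
// Fitness distance for one numeric property with a bare (ideal) value:
// |actual - ideal| / max(|actual|, |ideal|), and 0 when they are equal.
function idealDistance(actual, ideal) {
  if (actual === ideal) return 0;
  return Math.abs(actual - ideal) / Math.max(Math.abs(actual), Math.abs(ideal));
}

// Total distance of a native mode from the ideal width and height.
function fitnessDistance(mode, ideal) {
  return idealDistance(mode.width, ideal.width) +
         idealDistance(mode.height, ideal.height);
}

// Return the mode with the smallest distance (the first one wins on ties).
function closestMode(modes, ideal) {
  return modes.reduce((best, mode) =>
    fitnessDistance(mode, ideal) < fitnessDistance(best, ideal) ? mode : best);
}

// Native modes of the built-in camera described above.
const nativeModes = [
  { width: 640, height: 480 },
  { width: 1024, height: 768 },
  { width: 1280, height: 720 },
  { width: 1920, height: 1080 },
];

const picked = closestMode(nativeModes, { width: 800, height: 600 });
console.log(`${picked.width}x${picked.height}`); // prints "640x480"
```

640x480 scores 0.4 and 1024x768 scores 0.4375, so the lower resolution wins by a small margin.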

Developer might try 1920x1080 as per capabilities (max resolution), although 1280x720 might be a better fit. But it is difficult to quickly discover that 1280x720 is a native resolution.

One approach is for the developer to use min/max constraints for width and height in getUserMedia. The developer will need to handle the case where an OverconstrainedError is returned. This might require multiple getUserMedia calls, with carefully crafted constraints so that each call either returns a usable result or rejects, in order not to trigger multiple prompts. Also, this might not be great if we look at gating getUserMedia by a user gesture.

Another approach is to keep ideal constraints in getUserMedia and do trial and error with applyConstraints and min/max constraints. Of course, the developer has only one try in the getUserMedia resolution promise callback. Otherwise, there is a risk of having to redo the camera setup, which might be slow and might trigger the capture icon to switch on and off (not great for the user).

If we look at how this would be done with presets, developer would be able to call getUserMedia({video : true}) or with ideal constraints only to help user select the best device. Then, based on the preset information, developer will be able to call applyConstraints once in getUserMedia resolution promise callback with the optimal constraints. Camera will be set up once and only once. Using presets allows to:

Potential downsides:

Note also that adding frame rate or pixel format would make using the current API even more complex. Presets would also make it possible to NOT introduce yet another 'nativeFrameRate' and/or 'nativePixelFormat' constraint.
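The preset-based flow described above could be sketched as follows. This is only an illustration: `track.presets` is the attribute proposed in this issue and is not shipped by browsers, the shape of each preset ({width, height, ...}) is assumed, and `pickPreset` is one hypothetical policy among many (smallest preset that is at least as large as the wanted size, preferring the wanted aspect ratio).

```javascript
// One possible policy: the smallest preset that is at least as large as the
// wanted size, preferring presets with the wanted aspect ratio.
// Assumes `presets` is a non-empty array of {width, height} objects.
function pickPreset(presets, wanted) {
  const area = (p) => p.width * p.height;
  const sameRatio = (p) => p.width * wanted.height === p.height * wanted.width;
  const bigEnough = presets.filter(
    (p) => p.width >= wanted.width && p.height >= wanted.height);
  const pool = bigEnough.some(sameRatio) ? bigEnough.filter(sameRatio) : bigEnough;
  if (pool.length === 0) {
    // Nothing is large enough: fall back to the largest preset available.
    return presets.reduce((a, b) => (area(b) > area(a) ? b : a));
  }
  return pool.reduce((a, b) => (area(b) < area(a) ? b : a));
}

// Hypothetical flow: select the device first, then configure it once.
// `track.presets` is the proposed attribute, not an existing API.
async function openCamera(wanted) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = stream.getVideoTracks();
  await track.applyConstraints(pickPreset(track.presets, wanted));
  return stream;
}
```

With the built-in camera modes listed earlier and a wanted size of 800x600, this policy would select 1024x768, the smallest 4:3 mode that is large enough.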

@jan-ivar, any thoughts?

henbos commented 3 years ago

It seems clear to me that many apps know what they want and that the current API is limiting.

The idea of having to almost reverse-engineer what constraints are doing or to do what can only be described as "trial and error" seems like more of a workaround to a misfit API than a real solution to the app’s problem. That’s not good enough if we care about this use case.

If the initial prompt gives you sufficient permissions that we could expose all the presets for the device that was picked and then pick from that I think that would address the app’s needs more than any constraint could.

I support the proposal that was presented at the interim. I like how it would allow both picking the device and then configuring it correctly without multiple prompts. (For example a separate prompt for exposing resolutions and then later for opening would be a bad user experience.)

jan-ivar commented 3 years ago

The use case is: A web site wants to get a 4/3 video stream, ideally 800x600, ideally native resolution ... If we compute the fitness distance, 640x480 will be picked. This is not too bad. But the application might have preferred 1280x720

1280x720 is not 4:3, so that makes no sense to me.

Let me restate what I think your use case is, with all requirements up front:

The use case is: A web site preferably wants a native resolution, ideally 800x600, but prefers slightly higher resolutions like 1024x768 or even 1280x720 over lower ones if 800x600 is not available, and only wants non-native if it can't have any of that.

Solution A (with proof):

await getUserMedia({video: {width: 912, height: 660, resizeMode: "none"}});
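The reason an ideal of 912x660 nudges the choice upward can be checked with a short calculation. This is a sketch under the same assumptions as the fitness distance formula for bare numeric constraints, with the candidates limited to a handful of native modes. The mode list and helper names are illustrative only.

```javascript
// Distance contribution of one numeric property with a bare (ideal) value.
function idealDistance(actual, ideal) {
  return actual === ideal
    ? 0
    : Math.abs(actual - ideal) / Math.max(Math.abs(actual), Math.abs(ideal));
}

// Rank native modes by their summed width/height distance from the ideal.
function rankModes(modes, ideal) {
  return modes
    .map((mode) => ({
      ...mode,
      distance: idealDistance(mode.width, ideal.width) +
                idealDistance(mode.height, ideal.height),
    }))
    .sort((a, b) => a.distance - b.distance);
}

const modes = [
  { width: 640, height: 480 },
  { width: 800, height: 600 },
  { width: 1024, height: 768 },
  { width: 1280, height: 720 },
  { width: 1920, height: 1080 },
];

const ranking = rankModes(modes, { width: 912, height: 660 });
console.log(ranking.map((m) => `${m.width}x${m.height}`).join(" < "));
// prints "800x600 < 1024x768 < 1280x720 < 640x480 < 1920x1080"
```

800x600 still wins when it exists, and 1024x768 and 1280x720 both rank ahead of 640x480 when it does not.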

Solution B (if you wanted more precise control, e.g. to avoid 1024x768 for some reason):

await getUserMedia({video: {width: 800, height: 600, advanced: [
  {resizeMode: "none"},
  {width: 800, height: 600},
  {width: 1280, height: 720},
  {width: 1280, height: 1024},
  {width: 640, height: 480},
]}});
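How the advanced list narrows the choice can be illustrated with a simplified model. The sketch below assumes the candidates are only the camera's native modes, treats bare values in each advanced set as exact, and skips any set that no remaining candidate satisfies. It leaves out the final fitness-distance step and everything else the real algorithm does, and the function names are hypothetical.

```javascript
// Does a candidate satisfy an advanced set? Bare values are treated as exact.
function satisfies(candidate, advancedSet) {
  return Object.entries(advancedSet)
    .every(([name, value]) => candidate[name] === value);
}

// Apply advanced sets in order: keep the candidates that satisfy a set,
// or skip the set entirely if none of the remaining candidates do.
function applyAdvanced(candidates, advancedSets) {
  let remaining = candidates;
  for (const set of advancedSets) {
    const kept = remaining.filter((c) => satisfies(c, set));
    if (kept.length > 0) remaining = kept;
  }
  return remaining;
}

const advanced = [
  { resizeMode: "none" },
  { width: 800, height: 600 },
  { width: 1280, height: 720 },
  { width: 1280, height: 1024 },
  { width: 640, height: 480 },
];

// Native modes of the built-in camera discussed earlier in the thread.
const builtIn = [
  { width: 640, height: 480, resizeMode: "none" },
  { width: 1024, height: 768, resizeMode: "none" },
  { width: 1280, height: 720, resizeMode: "none" },
  { width: 1920, height: 1080, resizeMode: "none" },
];

console.log(applyAdvanced(builtIn, advanced)); // only the 1280x720 mode remains
```

Because earlier sets win, the order of the list is the app's order of preference: 800x600 if present, otherwise 1280x720, and so on.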

Solution C:

try {
  await getUserMedia({video: {width: {min: 800}, height: {min: 600}, resizeMode: "none"}});
} catch (e) {
  if (e.name != "OverconstrainedError") throw e;
  await getUserMedia({video: {width: 800, height: 600}});
}

This seems well covered to me, and not that hard.

jan-ivar commented 3 years ago

this might not be great if we look for gating getUserMedia by a user gesture.

User gesture was replaced by transient activation which is timer based and "expected be at most a few seconds", so that shouldn't be a problem here since OverconstrainedError is instant.

henbos commented 3 years ago

Solution B and C are in my opinion "trial and error" solutions, and while they may produce the correct result on popular cameras, they seem like workarounds.

Even Solution A, which is the most clean and straight-forward, purposefully asks the API for a probably-non-existing resolution in hopes to nudge the constraints processing algorithm up instead of down. So if a device was actually able to provide what you asked for in this case, you probably would have wanted to ask for something else. Is this just a theoretical concern at this point or a practical one? I'm not sure, but I do think we are making assumptions about which resolutions devices are likely to provide.

Exposing presets gives the app full control and may be extended to help solve issues like #13, but it's hard to tell if presets fall into the category of "must-have" or "nice-to-have".

henbos commented 3 years ago

And how would these solutions work in coordination with user-chooses?

Use case: with a single prompt have the user select a device without any filtering based on device resolution capabilities, and then open the picked camera in a resolution that the app wants.

Example: app prefers 1080p but the user should be able to pick camera. They have a 480p camera and a 1080p camera and want to open the 480p camera because it faces the right direction

henbos commented 3 years ago

And how should native frame rates be weighed against native resolutions? These could widely differ on USB cameras like Logitech C920.

jan-ivar commented 3 years ago

Solution B and C are in my opinion "trial and error" solutions

C) exact constraints and the OverconstrainedError they cause, have only two use cases by design: "trial and error" and refusal to operate entirely. The latter is the less appealing one.

B) advanced, which was the original spec, is trial without error.

A) fitness distance provides an alternative to apps making assumptions inherent in finite lists of known modes today.

I see two use cases for presets:

  1. Presenting available resolutions to the user, to defer choice entirely
  2. Let JS describe their algorithm of preference using if-then-else code.

Number 2 seems like advanced all over again with a less declarative syntax, since this brilliant algorithm will still have to be written ahead of time.

it's hard to tell if presets fall into the category of "must-have" or "nice-to-have".

I've tried above to show it's a "nice-to-have", but feel free to stump me, since it's hard to prove a negative.

But I can try: theoretically, you can interrogate the complete set of modes using resizeMode: none and applyConstraints, which means presets can be polyfilled, if someone were inclined to chase use case Number 1.
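A rough sketch of such a polyfill follows. It assumes the app supplies its own list of candidate modes to probe (the list is hypothetical; nothing here enumerates modes by itself) and that exact width, height, and resizeMode constraints reject with OverconstrainedError when the mode is not native. Each successful probe reconfigures the camera, so this is slow.

```javascript
// Build constraints that only a native (unscaled) mode can satisfy.
function exactNativeConstraints(mode) {
  return {
    width: { exact: mode.width },
    height: { exact: mode.height },
    resizeMode: { exact: "none" },
  };
}

// Probe candidate modes on an already-open video track and return the ones
// the camera accepts natively. `candidateModes` is supplied by the app.
async function probeNativeModes(track, candidateModes) {
  const supported = [];
  for (const mode of candidateModes) {
    try {
      await track.applyConstraints(exactNativeConstraints(mode));
      supported.push(mode);
    } catch (e) {
      // A mode the camera cannot provide natively; anything else is a real error.
      if (e.name !== "OverconstrainedError") throw e;
    }
  }
  return supported;
}
```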

jan-ivar commented 3 years ago

I should also point out the obvious: that Solution A, B, and C are not either-or. They all exist today, so I don't know that we need a Solution D.

youennf commented 3 years ago

In general, a single fitness distance is bound to not cover all application cases. The selection of intertwined parameters is an optimization problem. If solving it with a distance minimisation approach, the selection of the distance function should be a choice of the web application. If we provide enough information for a given device, the application will be able to solve the problem by itself, choosing whatever distance function and/or method to choose the optimal parameter set.
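As an illustration of an application choosing its own distance function, here is a minimal sketch. The preset values are made up for the example, the preset shape ({width, height, frameRate}) is assumed, and `makeCost` and `pickByCost` are hypothetical helpers, not part of any proposal.

```javascript
// Pick the preset with the lowest cost under an app-supplied cost function.
// Assumes `presets` is a non-empty array.
function pickByCost(presets, cost) {
  return presets.reduce((best, p) => (cost(p) < cost(best) ? p : best));
}

// Example policy: relative distance from a target, with an app-chosen weight
// (0..1) on frame rate versus resolution.
function makeCost(target, frameRateWeight) {
  const rel = (actual, ideal) =>
    Math.abs(actual - ideal) / Math.max(actual, ideal);
  return (p) =>
    (1 - frameRateWeight) *
      (rel(p.width, target.width) + rel(p.height, target.height)) +
    frameRateWeight * rel(p.frameRate, target.frameRate);
}

// Illustrative presets only; not taken from a real camera.
const presets = [
  { width: 1920, height: 1080, frameRate: 15 },
  { width: 1280, height: 720, frameRate: 30 },
  { width: 640, height: 480, frameRate: 60 },
];
const target = { width: 1920, height: 1080, frameRate: 60 };

const forSharpness = pickByCost(presets, makeCost(target, 0.1));
const forMotion = pickByCost(presets, makeCost(target, 0.9));
console.log(forSharpness.height, forMotion.height); // prints "1080 480"
```

The same preset list yields different answers depending on the weight, which is the application-dependent part of the problem.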

Now the current spec has known shortcomings. As you expressed during the call, selecting native frame rate might require a new nativeFrameRate constraint. If we add the ability to select native pixel format, we might also want to add a new nativePixelFormat constraint. Do we want to solve the above? If so, constraints or presets?

So far, I believe exposing presets has more benefits than adding new constraints. It is very simple to implement (a simple getter no change to applyConstraints), should get decent interoperability and help developers. I also believe presets might help in the future user-choose getUserMedia as the constraints could be used for device selection and not device setup (device setup happening in getUserMedia resolution callback). This model (getUserMedia for device selection, applyConstraints for device setup) seems superior to me.

It would be interesting to think about the potential downsides of presets to fully evaluate this approach.

jan-ivar commented 3 years ago

And how would these solutions work in coordination with user-chooses?

Works fine. Just try Firefox today.

Example: app prefers 1080p but the user should be able to pick camera

await getUserMedia({video: {height: 1080}});

And how should native frame rates be weighed against native resolutions?

Feel free to fiddle with my "proof" above, which contains native frameRates to play with. This is all well-defined.

See also https://github.com/w3c/mediacapture-main/issues/762.

youennf commented 3 years ago

Solution A is an example of how people have to find workarounds to use this API. The app will have to check getSettings and sometimes call applyConstraints, especially if adding more constraints like facingMode. Solution B does not seem to work in Chrome, which is the only browser supporting resizeMode AFAIK. Solution C can disrupt user-chooses.

@jan-ivar, I would like to understand your current assessment of presets and the potential weaknesses you might have found. I understand you say that presets are nice-to-have with the assumption of further updating a little bit the spec. Is that correct? Do you think presets would simplify developers life? Do you think presets would add much complexity to the spec? Or to browser implementations? Do you think presets might further help if we add a pixelFormat constraint (which will probably be a string constraint, thus further competing with resizeMode, facingMode...)?

If we introduce presets, we could encourage using constraints in getUserMedia for device selection and applyConstraints for device setup, which goes well with user-chooses. Do you have concerns with those potential guidelines?

youennf commented 3 years ago

app prefers 1080p but the user should be able to pick camera.

This is a potential example of a case where the intent of getUserMedia constraints might differ from applyConstraints. The app might want user to select a HD camera so will do getUserMedia({ video : { width: 1920} }).

That does not mean the app actually wants to start capturing HD right away. The app might want user to select HD camera as the quality is better even in lower resolutions. Or the app wants to start with a lower resolution and only use the highest resolution if the receiver can actually make use of this resolution.

henbos commented 3 years ago

I think the app even being allowed to filter out web cams based on their capability is a mistake and think that should be the user's choice, but we've talked about that already, so I'll try to avoid derailing.

I realize that the idea of doing getUserMedia followed by applyConstraints to separate camera selection from camera configuration makes sense whether or not we have presets. But when it comes to how to best configure the camera, whether we want to use presets or constraints, I think the heart of the discussion here is our attitude towards the pros-and-cons of having presets exposed, so Youenn's list of questions above is interesting to try to answer.

Anything that ends up listing a lot of configurations or doing trial-and-error is, in my opinion, a workaround to a flaw in the API. One could argue that the workaround is "not that bad", but I think the balancing act between "how important is frame rate" and "how important is resolution" may very well be application-dependent.

jan-ivar commented 3 years ago

Can we agree characterizations like "workaround" are flawed because they assume a simple solution exists to thousands of app-makers perfectly expressing their desired tradeoffs with regard to thousands of different cameras they've not tested?

The inherent problem here is 3000 app developers not knowing (or often not even considering) the varying capabilities of 1000 different cameras ahead of time. And I don't mean ahead of calling getUserMedia, I mean ahead of writing the code. Thus, their challenge is writing a robust policy for this moon rover ahead of knowledge.

The fallacy to me of presets as a mental model for this complicated problem is that it somehow transports the app developer to after the point of knowledge. But that's not true. Only their policy algorithm will be there, alone with each end-user. If that policy is to ask the user, then presets have a leg up as I concede above, otherwise, presets are just advanced in disguise.

We can argue whether the declarative syntax of advanced is better or worse than imperative code, but unlike presets, advanced is resolved at getUserMedia time, and is already well-defined and implemented today.

Over the years, we've gone through cycles of writing this API for perfectly rational apps (advanced) to let us help you (fitness distance), and now we seem to be cycling back. Are we seeing a lot of advanced use? People say they're complicated, but it's really the inherent problem put in its totality in the lap of the app that is complicated. I'm not convinced presets will be any simpler.

Solution A is an example of how people have to find workarounds to use this API. The app will have to check getSettings and sometimes call applyConstraints,

Getting the desired mode upfront with minimal JS more often seems the opposite of a workaround, when a large percentage of apps are never going to follow up with additional steps. If I extrapolate, it also sounds like "having to check getSettings", and presets, "and sometimes calling applyConstraints" is also a "workaround"?

No-one's disputed my claim above that the only benefit of presets is to show them to users and let them decide.

especially if adding more constraints like facingMode.

Also henbos: I think the app even being allowed to filter out web cams based on their capability is a mistake

facingMode is an inherent constraint that has no other purpose. It's only effective in getUserMedia. Solution A:

await getUserMedia({video: {width: 912, height: 660, resizeMode: "none", facingMode: "user"}});

or for more control:

try {
  await getUserMedia({video: {width: 912, height: 660, resizeMode: "none", facingMode: {exact: "user"}}});
} catch (e) {
  if (e.name != "OverconstrainedError") throw e;
  await getUserMedia({video: {width: 912, height: 660, resizeMode: "none"}});
}

Solution B does not seem to work in Chrome, which is the only browser supporting resizeMode AFAIK.

resizeMode: "none" is what Firefox implements, so it should work there.

Solution C can disrupt user-chooses.

No it can't, because OverconstrainedError happens before prompt.

youennf commented 3 years ago

Can we agree characterizations like "workaround" are flawed because they assume a simple solution exists to thousands of app-makers perfectly expressing their desired tradeoffs with regard to thousands of different cameras they've not tested?

I would look at how something is implemented in native apps and how it is implemented in browsers. If it is more complex to implement in a browser for no good reason, I would say we made a design mistake.

The inherent problem here is 3000 app developers not knowing (or often not even considering) the varying capabilities of 1000 different cameras ahead of time. And I don't mean ahead of calling getUserMedia, I mean ahead of writing the code. Thus, their challenge is writing a robust policy for this moon rover ahead of knowledge.

This is a solved problem for native apps. OSes expose device capabilities, and native apps implement their strategy based on that. The current getUserMedia model does not provide the same level of information and it is really unclear why. AFAIK, there is no privacy/security justification for that. Getting closer to OSes has real benefits as I mentioned above (simplicity for browser implementors, simplicity for web developers). I am not clear whether you agree with that assessment or not.

We can argue whether the declarative syntax of advanced is better or worse than imperative code, but unlike presets, advanced is resolved at getUserMedia time, and is already well-defined and implemented today.

I would not say that they are well-defined and implemented today, maybe tomorrow? Chrome and Firefox differ on the results, based on the example you gave above for getUserMedia. I haven't tested in Safari, but I would not be surprised this differs as well.

What is the benefit of having advanced be resolved at getUserMedia time?

No-one's disputed my claim above that the only benefit of presets is to show them to users and let them decide.

I missed your claim and I am disputing it :0) With presets, an app developer can reuse existing strategies put in place in native applications. With presets, an app developer can pass constraints that are guaranteed to apply cleanly, without change. No more code needed to handle rejection or code that validates that the settings are the ones that are expected. Given OS support, I am confident in the ability to get interoperability in the use of presets across browsers. I am not confident we will get to that same level of interoperability with the current model.

Do you agree with that assessment?

Can you clarify how you see the current model superior to presets? Or is it that you see the benefits of presets not big enough to warrant adding them to getUserMedia spec?

Solution B does not seem to work in Chrome, which is the only browser supporting resizeMode AFAIK.

resizeMode: "none" is what Firefox implements, so it should work there.

Solution C can disrupt user-chooses.

No it can't, because OverconstrainedError happens before prompt.

Mandatory constraints are disruptive to user-chooses. The fact that mandatory constraints may prohibit users to select some cameras they know they have has been identified as a key problem for user-chooses.

jan-ivar commented 3 years ago

Chrome and Firefox differ on the results, based on the example you gave above for getUserMedia.

Sorry I should have tested in Chrome before I posted. I forgot it needs {resizeMode: "none"} which is correct. Works now (I've updated my OP):

Solution B (if you wanted more precise control, e.g. to avoid 1024x768 for some reason):

await getUserMedia({video: {width: 800, height: 600, advanced: [
  {resizeMode: "none"},
  {width: 800, height: 600},
  {width: 1280, height: 720},
  {width: 1280, height: 1024},
  {width: 640, height: 480},
]}});

youennf commented 3 years ago

Chrome and Firefox differ on the results, based on the example you gave above for getUserMedia.

https://jsfiddle.net/63wv1euh/ on my iMac Pro shows interesting though diverging results between Chrome and Firefox. AIUI, both Chrome and Firefox are implementing the spec. But I think we should not derail the course of the discussion here.

https://github.com/w3c/mediacapture-extensions/issues/12#issuecomment-748868168 and https://github.com/w3c/mediacapture-extensions/issues/12#issuecomment-749159545 contain some questions. Henrik expressed his views on most of these questions. I would be interested in your views as well.

jan-ivar commented 3 years ago

https://jsfiddle.net/63wv1euh/ on my iMac Pro shows interesting though diverging results between Chrome and Firefox.

You have two getUserMedia calls racing. Once I serialize them (which also avoids bug 1286945), Chrome and Firefox give me the same results: https://jsfiddle.net/jib1/xbo8qs43/

Will answer the other questions later.

jan-ivar commented 3 years ago

The current getUserMedia model does not provide the same level of information and it is really unclear why.

I'd argue it does provide the same information, modulo https://github.com/w3c/mediacapture-main/issues/762 which I consider a bug. This is a plus from a privacy perspective. My remaining answers assume we fix https://github.com/w3c/mediacapture-main/issues/762.

As to why the spec is the way it is, I recall the WG felt applyConstraints was expressive enough, and that consistency with getUserMedia was important, because we wanted apps to learn constraints and provide them upfront. What you're proposing here seems inspired by a competing model. I'm not saying the existing model is fantastic, and I'm not opposed to (yet) another model if it's really good, but if it's only marginally better, then I am, because fewer models are better.

I understand you say that presets are nice-to-have with the assumption of further updating a little bit the spec. Is that correct?

It's "not a must have", is what I've tried to show.

Do you think presets would simplify developers life?

I'm not convinced, but am open to be. Do you have code examples showing the whole code? Then we can compare against using advanced.

Do you think presets would add much complexity to the spec? Or to browser implementations?

Yes.

Do you think presets might further help if we add a pixelFormat constraint (which will probably be a string constraint, thus further competing with resizeMode, facingMode...)?

Seems orthogonal.

... That does not mean the app actually wants to start capturing HD right away.

Thanks for those use cases, but they don't seem prohibitive today: App opens the camera in HD, then uses applyConstraints on it before attaching it to a video element, and the user will never know the difference. After user picks camera once, the app can set non-HD resolution directly on all revisits from now on using deviceId.

Mandatory constraints are disruptive to user-chooses. The fact that mandatory constraints may prohibit users to select some cameras they know they have has been identified as a key problem for user-chooses.

We know about this in Firefox. E.g. Google Meet uses deviceId: {exact: id} which causes a prompt for a single camera on revisit. But presets and device selection seem better discussed in separate issues maybe? This thread is getting long.

What is the benefit of having advanced be resolved at getUserMedia time?

We avoid having to define potentially complex new rules for delaying camera-init.

With presets, an app developer can reuse existing strategies put in place in native applications.

"Native applications" tends to mean mobile these days, and apps can enumerate cameras. Unlike Henrik, I'm not convinced we should drop device-selection by constraints. I think facingMode, focalLength and infrared are going to (continue to) be valuable on mobile to pick the right camera off the bat. So web apps are going to have to learn constraints for getUserMedia anyway, and the value of being more like native only on applyConstraints seems less than if that weren't the case.

With presets, an app developer can pass constraints that are guaranteed to apply cleanly, without change. No more code needed to handle rejection or code that validates that the settings are the ones that are expected.

Things can always fail, so always check your errors. But yes these invariants sound appealing, at least in isolation. It's just a question of how much dirt we'd need to move to make this happen, and whether it would be worth it in the end, compared to what's already been built and works (modulo some bug fixes).

Given OS support, I am confident in the ability to get interoperability in the use of presets across browsers. I am not confident we will get to that same level of interoperability with the current model.

Do you agree with that assessment?

I don't think we should gate interoperability on presets. Will writing JS code against presets be easier than against the existing API? Perhaps. Show me.

youennf commented 3 years ago

Thanks for your precise input, this is greatly appreciated. I agree with some of your comments below and agree implementation validation at least is needed.

The current getUserMedia model does not provide the same level of information and it is really unclear why.

I'd argue it does provide the same information, modulo w3c/mediacapture-main#762 which I consider a bug. This is a plus from a privacy perspective. My remaining answers assume we fix w3c/mediacapture-main#762.

Can you explain why it is a plus from a privacy perspective? A web application can get the list of preset information, but only asynchronously and through multiple camera setups, which might be a long process.

  • The "allow some (1 or N?) applyConstraints before camera-init", would need to be defined and adds complexity.

Camera init would happen right after the micro task used to resolve getUserMedia. User Agents already support muting/unmuting, a similar code path can be reused. I haven't implemented though, I can take a look at doing so.

  • What if applyConstraints fails? How many times is it allowed to fail/succeed? What happens after?

applyConstraints fails asynchronously, what happens after is not important.

  • Can the delay be exploited to not have light come on right away (or combined with enabled = false, at all)?

Ah good point. There might be indeed a complexity for User Agents that remove the light with enabled = false. Doing a quick test, I do not see Chrome and Firefox implementing it though.

I would reverse the question though: how does enabled = false currently work with applyConstraints if the device is stopped?

  • Performance - getUserMedia streams are already slow to start. Delaying load algorithms further is not appealing.

The delay is limited to executing the micro task. I can make a measurement but this is a matter of milliseconds. This does not delay attaching the stream to a video element and/or peer connection.

  • What if JS immediately attaches the stream to a sink before applyConstraints has happened?

This would be similar to what happens today, the track will anyway take some time to produce frames.

  • It's been my impression that behavioral call-site coupling like this is frowned on (method works differently based on task it's invoked from).

I am not sure I understand the issue there. Which method is supposed to work differently?

  • The presets seem limited to width, height, frameRate, which if applied as-is may override other already-applied constraints if the app is not careful (a variant of an existing API problem).

We might add pixelFormat in the future. I am not sure which constraints you are talking about from mediacapture-main, facingMode or deviceId are usually for device selection. Maybe you are referring to constraints from mediacapture-image? In that case, right, this is an existing problem that developers have to learn, it would be good to improve the API there.

  • WebIDL: I don't think a sequence can be an attribute. Maybe FrozenArray?

There are examples either way but a frozen array is probably more accurate.

Do you think presets might further help if we add a pixelFormat constraint (which will probably be a string constraint, thus further competing with resizeMode, facingMode...)?

Seems orthogonal.

Not really. If you craft a constraint with resizeMode and facingMode, and there is no camera that fits both, you can end up with equal fitness distances for several devices and/or several configurations. For instance, you can get either an 800x600 resolution in environment mode or 1024x768 in user mode. I do not see a solution here except user-chooses. The more constraints like that we add, the more frequently this will happen.

... That does not mean the app actually wants to start capturing HD right away.

Thanks for those use cases, but they don't seem prohibitive today: App opens the camera in HD, then uses applyConstraints on it before attaching it to a video element, and the user will never know the difference. After user picks camera once, the app can set non-HD resolution directly on all revisits from now on using deviceId.

As stated earlier, one goal is to avoid recalibrating the camera multiple times, as it may take a lot of time, might switch on/off the green light... That is why for instance, this API is synchronous and getting the same information from the camera itself using resizeMode=none is not great.

Unlike Henrik, I'm not convinced we should drop device-selection by constraints. I think facingMode, focalLength and infrared are going to (continue to) be valuable on mobile to pick the right camera off the bat

Henrik can speak for himself. My general position here is that mandatory constraints are not good for getUserMedia with user-chooses. But ideal constraints are good for getUserMedia as they are hints but do not restrict user final choice. On the other hand, exact constraints for applyConstraints seem like the right approach: either apply cleanly the configuration or fail quickly without triggering a new device setup.
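The split described here (ideal constraints for device selection, exact constraints for device setup) might look like the sketch below. Both helpers are hypothetical, and the preset shape is assumed to be a flat object of numeric settings.

```javascript
// Ideal (bare) values for getUserMedia: hints that guide device selection
// without restricting which camera the user may pick.
function selectionConstraints(wanted) {
  return { video: { width: wanted.width, height: wanted.height } };
}

// Exact values for applyConstraints: either this configuration applies
// cleanly, or the call rejects without a new device setup.
function setupConstraints(preset) {
  const exact = {};
  for (const [name, value] of Object.entries(preset)) {
    exact[name] = { exact: value };
  }
  return exact;
}

console.log(setupConstraints({ width: 1280, height: 720, frameRate: 30 }));
```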

What is the benefit of having advanced be resolved at getUserMedia time?

We avoid having to define potentially complex new rules for delaying camera-init.

OK, this is fair. I think this is one important item to validate: how easy it is to implement this strategy and how easy it is to specify it. This is ok to mark it as a prerequisite to exposing presets.

jan-ivar commented 3 years ago

Can you explain why is it a plus from privacy perspective?

I meant a plus for presets that they wouldn't expose more info than today (though as you point out, they'd make exposure simpler and more practical/performant).

Camera init would happen right after the micro task used to resolve getUserMedia.

That's too soon, as it would break adapter.js or any kind of shimming or composition of getUserMedia as benign as e.g.:

async function getUserMedia(constraints) {
  const stream = await navigator.mediaDevices.getUserMedia(constraints);
  doSomeValidation(stream);
  return stream;
}

const stream = await getUserMedia({video: true});
const [track] = stream.getTracks();
await track.applyConstraints(track.presets[0]); // too late, we're on a later microtask than gUM resolved on 

Microtask intra-order is also poorly specified. But applyConstraints still happens on the same task as gUM resolving, so s/micro task/task/ might work.

But even this might break shimming or composition with anything that takes another task to complete, e.g.:

const [video, audio] = await Promise.all([
  navigator.mediaDevices.getUserMedia({video: true}),
  navigator.mediaDevices.getUserMedia({audio: true})
]);
const [track] = video.getTracks();
await track.applyConstraints(track.presets[0]); // on a later task than gUM resolved on if video resolved first 

It makes gUM a timing-sensitive callback whereas before it wasn't. Tight coupling requirements like this are a form of "action at a distance" that can lead to surprises.

This is feedback I got from an AutoClosingPromise idea I had years ago.

It wouldn't be our first highly timing-sensitive API (setLocalDescription and setRemoteDescription come to mind), but it's a detractor in my view of this proposal. We'd have to work out how to contain JS observable side-effects at least, and maybe even allow user agents some room to prevent JS libraries from inadvertently causing camera glitches.

jan-ivar commented 3 years ago

you can end up with equal fitness distances

That's what advanced is for. It lets JS declare criteria in order, so it doesn't have this problem. What does it not satisfy?