w3c / mediacapture-main

Media Capture and Streams specification (aka getUserMedia)
https://w3c.github.io/mediacapture-main/

Why does `navigator.mediaDevices.enumerateDevices()` require that `Document` must have active keyboard focus? #905

Open juj opened 1 year ago

juj commented 1 year ago

At Unity we are implementing support for Unity web exported content on mobile devices, and one part of that work is exposing access for Unity projects to utilize the webcam and other audio capture devices.

Recently we have received reports that on Firefox the Unity page load does not progress in the background; users must keep the Unity WebGL game tab active in the foreground for loading to proceed. This has been reported only on Firefox.

Investigating further, the cause is the navigator.mediaDevices.enumerateDevices() check, which Unity performs at the page loading stage. This check populates initial webcam and microphone information for the Unity C# project code to access, and the main C# content starts only after it completes.

The reason for gating Unity content loading on device enumeration is that once loading has finished, Unity C# code may initialize 3D scene data based on the availability of a webcam or a microphone.

(Actually starting a webcam/microphone access in Unity is still an asynchronous operation)

However simply querying the set of available devices in Unity is designed to be a synchronous operation. A Unity C# project can potentially access the webcam info immediately at project startup, hence why we gate the actual content startup to run an enumerateDevices() step.
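
A minimal sketch of the gating pattern described above, assuming a hypothetical `DeviceCache` helper (this is not Unity's actual code): the page awaits one enumeration at startup, stores the result, and answers later queries synchronously from the stored list. The `mediaDevices` parameter stands in for `navigator.mediaDevices`.

```javascript
// Hypothetical helper for illustration; not Unity's actual implementation.
class DeviceCache {
  constructor() {
    this.devices = null; // null until the first enumeration completes
  }

  // Asynchronous step, run once during page load.
  async populate(mediaDevices) {
    this.devices = await mediaDevices.enumerateDevices();
  }

  // Synchronous query used after startup; kind is e.g. "videoinput".
  has(kind) {
    if (this.devices === null) {
      throw new Error("populate() has not completed");
    }
    return this.devices.some(d => d.kind === kind);
  }
}
```

In a page this might be used as `await cache.populate(navigator.mediaDevices)` during loading, followed by `cache.has("videoinput")` wherever a synchronous answer is needed.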

The web spec states that the enumerateDevices() operation does not require a user permissions check. Only starting a device, and getting detailed device info does - which does sound OK for our needs.

However, for some odd reason, it has been specced that the Promise returned by navigator.mediaDevices.enumerateDevices() should stay lingering until the Document object of the calling JS scripting context has acquired keyboard focus. See [1] and [2].

What this means is that JS page content which wishes to simply enumerate devices, without necessarily intending to activate any of them, will not be able to make progress if the page is in the background. As a result, we find ourselves implementing clunky watchdog timers to detect that the enumeration has hung and that waiting for it to resolve is a waste of time.

Such behavior is not ideal, since realizing that the enumeration result will likely "never" come (as long as the user does not come back to wake the page up) takes some time as well, and precious startup loading time will have been wasted. The result is that Unity content whose 3D scene load may depend on webcam availability cannot complete that load noninteractively.

May I inquire what the rationale was for requiring the Document to have keyboard focus before the device enumeration Promise is allowed to resolve? For what it's worth, it does seem like Firefox, Chrome and Safari implement this check differently, and only Firefox actually requires it to be true. (see [3] and [4])

Would it be possible to actually consider removing that requirement? To my understanding that requirement is not serving a security related benefit, since the information that is returned is already non-identifying? (only after acquiring a permission for a device using the Permissions API one will get detailed HW info). Or am I misguided here?

Or, if removing the requirement cannot be considered at all, would it be possible to offer an enumeration query that immediately rejects the Promise if "now is not the time to allow doing this type of query"? Then the JS page load could proceed quickly without resorting to a clumsy watchdog timer.

That way JS pages would not be left hanging, and they could decide to do something else with the precious loading time.

Thanks for considering!

alvestrand commented 1 year ago

The previous discussion of this issue was in #560 and #561, I think.

juj commented 1 year ago

The previous discussion states e.g. "Firefox has this behavior as well. Turning on camera/mic from a background tab is creepy."

However that does not seem to be accurate in the scope of the question of this ticket: definitely agree that camera/mic should only be possible to turn on from the currently visible tab.

Though navigator.mediaDevices.enumerateDevices() does not turn on a device, it just returns information. Further, the information is already anonymized to avoid finger-printing beyond a binary "does a webcam exist or not" information (until user gives permission via the API), iiuc.

So after reading the above tickets, the same questions do still remain:

jan-ivar commented 1 year ago

Short answer: trackers. enumerateDevices is still called by 8% of pages, dwarfing getUserMedia at 0.6%. It's why the spec has significantly reduced fingerprinting down to 2 bits ahead of actual camera/mic use, but not all browsers have caught up yet (crbug 1101860, bug 1528042). Once they do, it's still 2 bits, so my bet is tracking libraries will continue to call it.

Additionally, users unplugging or inserting a USB device may be time-correlated to uniquely identify them across origins, even if browsers time-fuzz (they don't) the devicechange event that fires then. You'll find the steps that fire the devicechange event contains similar "focus" language for that reason. Since enumerateDevices can be called in a loop until its result differs to emulate the devicechange event, it makes sense for it to have the same restriction.
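
To illustrate the last point, here is a sketch of how a script could approximate the devicechange event by polling. `pollForDeviceChange`, `intervalMs` and `maxPolls` are illustrative names, and `mediaDevices` stands in for `navigator.mediaDevices`.

```javascript
// Resolves with the new device list once it differs from the first one seen,
// or with null after maxPolls polls without a change.
async function pollForDeviceChange(mediaDevices, intervalMs = 1000, maxPolls = 60) {
  // Reduce a device list to a comparable string.
  const signature = list =>
    JSON.stringify(list.map(d => [d.kind, d.deviceId, d.label]));

  const initial = signature(await mediaDevices.enumerateDevices());
  for (let i = 0; i < maxPolls; i++) {
    await new Promise(r => setTimeout(r, intervalMs));
    const current = await mediaDevices.enumerateDevices();
    if (signature(current) !== initial) return current;
  }
  return null;
}
```

Because every call in the loop goes through enumerateDevices, a restriction on enumerateDevices also restricts this emulation.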

navigator.mediaDevices.enumerateDevices({rejectIfUnavailable: true});

It might be simpler to do this:

if (document.visibilityState == "visible") {
  await navigator.mediaDevices.enumerateDevices();
}

Longer answer: one might think document.hasFocus() should be used, but that would require focus of iframes. See https://github.com/w3c/mediacapture-main/issues/752 for the complicated reasons.

> That way JS pages would not be left hanging

Promises have replaced callbacks, but they're still just a mechanism to trigger callbacks. As such they don't interfere with garbage collection, and no resources are "left hanging" in the traditional sense just because callbacks aren't called. async/await is syntactic sugar (sweet sugar but sugar nonetheless). I hope this answers your questions.

juj commented 1 year ago

> As such they don't interfere with garbage collection, and no resources are "left hanging"

Sorry, maybe some confusion. I was not referring to any garbage collector dependency here, but to the general loading flow being paused/stalled from progressing on the background.

> It might be simpler to do this:

This is something I did consider at first, but I would object that it is a bad and brittle design pattern to implement in user code, since it duplicates "magic" logic from the spec, and it opens a bad race condition if the user navigates to another page immediately while the promise is processing (the JS check would pass, but from the browser's point of view the page would no longer be visible).

That is why the request for a callback form that would actually reject if "now is not a good time". That way developers could actually write logic where they could say their intent to the API.
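
A userland approximation of that request, as a sketch only: `enumerateOrReject` is a hypothetical wrapper, not an existing or proposed API. It rejects immediately when the page is not visible and rejects after `timeoutMs` if the browser still withholds the result. It does not remove the race described above; it only bounds the wait.

```javascript
// doc stands in for document, mediaDevices for navigator.mediaDevices.
function enumerateOrReject(doc, mediaDevices, timeoutMs = 2000) {
  if (doc.visibilityState !== "visible") {
    return Promise.reject(new Error("document is not visible"));
  }
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("enumeration timed out")), timeoutMs);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([mediaDevices.enumerateDevices(), timeout])
    .finally(() => clearTimeout(timer));
}
```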

alvestrand commented 1 year ago

Repeating comment from #903: We might want to see if we can get UMA on this to figure out how big breakage might be.

jan-ivar commented 1 year ago

> We might want to see if we can get UMA on this to figure out how big breakage might be.

Any breakage would be sites already not working in Firefox. Not saying this is zero, just that it's likely not a staggering number, or we would have heard about it.

jan-ivar commented 1 year ago

> there is then a bad race condition bug that can happen if the user immediately navigates to another page while the promise is processing. (so the JS code check would pass, but from browser POV visibility was no longer there)

Hmm, I think there's a bug in the enumerateDevices algorithm here actually: it's referencing the Document in parallel, which is a no-no. I think it needs to move up ahead of the in-parallel steps, which is what Firefox does. I'll do a PR. cc @karlt

With that fixed, there is no race possible, because the check is done synchronously.

I think the remaining issue is resolving https://github.com/w3c/mediacapture-main/issues/752#issuecomment-1293797299.

> ... that is a bad and brittle design pattern to implement in user code, since it creeps in/duplicates "magic" logic from the spec

I can't speak for everyone, but I think if we can make this work:

if (document.visibilityState == "visible") {
  await navigator.mediaDevices.enumerateDevices();
}

...then I'd be inclined to say the value of adding the following new API is negative:

try {
  // await is needed so that a rejected promise reaches the catch block
  await navigator.mediaDevices.enumerateDevices({rejectIfUnavailable: true});
} catch (e) {
}

I say negative, because with web compat so poor right now, I worry exposing such a method would let apps opportunistically wait for focus only on browsers that require it. — Instead, I'd prefer for other user agents to catch up here.

juj commented 1 year ago

> if we can make this work:

> With that fixed, there is no race possible, because the check is done synchronously.

Thanks! I would recommend adding an explicit guideline note in the spec to hint implementors to realize that this synchronicity is explicitly depended on - so that no implementations will be doing other things under the "as-if" rule.

That is because there are likely no unit test suites that can be written to verify that a browser works explicitly under that API contract(?)

We are cautious to start relying on the above, just to realize that we'd then get a small % of users hanging the page loads on background in practice when they open a page and then immediately navigate away, if some browser is doing something else under the "as-if" rule.

jan-ivar commented 1 year ago

Actually referencing a document off main-thread would be a potential security bug that should get caught in review, so this was more of a spec-writing bug I think (e.g. we didn't trip over this in Firefox).

We wrote some Mozilla-specific tests last year, but once #752 is resolved, it might be possible to write a test that queues tasks to call enumerateDevices repeatedly until a visibilitychange caused by window.open(), then checks that all collected promises resolve within the test timespan. This should fail intermittently at least if there's a non-compliant browser.

> We are cautious to start relying on the above, just to realize that we'd then get a small % of users hanging the page loads on background in practice when they open a page and then immediately navigate away, if some browser is doing something else under the "as-if" rule.

"as-if" would in practice require an interpretation of the spec that puts this check after the actual (time-consuming) enumeration step, I think, which would be a clear violation. But more immediately, you're right since, as mentioned in https://github.com/w3c/mediacapture-main/issues/752#issuecomment-1293797299 Firefox right now also requires system focus, which is hard to check for in JS. So for now I'd recommend:

const wait = ms => new Promise(r => setTimeout(r, ms));

if (document.visibilityState == "visible") {
  await Promise.race([navigator.mediaDevices.enumerateDevices(), wait(2000)]);
}

Not to be taken literally on pageload though, since await navigator.mediaDevices.enumerateDevices() may take a second to complete, involving waiting on IPC from the browser's main process, so a simple await would be a missed opportunity to do other stuff in the meantime, if you're looking to speed up pageload.
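
One way to act on that, sketched with hypothetical names (`loadWithDevices`, `loadAssets`): start the enumeration first, run the rest of the loading work while it is in flight, and only then wait for the device list, bounded by a timeout.

```javascript
const delay = ms => new Promise(r => setTimeout(r, ms));

// mediaDevices stands in for navigator.mediaDevices;
// loadAssets is any async function doing the other startup work.
async function loadWithDevices(mediaDevices, loadAssets, timeoutMs = 2000) {
  // Kick off enumeration without awaiting it yet.
  const pending = mediaDevices.enumerateDevices();
  // Avoid an unhandled-rejection report while nothing is awaiting it.
  pending.catch(() => {});

  const assets = await loadAssets(); // overlaps with the enumeration

  // devices is undefined if the timeout wins the race.
  const devices = await Promise.race([pending, delay(timeoutMs)]);
  return { assets, devices };
}
```

If the enumeration finished while the assets were loading, the final race resolves right away and no extra time is spent on it.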

karlt commented 1 year ago

The problem highlighted here is not so much that the async operation does not resolve immediately, but that the useful information about types of devices present is not necessarily available when required.

If the promise hasn't resolved at the time that the information is desired, then an application could proceed based on a guess of whether devices are present and adjust later, in a similar way to how it might adjust on a devicechange event. However, the problem highlighted here is that the wrong UI would be visible until the promise resolves.
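
A sketch of that guess-then-adjust approach. `startWithGuess`, `render` and `guess` are illustrative names; `render` is whatever callback updates the UI, and `mediaDevices` stands in for `navigator.mediaDevices`.

```javascript
function startWithGuess(mediaDevices, render,
                        guess = { camera: false, microphone: false }) {
  render(guess); // show UI based on the guess right away

  return mediaDevices.enumerateDevices().then(devices => {
    const actual = {
      camera: devices.some(d => d.kind === "videoinput"),
      microphone: devices.some(d => d.kind === "audioinput"),
    };
    // Re-render only if the guess turned out to be wrong.
    if (actual.camera !== guess.camera || actual.microphone !== guess.microphone) {
      render(actual);
    }
    return actual;
  });
}
```

Until the promise resolves, the UI reflects the guess, which is exactly the window in which the wrong UI may be visible.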

There are some reasons to not make this device information available without a permission gate. If the information is going to be made available, however, then ideally it would be available whenever it might be useful. In this regard, making the information available on page visibility would be preferable, if the fingerprinting exposure with a visibility gate is comparable to that with a focus gate.

jan-ivar commented 1 year ago

With https://github.com/w3c/mediacapture-main/pull/912 merged, enumerateDevices() no longer requires keyboard focus, and the visibility requirement can be ensured synchronously ahead of calling it, as shown in https://github.com/w3c/mediacapture-main/issues/905#issuecomment-1295281993. Leaving this open until tests mentioned there have been added.
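
A sketch of that pattern which also covers pages loaded in the background: check visibility synchronously, and otherwise defer the call until a visibilitychange event reports the page as visible. `enumerateWhenVisible` is a hypothetical helper; `doc` stands in for `document` and `mediaDevices` for `navigator.mediaDevices`.

```javascript
function enumerateWhenVisible(doc, mediaDevices) {
  // Synchronous check ahead of the call.
  if (doc.visibilityState === "visible") {
    return mediaDevices.enumerateDevices();
  }
  // Otherwise wait until the page becomes visible, then enumerate once.
  return new Promise((resolve, reject) => {
    const onChange = () => {
      if (doc.visibilityState !== "visible") return;
      doc.removeEventListener("visibilitychange", onChange);
      mediaDevices.enumerateDevices().then(resolve, reject);
    };
    doc.addEventListener("visibilitychange", onChange);
  });
}
```

The returned promise stays pending while the page is hidden, so callers that must not block on it can combine it with a timeout or proceed with other work in the meantime.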