w3c / mediacapture-main

Media Capture and Streams specification (aka getUserMedia)
https://w3c.github.io/mediacapture-main/

getUserMedia "hanging" indefinitely #846

Open eladalon1983 opened 2 years ago

eladalon1983 commented 2 years ago

From the getUserMedia() algorithm, step 6.1 as of the time of this writing:

The User Agent MUST wait to proceed to the next step until the relevant settings object's responsible document is fully active and has focus.

Documents might remain inactive for very long times. It is arguable that the Promise resolving after a week might be undesirable.

On the other hand, it's hard to choose sensible limits.

Maybe it could be mentioned that the user agent MAY reject the Promise if the conditions (fully active and has focus) don't materialize for an "unreasonable time" or something similar?

youennf commented 2 years ago

The decision to reject can be done today based on a timer. The only thing is when the promise gets actually settled: either after a timer when the page remains in background or when the page gets active. I am unsure what the benefit of rejecting sooner (aka triggering some JS execution in a background page) might be. One additional downside is allowing different behaviours across browsers.

eladalon1983 commented 2 years ago

The decision to reject can be done today based on a timer.

Yes, that's my thinking.

The only thing is when the promise gets actually settled: either after a timer when the page remains in background or when the page gets active.

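By rejecting while the page is in the background we:

  • Clear some minuscule amount of memory sooner.
  • Alert applications sooner to the fact that they will not, in fact, be getting mic/camera access.

Admittedly, these are but minor benefits. Are there benefits to the inverse approach?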

I am unsure what the benefit of rejecting sooner (aka triggering some JS execution in a background page) might be.

Prevents unintended and surprising behavior which neither user nor application desire, like a previously inactive page asking for mic/camera permissions as soon as the user activates that page, a very long time after they had originally interacted with the page. For example, consider coming back from vacation in January and activating a tab for the first time in weeks.

One additional downside is allowing different behaviours across browsers.

Indeed. This can be mitigated by mandating a minimum value for "reasonable time."

youennf commented 2 years ago

By rejecting while the page is in the background we:

  • Clear some minuscule amount of memory sooner.
  • Alert applications sooner to the fact that they will not, in fact, be getting mic/camera access.

Admittedly, these are but minor benefits. Are there benefits to the inverse approach?

I see two advantages:

Also, I do not see why we should be gentle to applications calling getUserMedia on backgrounded pages. Applications calling getUserMedia based on a user gesture should hopefully not run into that issue.

Prevents unintended and surprising behavior which neither user nor application desire, like a previously inactive page asking for mic/camera permissions as soon as the user activates that page, a very long time after they had originally interacted with the page. For example, consider coming back from vacation in January and activating a tab for the first time in weeks.

There are two different things:

  1. rejecting the getUserMedia request: this can be done today based on a timer/heuristic without changing the spec.
  2. rejecting the getUserMedia promise: this can only be done once the page gets focus as per spec.

From what I see, 1 handles these concerns. It seems worth pointing out the issue you identified to help implementors, for instance as a note directly in the spec algorithm.

eladalon1983 commented 2 years ago

rejecting the getUserMedia request: this can be done today based on a timer/heuristic without changing the spec.

(Reordered to put this most important part first.) What part of the spec allows that? To be clear, it is precisely this affordance that I suggest adding to the spec. If it's already there - great!

Consistent cross-browser behavior

Rejecting when the page regains activity+focus does not provide full consistency if one browser rejects after 30min of inactivity and another browser rejects after 60min. (Let alone if another browser never rejects.)

Also, I do not see why we should be gentle to applications calling getUserMedia on backgrounded pages. Applications calling getUserMedia based on a user gesture should hopefully not run into that issue.

The prose explaining focus is not terribly easy to understand, so I might easily be wrong. But as I understand, a page does not have to be backgrounded in order to not have focus. Consider side-by-side windows, like FaceTime and Safari. Let Safari have but a single tab. If the user clicks on the FaceTime window, then no document in the single tab Safari is displaying is focused, I believe...? Assuming I'm right about that - what happens if the user alt-tabs back to Safari the next day, or after lunch? I think that's an unreasonable time, and the browser should be allowed (MAY) to reject the Promise based on a timer.

youennf commented 2 years ago

What part of the spec allows that?

Step 6.3.7 states that: based on a previously-established user preference, for security reasons, or due to platform limitations, jump to the step labeled Permission Failure below. It is true that the promise could theoretically be rejected before executing that step. The UA being in control of capture devices, it seems fine as is to me. But we could move 6.3.7 up, or clone this step just after step 6.1.

I also wonder what your thoughts on step 6.5.2 are. My understanding is that UAs can deny the request on behalf of the user after some time. In that case, when the page gets focus back, the UA acts as if the user denied the prompt.

jan-ivar commented 2 years ago

It is arguable that the Promise resolving after a week might be undesirable.

But is it arguable that the Promise resolving after a week is never desirable? That's what imposing a deadline in the spec would mean. Seems like UA territory to me that we should allow but not standardize (not a web compat issue).

But we could move 6.3.7 up, or clone this step just after step 6.1.

The user might have multiple background tabs with pending gUM and this would reject all of them at the exact same time, which could be time-correlated in an exploit to track the user across origins.

Not rejecting a promise isn't "hanging", it's more like not firing an event or never calling a callback, which seems fine.

I also wonder what your thoughts on step 6.5.2 are.

If after weeks of vacation I activate a tab that had an unanswered permission prompt before I left, I think I'd expect to find it as I left it. Seems harmless (call this situation A).

If after weeks of vacation I activate a tab and it immediately turns on camera or microphone, because I've trusted the page with persistent camera or mic permission, that might be surprising (situation B).

But B isn't a security issue, since a malicious site with such permission could already request gUM on visibilitychange. So the remaining question seems to be how to protect users from accidental privacy invasion by well-meaning apps inadvertently inferring user intent to start capture at that time.
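The visibilitychange point can be made concrete. What follows is a minimal sketch, assuming the page already holds persistent camera or microphone permission. `requestCaptureWhenVisible`, `DocLike`, and the injected callbacks are hypothetical names introduced for illustration; in a browser one would pass `document` and something like `() => navigator.mediaDevices.getUserMedia({ video: true })`.

```typescript
// Minimal structural type so the sketch does not depend on DOM typings.
// The real `document` object has both members.
interface DocLike {
  readonly visibilityState: string;
  addEventListener(type: string, listener: () => void): void;
}

// Issues a fresh capture request every time the document becomes visible.
// No pending promise has to survive in the background for this to work.
function requestCaptureWhenVisible<T>(
  doc: DocLike,
  requestCapture: () => Promise<T>, // assumption: wraps getUserMedia()
  onResult: (result: T) => void,
  onError: (err: unknown) => void,
): void {
  doc.addEventListener("visibilitychange", () => {
    if (doc.visibilityState === "visible") {
      requestCapture().then(onResult, onError);
    }
  });
}
```

Because the request is only issued once the tab is visible again, a background timeout on pending promises would not stop this pattern, which is why situation B is framed here as a privacy question rather than a security one.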

Situation A seems easy for a user to get into (tabbing away without answering a prompt), but B seems a bit harder. The site would basically have to request gUM while in the background (without a user trigger), or the user tabbed away in the sometimes >1 second time window between gUM call and success.

My understanding is that UAs can deny the request on behalf of the user after some time. In that case, when the page gets focus back, the UA acts as if the user denied the prompt.

This matches my understanding, and seems sufficient, so I suggest we close this.

youennf commented 2 years ago

But we could move 6.3.7 up, or clone this step just after step 6.1.

The user might have multiple background tabs with pending gUM and this would reject all of them at the exact same time, which could be time-correlated in an exploit to track the user across origins.

I do not think this can be correlated since only one tab can have focus at a given time. That is why I'd like to keep this step after the "wait for focus" step.

This matches my understanding, and seems sufficient, so I suggest we close this.

As I said previously, I am OK with adding a note that explicitly calls out that issue to implementors.

eladalon1983 commented 2 years ago

Step 6.3.7 states that: based on a previously-established user preference, for security reasons, or due to platform limitations, jump to the step labeled Permission Failure below.

Thanks. You're right that, modulo not resolving the Promise until regaining focus, this part of the spec allows the UA to reject on the user's behalf. If we decide that waiting until focus is regained is the right thing, then this is indeed sufficient.

I am unsure what the benefit of rejecting sooner (aka triggering some JS execution in a background page) might be.

Otherwise, the application cannot tell if the Promise is alive and well (might be fulfilled in the future) or a "zombie" (pending but could never be fulfilled; might still be rejected, though). If this is an important issue for the application, it might try to determine that by checking the browser name+version against a list of known expiration times - awkward. But if all UAs reject when+if they determine it's been too long, then there is no work required by the application, and also no compat issue. (This assertion of no compat issue stands even if some UAs never impose a limit.)

Not rejecting a promise isn't "hanging", it's more like not firing an event or never calling a callback, which seems fine.

I agree. That's why I used Dr. Evil's air quotes around "hanging." 😉

but B seems a bit harder. The site would basically have to request gUM while in the background (without a user trigger), or the user tabbed away in the sometimes >1 second time window between gUM call and success.

  1. That does happen.
  2. Sites sometimes spawn new tabs. If the user leaves the computer before tabbing back, they enter this state.
  3. An application might call both getUserMedia() and getDisplayMedia() in response to a single click ("join and present"), either immediately or sequentially. Calls to getDisplayMedia() can result in activation of the tab the user chooses to share.

By the way, even if the application is quite sophisticated and user-friendly, and decides to monitor how long it took for the UA to approve gUM and immediately kill the capture if it's been too long, the user might still be alarmed that their camera's indicator light flashed briefly.
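The "monitor and immediately kill" approach mentioned above could look roughly like this. It is a sketch only: `keepIfFresh`, the structural `StreamLike`/`TrackLike` types, and the age threshold are assumptions made for illustration. A real `MediaStream` satisfies `StreamLike`, since `getTracks()` returns tracks that have `stop()`.

```typescript
// Structural stand-ins for MediaStreamTrack / MediaStream, so the sketch
// does not depend on DOM typings.
interface TrackLike {
  stop(): void;
}
interface StreamLike {
  getTracks(): TrackLike[];
}

// Returns the stream if it arrived soon enough after the request,
// otherwise stops every track and returns null.
function keepIfFresh<S extends StreamLike>(
  stream: S,
  requestedAtMs: number,
  nowMs: number,
  maxAgeMs: number, // assumption: the application picks its own limit
): S | null {
  if (nowMs - requestedAtMs <= maxAgeMs) {
    return stream;
  }
  for (const track of stream.getTracks()) {
    track.stop();
  }
  return null;
}

// Possible use in a browser (not executed here):
//   const requestedAt = Date.now();
//   const stream = await navigator.mediaDevices.getUserMedia({ video: true });
//   const usable = keepIfFresh(stream, requestedAt, Date.now(), 60000);
```

By the time this check runs the capture has already started, so it cannot prevent the brief indicator flash described above.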

jan-ivar commented 2 years ago

That is why I'd like to keep this step after the "wait for focus" step.

@youennf Oh I see, you said after step 6.1, not before. I misunderstood your reason for moving 6.3.7 up: it would still wait for focus, and merely affect which error is thrown. I think I prefer leaving it where it is, since the other errors (NotFoundError, OverconstrainedError) are all more informative and not solvable by merely prodding the user or repeating the gUM call with the same arguments.

If this is an important issue for the application

@eladalon1983 I don't think I understand what this issue is or what "list of known expiration times" would be. There's nothing to do if the promise never resolves, and I don't see what's UA-specific about apps implementing impatience. Can you clarify?
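One reading of "apps implementing impatience" is an application-side deadline raced against the pending promise. A minimal sketch follows; `withDeadline` and `TIMED_OUT` are hypothetical names, and the deadline value is whatever the application chooses.

```typescript
const TIMED_OUT = "timed-out" as const;

// Settles with the value or error of `pending` if it settles first,
// otherwise fulfills with TIMED_OUT after `deadlineMs`.
function withDeadline<T>(
  pending: Promise<T>,
  deadlineMs: number,
): Promise<T | typeof TIMED_OUT> {
  return new Promise<T | typeof TIMED_OUT>((resolve, reject) => {
    const timer = setTimeout(() => resolve(TIMED_OUT), deadlineMs);
    pending.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Possible use in a browser (not executed here):
//   const result = await withDeadline(
//     navigator.mediaDevices.getUserMedia({ audio: true }),
//     30000,
//   );
//   if (result === TIMED_OUT) { /* tell the user to focus the tab */ }
```

This only stops the application from waiting. The underlying request stays pending, because there is no API to cancel a getUserMedia() call, so capture can still start later if the promise is eventually fulfilled.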

That does happen.

Yes, I didn't mean to suggest otherwise, and I agree user agents should be allowed to make the call for users in such cases, but to protect the user, not for reasons of any expectation by the application.

youennf commented 2 years ago

since the other errors (NotFoundError, OverconstrainedError) are all more informative

That might be a problem. Say a privacy-aware browser does not want to reject with those exceptions; it cannot really do that. Adding a reject step before those checks seems useful: if the user denies camera access forever to a given website, the user might not want to let the website know whether there is a camera or not.

Similarly, we are doing the permission policy checks after checking devices which seems wrong. I'll file a separate issue for that.

it might try to determine that by checking the browser name+version against a list of known expiration times

If we keep rejecting once focused, the application does not have to do browser-specific checks. It just has to wait for focus and then wait a very short time for the promise to be rejected. This lets the application determine, at the time it would actually call getUserMedia, whether it should do so. As an example, say the user focuses the page and very quickly clicks the 'capture' button. In that case, the click event handler might be called before the getUserMedia promise is settled. This seems like an edge case that browsers could probably implement well without changing the spec (a user click would first trigger focus, the promise would get rejected, and then the user's click on the button would call the button's event handler).
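For a click handler to know whether an earlier request is still outstanding, the application has to track that itself, since JavaScript promises do not expose their state synchronously. A small sketch, with `trackSettlement`, `Tracked`, and `Settlement` as hypothetical names:

```typescript
type Settlement = "pending" | "fulfilled" | "rejected";

interface Tracked<T> {
  readonly promise: Promise<T>;
  state(): Settlement;
}

// Wraps a promise so its settlement can be read synchronously later.
// Callers should still attach their own handlers to `promise`; the
// handlers below only record the outcome.
function trackSettlement<T>(promise: Promise<T>): Tracked<T> {
  let current: Settlement = "pending";
  promise.then(
    () => {
      current = "fulfilled";
    },
    () => {
      current = "rejected";
    },
  );
  return { promise, state: () => current };
}

// Possible use in a browser (not executed here):
//   const earlier = trackSettlement(navigator.mediaDevices.getUserMedia({ video: true }));
//   captureButton.onclick = () => {
//     if (earlier.state() === "pending") return; // earlier request still outstanding
//     /* otherwise decide whether to call getUserMedia() again */
//   };
```

A "pending" result here cannot distinguish a request that may still succeed from one the user agent has already decided to reject on focus.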

youennf commented 2 years ago

I'll file a separate issue for that.

https://github.com/w3c/mediacapture-main/issues/847

eladalon1983 commented 2 years ago

The application cannot currently cancel a call for gUM once it's in the pending state. That means that a call to gUM can result, several days later, in a user seeing the indicator turn on. Even if the application is wise enough to realize that it's been too long, and even if it immediately stops the capture, some damage has already been done - the user will be alarmed, and might even suspect the honest application. Preventing this would protect both user and application.

We've glossed over one other possible solution, and discussed two possible solutions.

  1. [Not yet discussed] Allow applications to cancel gUM calls. I think this is a tall order, and is a partial solution to boot.
  2. [Discussed] Allow the UA to reject Promises once focus shifts back.
  3. [Discussed] Allow the UA to reject Promises in the background.

Youenn has pointed out that option 2 is already spec-compliant. I am trying to argue that option 3 is better both for applications and for compat, because it makes an implicit state explicit.

Consider option 2. User agents will likely "zombify" gUM-Promises after an internal timer elapses, meaning UAs will make the irrevocable decision to reject the Promise as soon as the page regains focus. For argument's sake, say Chrome internally decides to reject the Promise after 30min, Firefox after 60min, and Safari never. Suppose an application cares about when a Promise is zombified. Such an application would hard-code the knowledge that Chrome/Firefox/Safari would never fulfill the Promise after 30/60/infinity minutes, respectively. This is awkward. But if the UA rejects the Promise in the background (instead of zombifying it), this would allow applications to write code that's agnostic of which UA it's running on.
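The hard-coding described above might look something like the sketch below. The 30/60/never figures are the hypothetical numbers from this comment, used purely "for argument's sake", and do not describe real browser behaviour. `ZOMBIFY_AFTER_MIN` and `isLikelyZombie` are made-up names.

```typescript
// Hypothetical per-browser limits, copied from the example in the comment.
// These are NOT real values for any browser.
const ZOMBIFY_AFTER_MIN: Record<string, number> = {
  Chrome: 30,
  Firefox: 60,
  Safari: Infinity, // "never"
};

// Returns true if a request pending this long would, under the assumed
// table, never be fulfilled; undefined if the browser is not in the table.
function isLikelyZombie(
  browser: string,
  minutesPending: number,
): boolean | undefined {
  const limit = ZOMBIFY_AFTER_MIN[browser];
  return limit === undefined ? undefined : minutesPending > limit;
}
```

Such a table would have to be kept in sync with every browser release, which is the awkwardness the comment is pointing at.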

youennf commented 2 years ago

Suppose an application cares about when a Promise is zombified.

Can you clarify why applications would actually care about this?

eladalon1983 commented 2 years ago

Suppose an application cares about when a Promise is zombified.

Can you clarify why applications would actually care about this?

Consider an application that records an employee, either for compliance reasons, for quality assurance, or for billing. The application issues local and/or remote alerts if unable to record for N minutes. It helps to have a dedicated error for the Promise timing out while backgrounded, so that the local user could be appropriately informed of what went wrong. ("You have already approved mic+camera permissions, but we still need you to focus this tab to start recording.") Locally presented information would still be presented only on-focus-regain, so rejecting in the background is not strictly necessary (assuming dedicated error). For informing a remote controller, likely an application-specific timer would be employed. For telemetry and debuggability, it would be nice to also know if a UA-timer has elapsed, but I guess that's not strictly required. So maybe the minimum required change here would be to say that if the UA rejects the gUM Promise after it's in the background for too long, it has to employ a dedicated error.

eladalon1983 commented 2 years ago

It's interesting to note that Chrome has recently implemented the requirement for focus, and that this has led to bug reports from Web developers who were flummoxed by this new behavior. Some of them characterized it as "getUserMedia taking minutes to complete".

youennf commented 2 years ago

So maybe the minimum required change here would be to say that if the UA rejects the gUM Promise after it's in the background for too long, it has to employ a dedicated error.

Is there web developer request for this? I guess UAs can use a dedicated error message without any spec change.

eladalon1983 commented 2 years ago

I guess UAs can use a dedicated error message without any spec change.

Is it best practice for Web apps to read the error message and break if it changes? Or is it best practice to use the error type/name for that?
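Branching on the error's `name` rather than its message text might look like the sketch below. `classifyGumError` and the `GumFailure` labels are hypothetical. The three names checked are ones getUserMedia() already rejects with; the dedicated timeout error discussed in this thread does not exist, so it is deliberately not listed.

```typescript
type GumFailure = "permission-denied" | "no-device" | "overconstrained" | "other";

// Maps a getUserMedia() rejection to an application-level category using
// only the error's `name`, never its human-readable message.
function classifyGumError(err: unknown): GumFailure {
  const name =
    typeof err === "object" && err !== null && "name" in err
      ? String((err as { name: unknown }).name)
      : "";
  switch (name) {
    case "NotAllowedError":
      return "permission-denied";
    case "NotFoundError":
      return "no-device";
    case "OverconstrainedError":
      return "overconstrained";
    default:
      return "other";
  }
}
```

With only a changed message, a background timeout would be indistinguishable here from any other "NotAllowedError".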

Is there web developer request for this?

This issue has languished for too long; I no longer remember. Assume not.

jan-ivar commented 6 months ago

If there's no web developer interest for this, can we close it, or get a concise summary of the remaining ask?

alvestrand commented 6 months ago

Elad's comments show that there is user interest in this behavior (pending gUM being resolved/rejected at surprising times), and Web developers are being pinged to "do something about it".

So in the hierarchy of constituencies, we have worried users and worried Web developers. What more interest needs to be shown?

eladalon1983 commented 3 months ago

It's a bit sad that we could not resolve this discussion in 2.5 years. The bar mentioned by both @jan-ivar and @youennf was "Web developer interest." We have that here. Shall we introduce a dedicated error now?