w3c / mediacapture-screen-share-extensions

Auto-pause capture when user switches captured content #4

Open eladalon1983 opened 1 year ago

eladalon1983 commented 1 year ago

Both Chrome and Safari now allow the user to change what surface is captured.

That's obviously great stuff. Can we make it better still?

So I propose that we add two things:

  1. A control which allows an application to instruct the browser - whenever the user changes what surface is shared, pause the track (by setting enabled to false).
  2. Fire an event whenever this happens. (With the general expectation that the application will apply new processing and then set enabled back to true.)

Possibly the two can be combined, by specifying that setting an event handler signals that the pausing behavior is desired (@alvestrand's idea).
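
A minimal sketch of how an application might consume such a control, assuming the browser has already paused the track by setting `enabled` to false before notifying the app. The helper name `resumeAfterSwitch` and the `reconfigure` callback are hypothetical; only the `enabled` attribute of `MediaStreamTrack` is an existing API.

```javascript
// Hypothetical helper, meant to be called from the (not yet specified)
// surface-change event handler. Assumes the browser has already set
// track.enabled = false when the user switched surfaces.
function resumeAfterSwitch(track, reconfigure) {
  if (track.enabled) {
    return false; // Track was not paused, so there is nothing to resume.
  }
  reconfigure(track);   // e.g. apply new cropping or blurring for the new surface
  track.enabled = true; // Resume delivering frames to sinks.
  return true;
}
```

An application with asynchronous processing would need to wait for that processing to finish before setting `enabled` back to true.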

Another natural extension of this idea is to also apply it when a captured tab undergoes cross-origin navigation of the top-level document. When that happens, some applications might wish to stop capture momentarily and (in-content) prompt the user - "do you still want to keep sharing?"

Relevant previous discussion here.

jan-ivar commented 9 months ago

To recap my view from today's editors' meeting, I see 3 things to decide on (with my preferred answers):

  1. declarative opt-in (bikeshed getDisplayMedia({audio: true, appAssistedSurfaceSwitching: "include"}))
  2. notification regardless (sourceswitch event)
  3. app decision-point (late aka point-of-use through event.preventDefault())

eladalon1983 commented 9 months ago

  1. app decision-point (late aka point-of-use through event.preventDefault())

I'd still like to see an example of an application that benefits from this possibility.

jan-ivar commented 8 months ago

I'd still like to see an example of an application that benefits from this possibility.

In today's meeting the early decision example shown was:

getDisplayMedia({appAssistedSurfaceSwitching: "include", …})
controller.onsourceswitch = event => {
  video.srcObject = event.stream;
};

But this will glitch in all browsers, even for same-type switching, because it reruns the media element load algorithm.

A late decision seems inherently needed to fix this glitch for the subset of same-type switching. E.g.

controller.onsourceswitch = event => {
  if (!compatibleTypes(video.srcObject, event.stream)) {
    event.preventDefault(); // Use switch-track model
    video.srcObject = event.stream;
  }
};

Glitching may similarly happen with other sinks, like MediaRecorder or WebAudio.
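
The `compatibleTypes()` helper in the snippet above is not defined anywhere in this thread. One possible reading, offered only as an assumption, is that two streams are compatible when they carry the same number of audio and video tracks and their video tracks report the same `displaySurface` setting:

```javascript
// Assumed definition of compatibleTypes(); not part of any proposal.
// Relies only on existing MediaStream / MediaStreamTrack methods.
function compatibleTypes(oldStream, newStream) {
  if (!oldStream || !newStream) return false;
  // Collect the displaySurface ("monitor", "window", "browser") of each video track.
  const surfaces = stream =>
    stream.getVideoTracks().map(track => track.getSettings().displaySurface);
  const oldSurfaces = surfaces(oldStream);
  const newSurfaces = surfaces(newStream);
  return (
    // Both streams have audio, or neither does.
    oldStream.getAudioTracks().length === newStream.getAudioTracks().length &&
    oldSurfaces.length === newSurfaces.length &&
    oldSurfaces.every((surface, i) => surface === newSurfaces[i])
  );
}
```

A real application would likely tune this predicate to the sinks it uses, since what counts as a glitch differs between a media element, MediaRecorder and WebAudio.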

eladalon1983 commented 8 months ago

I don't fully understand what is being asserted here. A clarification would be welcome.

I also note this interesting bit: (Emphasis mine.)

A late decision seems inherently needed to fix this glitch for the subset of same-type switching.

Does that mean you support dropping the late-decision requirement for non-same-type switching?

dontcallmedom-bot commented 8 months ago

This issue was discussed in WebRTC December 12 2023 meeting – 12 December 2023 (Dynamic Switching in Captured Surfaces)

eladalon1983 commented 7 months ago

But I am willing to lean in and actually claim it. Yes, developers need the early-decision, because cross-surface-type(!) source-injection is a footgun. Consider the code in this fiddle: https://jsfiddle.net/eladalon/Ly8a3wcs/

I can now substantiate this claim in a more persuasive manner. Try out captured-surface-control.glitch.me using Chrome Beta/Canary. Observe:

  1. If you choose a window, the application stops the capture and bids you try again.
  2. If you choose a tab, the application "activates" and there is a built-in assumption that you keep on sharing a tab.

Applications built before cross-surface-type source-switching was possible had no reason to expect that getSettings().displaySurface might be mutable, and they are not robust to these changes.

-- Note: Of course Captured Surface Control is not a standard API. Assume for the sake of argument that it never will be. The whole point here is to show that in the future, we could credibly add APIs that would work for some surface types but not for others, and that unexpected switching would break apps.
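
To illustrate the robustness concern, here is a small sketch of a guard that re-reads `displaySurface` at the point of use instead of trusting the value seen right after `getDisplayMedia()` resolved. The name `createSurfaceGuard` is hypothetical; `getSettings().displaySurface` is the existing API mentioned above.

```javascript
// Hypothetical guard: remembers the surface type seen when it was created
// and lets the app check later whether the track still reports that type.
function createSurfaceGuard(track) {
  const initial = track.getSettings().displaySurface;
  return {
    initial,
    // True while the track still reports the surface type seen at creation.
    stillValid: () => track.getSettings().displaySurface === initial,
  };
}
```

An app that only works with tabs could call `stillValid()` before invoking any tab-only feature and fall back gracefully when it returns false.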

jan-ivar commented 6 months ago

I don't fully understand what is being asserted here. A clarification would be welcome.

It's asserting that injection and its alternative have different side-effects, and which ones an app prefers might differ based on what surface the end-user chose to switch to/from (e.g. whether both or neither have audio). E.g.

A late decision seems inherently needed to fix this glitch for the subset of same-type switching.

Does that mean you support dropping the late-decision requirement for non-same-type switching?

I've seen no proposal for how an app might specify its preferences for the different surfaces a user might pick up-front, but am happy to compare complexity of anything presented.

Applications built before cross-surface-type source-switching was possible had no reason to expect that getSettings().displaySurface might be mutable, and they are not robust to these changes.

How are they not robust to these changes? Do you have an example that is not experimental?

eladalon1983 commented 6 months ago

It's asserting that injection and its alternative have different side-effects, and which ones an app prefers might differ based on what surface the end-user chose to switch to/from (e.g. whether both or neither have audio).

Thanks, now I understand.

Theoretically speaking - I agree completely. But do we have a concrete example of such an app? Are there any apps that decide whether to use MediaRecorder vs. RTCRtpSender based on whether the user shared a window vs. a screen? I am not aware of such apps, and I'd actually be quite surprised if you could name such an app. All apps I know make the decision - when there even is a decision to be made - before invoking getDisplayMedia(). I think it's important that we solicit actual developer feedback and only introduce complexity that serves genuine needs.

I've seen no proposal for how an app might specify its preferences for the different surfaces a user might pick up-front, but am happy to compare complexity of anything presented.

I don't think that's relevant. I believe the previous paragraph of my present comment explains why.

How are they not robust to these changes? Do you have an example that is not experimental?

  1. Meet displays a scrim over the local preview of shared windows/screens, but not over the local preview of shared tabs.
  2. I believe the example of the experimental API was compelling and deserves our attention.

jan-ivar commented 5 months ago

Are there any apps that decide whether to use MediaRecorder vs. RTCRtpSender based on whether the user shared a window vs. a screen?

I think there's a misunderstanding. I gave two examples of apps that may need late decision on injection vs new tracks:

  1. a MediaRecorder example (same file vs new file)
  2. an RTCRtpSender example (immediate vs wait for renegotiation round-trip)

I was NOT suggesting a single app might choose between a MediaRecorder or an RTCRtpSender sink. I would indeed struggle to find a concrete example of that. 😉

jan-ivar commented 5 months ago

... only introduce complexity that serves genuine needs

By complexity do you mean functionality? The best API matches the complexity of the functionality exposed. We can observe the natural complexity here by separating concerns:

  1. apps want to learn when the user switches source → they register for the sourceswitch event

  2. UAs may wish to hold back UX options that might not work → they look for explicit app opt-in through getDisplayMedia({appAssistedSurfaceSwitching: "include", ...})

  3. Downstream symptoms might dictate when injection vs. new tracks is preferable, which can differ based on what the user chose → event.preventDefault() = don't inject, I'll handle it

These are mostly orthogonal. I.e. we can imagine apps wanting 1 without 2 or 3, and the UAs' concern that apps own the user problem is nicely separated from the app's downstream needs, avoiding the fallacy that injection can't or won't work in many cases still.

This offers the most functionality to webpages, including already-shipped functionality (injection).
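
The late decision point in item 3 follows the standard cancelable-event pattern. The sketch below simulates the UA side with a plain `EventTarget` standing in for the controller; the `sourceswitch` event name comes from this thread, while `dispatchSourceSwitch`, the `stream` expando and the `incompatible` flag are illustrative assumptions.

```javascript
// Simulates how a UA could consult the app through a cancelable event.
// Returns "inject" unless a listener called event.preventDefault().
function dispatchSourceSwitch(controller, stream) {
  const event = new Event("sourceswitch", { cancelable: true });
  event.stream = stream; // hypothetical attribute carrying the new tracks
  // dispatchEvent() returns false when a cancelable event was canceled.
  const notCanceled = controller.dispatchEvent(event);
  return notCanceled ? "inject" : "switch-track";
}

const controller = new EventTarget(); // stand-in for the capture controller
controller.addEventListener("sourceswitch", event => {
  if (event.stream.incompatible) {
    event.preventDefault(); // "don't inject, I'll handle it"
  }
});
```

With no listener registered, nothing calls `preventDefault()`, so this model falls back to injection, which matches the already-shipped behavior.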

Compare this to DisplaySurfaceChangeCallback which ties 1, 2, and 3 together. I.e.

  1. apps that want to learn when the user switches source cannot do so without registering a global callback AND writing code to handle new tracks, potentially suffering downstream symptoms like glitches or separate recording files, even for sources that should have worked

Forcing apps to opt-out of all injection to opt-in to more UA switching no doubt simplifies UA code, by offering less functionality. But less functionality doesn't seem like a user win.

tovepet commented 5 months ago

The web developers I have talked to have all preferred the predictability of having a new track for each captured surface over the convenience of the injection model. I don’t think this should be relegated to a secondary use case with extra hoops to jump through.

So let’s see if we can find a way to make both the switch-track model and the injection model easy and straightforward to use, and also provide some more flexibility in how they are applied.

One option could be to provide both of these track-types in parallel:

The API could look something like this:

controller.onnewsource = event => {
  video1.srcObject = event.stream; // surface tracks
};
const sessionStream = await getDisplayMedia({controller, /*opt-in*/, ...}); 
video2.srcObject = sessionStream;

where video1 would be using the switch-track model and video2 would be using the injection model. (The onnewsource event would be sent for all new surfaces including the initial one)

This API has the following benefits:

What do you think? Could something like this better cover the different usages of the API that we have been considering?

jan-ivar commented 4 months ago

where video1 would be using the switch-track model and video2 would be using the injection model.

I like this idea of exposing both to the application and letting it use the one it prefers. It seems neutral and would let us measure over time whether apps find injection desirable, while remaining backwards compatible.

With preventDefault() I was hung up on the UA needing to stop one or the other right away, but if we don't need that then it simplifies.

My question would be what are the semantics now of calling video2.srcObject.getVideoTracks()[0].stop()? Would it also stop video1.srcObject.getVideoTracks()[0] or not (and vice versa)?

  1. If no, the ~hardware light~ UA's privacy indicator UX might stay on for several seconds after a user clicks stop (until GC) in today's apps unaware of the newsource event.
  2. If yes, we've created a new "either-or" track, where stopping one stops both, which could confuse apps.

Running with 2 for a bit, maybe we just fire ended on the other track and call it a special case?

youennf commented 4 months ago

Option 1 makes sense to me, UA will likely optimize the case of no event handler for newsource

tovepet commented 4 months ago

The way I conceptualize these options about whether stop should affect just one or both tracks is as follows:

  1. Session tracks and surface tracks behave as clones with respect to each other, so stop would only affect the track on which it is called.
  2. Session tracks are proxy tracks to the underlying surface tracks. Operations done on session tracks will also affect the corresponding underlying surface tracks and vice versa. Calling stop on one of the tracks would then affect both tracks.
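
The difference between the two options can be shown with a toy model. `FakeTrack` below is not `MediaStreamTrack`; it only mimics `readyState` and `stop()`, and all names are illustrative.

```javascript
// Toy model only: mimics the readyState/stop() behavior of a track.
class FakeTrack {
  constructor() {
    this.readyState = "live";
    this.linked = []; // tracks that stop together with this one (option 2)
  }
  stop() {
    if (this.readyState === "ended") return; // already stopped, avoid loops
    this.readyState = "ended";
    this.linked.forEach(track => track.stop());
  }
}

// Option 1: session and surface tracks are independent, like clones.
function makeClonePair() {
  return { session: new FakeTrack(), surface: new FakeTrack() };
}

// Option 2: the session track proxies the surface track, so stop() propagates
// in both directions.
function makeProxyPair() {
  const pair = makeClonePair();
  pair.session.linked.push(pair.surface);
  pair.surface.linked.push(pair.session);
  return pair;
}
```

Under option 1 an app that stops only one of the two tracks leaves the other live, which is where the lingering privacy indicator concern comes from.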

If we choose to treat them as clones (option 1), I think that rather than introducing a special case, it’s better to allow the application to choose which tracks to receive through the opt-in, e.g.:

That would avoid creating the extra cloned track in the first place for applications that are only interested in either session tracks or surface tracks. It also does not add any extra burden on application writers since they would need to specify an opt-in anyway.

Option 1 makes sense to me, UA will likely optimize the case of no event handler for newsource

I don’t think this optimization would work in the other direction, i.e., for applications that are only interested in surface tracks.

jan-ivar commented 4 months ago

Note I inadvertently wrote "hardware light" among my concerns above, but of course this is screen-capture not camera/mic, so the only user-observable side-effect of an unstopped track would be the prolonged appearance of whatever privacy indicators the browser shows for a couple extra seconds until GC happens (e.g. after a user clicks stop).

Option 1 makes sense to me, UA will likely optimize the case of no event handler for newsource

I don’t think this optimization would work in the other direction, i.e., for applications that are only interested in surface tracks.

That seems fine, as this optimization would be there to solve today's apps unaware of the newsource event.

In contrast, apps uninterested in session tracks can simply stop them once they've received new surface tracks:

const sessionStream = await getDisplayMedia({controller, /*opt-in*/, ...});
video.srcObject = sessionStream;
controller.onnewsource = ({stream}) => {
  video.srcObject.getTracks().forEach(track => track.stop());
  video.srcObject = stream; // surface tracks
};

So there doesn't seem to be much need for new stop semantics, which seems nice.

tovepet commented 4 months ago

Having to manually stop tracks is just the type of gotcha that I think we should strive hard to avoid when possible. It’s way too easy for a developer to miss, leading to lingering privacy indicators disconcerting users.

In this case the cost to fix the issue is also next to zero for applications that do not need to use both the injection and switch-track model. (I expect this to be the vast majority of applications.)

Compare:

controller.onnewsource = ({stream}) => {
  video.srcObject = stream;
};
await getDisplayMedia({controller, surfaceSwitchingMethods: ["replace"], ...});

to

controller.onnewsource = ({stream}) => {
  video.srcObject = stream;
};
const sessionStream = await getDisplayMedia({controller, someOtherOptIn: "include", ...});
sessionStream.getTracks().forEach(track => track.stop());

The former is both less code and less error-prone than the latter.

jan-ivar commented 4 months ago

With the optimization @youennf proposed, forgetting stop() seems like an existing problem.

Having apps explicitly stop() tracks they're done with is the web model today, which makes its side-effects well-established, predictable, and pilot errors easy to diagnose and fix.

I'm not convinced introducing custom stopping-policies into the mix simplifies that responsibility.

controller.onnewsource = ({stream}) => {
  video.srcObject = stream;
};
const sessionStream = await getDisplayMedia({controller, someOtherOptIn: "include", ...});
sessionStream.getTracks().forEach(track => track.stop());

The former is both less code and less error-prone than the latter.

Ah, I missed earlier you said the event would fire for all new surfaces "including the initial one"! Having apps immediately stop tracks from getDisplayMedia() does look weird indeed.

I like the session vs surface behaviors, but why do web developers need to pick between two types of tracks? This seems to artificially put injection off the table on subsequent switches once non-injection is chosen just once, for no apparent or inherent reason.

I'd like to propose a more fluid model where web developers don't need to care about this on the initial getDisplayMedia call, and every track remains a candidate for injection:

To inject everything (the UA optimizes stopping tracks surfaced in sourceswitch):

video.srcObject = await getDisplayMedia({controller, /*opt-in*/, ...});

To never inject:

video.srcObject = await getDisplayMedia({controller, /*opt-in*/, ...});
controller.onsourceswitch = ({stream}) => {
  video.srcObject.getTracks().forEach(track => track.stop()); // stop old
  video.srcObject = stream;
};

To selectively inject:

video.srcObject = await getDisplayMedia({controller, /*opt-in*/, ...});
controller.onsourceswitch = ({stream}) => {
  if (tracksAreCompatible(video.srcObject, stream)) {
    stream.getTracks().forEach(track => track.stop()); // stop new
  } else {
    video.srcObject.getTracks().forEach(track => track.stop()); // stop old
    video.srcObject = stream;
  }
};

dontcallmedom-bot commented 4 months ago

This issue had an associated resolution in WebRTC April 23 2024 meeting – 23 April 2024 (Captured Surface Switching):

RESOLUTION: more discussion is needed on the lifecycle of surface tracks