Use postMessage pattern for two-way messaging.

w3c / mediacapture-handle

https://w3c.github.io/mediacapture-handle/

Use postMessage pattern for two-way messaging. #69

Open jan-ivar opened 1 year ago

jan-ivar commented 1 year ago

We need "a rudimentary messaging API" to overcome storage partitioning. We also need to stop reinventing postMessage in #11 and #68. I'm proposing postMessage-shaped APIs for both directions:

capturer ← capturee: Replace the cross-origin info-surfacing handle and permittedOrigins, which reinvent postMessage
capturer → capturee: A one-way postMessage that takes transferables (e.g. a port!)

What is postMessage?

It's the web's API pattern for cross-origin communications. All APIs that expose information to realms of other origins have this shape today, except for this spec.

It's a pattern, because there's no single postMessage, but three:

(one-to-one) MessageChannel
(one-to-many) BroadcastChannel, but same-origin only (no transferables)
(one-to-any) window.postMessage with targetOrigin

But none of them fit exactly, because...

The capture comms problem is unique

There can be more than one capturer, and we'd like the capturee to remain unaware unless contacted. A fourth type, a mix of the last two, seems needed: a cross-origin broadcast-like channel.

Capturees also need to communicate ahead of capture. This led to the current API, which acts like publishing ahead of capture and like postMessage during capture. But a postMessage shape can solve this too.

The proposed API

A capturee would have a mediaDevices.capturer.postMessage(msg, "*") that broadcasts to all capturers. Since it doesn't take transferables, all messages are cloneable, so it can cache the last message posted and surface it to future capturers! This lets the capturee surface whatever info it likes (ids, handles, crop targets) to present and future capturers, and update both the same way (by posting a new message).
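To make the caching semantics concrete, here is a minimal in-memory sketch of a broadcast that replays its last message to late subscribers. It is a simulation, not the proposed API: the class name `CachedBroadcast`, its `subscribe()` method, and the `{ data, origin }` event shape are hypothetical; delivery is synchronous for simplicity (real postMessage delivery is queued as a task); and `structuredClone` stands in for the serialization postMessage performs.

```javascript
// Hypothetical model of the proposed capturee -> capturers broadcast.
// Not a real browser API; delivery is synchronous here for simplicity.
class CachedBroadcast {
  #origin;
  #hasLast = false;
  #last;
  #listeners = new Set();

  constructor(origin) {
    this.#origin = origin; // the capturee's origin, surfaced like MessageEvent.origin
  }

  // Capturee side: clone once, cache, and deliver to every current capturer.
  postMessage(data) {
    this.#last = structuredClone(data); // throws DataCloneError if not cloneable
    this.#hasLast = true;
    for (const listener of this.#listeners) this.#deliver(listener);
  }

  // Capturer side: a new subscriber immediately receives the cached message, if any.
  subscribe(listener) {
    this.#listeners.add(listener);
    if (this.#hasLast) this.#deliver(listener);
    return () => this.#listeners.delete(listener); // unsubscribe
  }

  #deliver(listener) {
    // Each recipient gets its own copy, as with structured-clone messaging.
    listener({ data: structuredClone(this.#last), origin: this.#origin });
  }
}
```

In this model, a capturer that subscribes after the capturee has posted still receives the latest message, and a later postMessage updates present and future capturers alike. Because the message is cloned at post time, mutating the original object afterwards does not change what late subscribers see.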

A capturer can send messages directly to the capturee. With this, exchanging a 1-1 port becomes trivial:

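// capturee (recognize() is app-defined)
let oneOn1Port; // dedicated 1-1 port, set once a recognized capturer sends one
navigator.mediaDevices.capturer.postMessage(msg, "*"); // broadcast to present and future capturers
navigator.mediaDevices.capturer.onmessage = e => {
  // transferred ports arrive in e.ports
  if (recognize(e.origin)) oneOn1Port = e.ports[0];
};

// capturer
let oneOn1Port;
captureController.onmessage = e => {
  if (recognize(e.origin)) {
    const {port1, port2} = new MessageChannel();
    oneOn1Port = port1; // keep one end...
    // ...and transfer the other to the capturee, restricted to its origin
    captureController.postMessage(port2, {targetOrigin: e.origin, transfer: [port2]});
  }
};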
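The port exchange above can be tried outside a browser. The sketch below is a Node.js simulation, assuming a recent Node.js: since `captureController.postMessage` and `mediaDevices.capturer` are only proposals, a plain `MessageChannel` stands in for the capturer/capturee control path, and `receiveMessageOnPort` (from `node:worker_threads`) reads messages synchronously so the flow is easy to follow. The function name `exchangeDedicatedPort` is hypothetical.

```javascript
const { MessageChannel, receiveMessageOnPort } = require("node:worker_threads");

// Hypothetical simulation: `control` stands in for the proposed
// captureController <-> mediaDevices.capturer path.
function exchangeDedicatedPort() {
  const control = new MessageChannel(); // port1: capturer end, port2: capturee end

  // Capturer: create a fresh 1-1 channel, keep port1, transfer port2.
  const { port1: capturerPort, port2 } = new MessageChannel();
  control.port1.postMessage({ port: port2 }, [port2]); // port2 is now detached here

  // Capturee: pick up the transferred port (a browser would fire a message event).
  const captureePort = receiveMessageOnPort(control.port2).message.port;

  // From here on, the two sides talk over their own dedicated channel.
  captureePort.postMessage({ text: "hello from capturee" });
  capturerPort.postMessage({ text: "hello from capturer" });
  const atCapturer = receiveMessageOnPort(capturerPort).message.text;
  const atCapturee = receiveMessageOnPort(captureePort).message.text;

  // Close every live port so the process can exit.
  for (const p of [control.port1, control.port2, capturerPort, captureePort]) p.close();
  return { atCapturer, atCapturee };
}
```

Once the port has been transferred, the control path is no longer needed for conversation: all further traffic is 1-1 between the two ends of the new channel.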

This new API as shown is on CaptureController, which would also solve https://github.com/w3c/mediacapture-handle/issues/12.

youennf commented 1 year ago

The first broadcast API has some benefits:

The model is cleaner: for instance, we attach it to CaptureController instead of the track.
We reuse known events/known patterns on the capturer side.
We enforce an origin (via MessageEvent.origin), unlike Capture Handle.

It also has some downsides:

It is a replacement to an existing API, no new functionality really.
The implicit message to cache on the capturee side so that new capturers can get the event is not really aligned with postMessage and seems surprising/a bit awkward.

I think we should consider bringing over the benefits of this approach (enforcing an origin, moving to CaptureController). I am less sure about reusing postMessage, though. Another approach to look at is something like HTML History state (which is using structured clone, like we could do for capture handle). This seems closer to what we are trying to model.

As for the second API, I like that it tries to be minimal. The main difference I see compared to my past suggestions is that capturer.onmessage can fire for any capturer. There is no way to understand/ensure that two successive messages are coming from the same capturer. This might be error-prone (although rare in practice). That is the main reason I suggested in the past an API where a capturer would be represented as a single event on the capturee side (only sent when the capturer decides it wants to talk to the capturee), and a MessagePort would be created directly for the actual communication. Given we now have CaptureController, my past suggestion would be something like:

// capturee: one event per capturer that chooses to talk, each carrying its own port
navigator.mediaDevices.onDisplayCapturer = e => {
  if (recognize(e.origin)) oneOn1Port = e.port;
};

// capturer
captureController.captureePort().postMessage(...);
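To illustrate the concern about successive messages: an origin check cannot tell apart two capturers from the same origin, whereas one port per capturer keeps their messages separate. The Node.js simulation below assumes that each hypothetical `onDisplayCapturer` event hands the capturee its own `MessagePort`; the origin string, the `sessions` map, and `simulateTwoCapturers` are illustrative names, and `receiveMessageOnPort` is used only to read messages synchronously.

```javascript
const { MessageChannel, receiveMessageOnPort } = require("node:worker_threads");

// Hypothetical: each onDisplayCapturer-style event gives the capturee a dedicated port.
function simulateTwoCapturers() {
  const origin = "https://meet.example"; // both capturers share this origin
  const sessions = new Map(); // capturee-side port -> per-capturer state

  const capturerPorts = Array.from({ length: 2 }, () => {
    const { port1, port2 } = new MessageChannel();
    sessions.set(port2, { origin, messages: [] }); // capturee keeps port2
    return port1; // capturer keeps port1
  });

  // Interleaved traffic from the two capturers.
  capturerPorts[0].postMessage("next slide");
  capturerPorts[1].postMessage("previous slide");
  capturerPorts[0].postMessage("next slide");

  // Drain each dedicated port: messages stay grouped by sender,
  // even though the origin is identical for both.
  for (const [port, session] of sessions) {
    let entry;
    while ((entry = receiveMessageOnPort(port)) !== undefined) {
      session.messages.push(entry.message);
    }
    port.close();
  }
  for (const port of capturerPorts) port.close();

  return [...sessions.values()];
}
```

With a single shared onmessage, the same three messages would arrive as one interleaved stream carrying identical e.origin values, and the capturee could not attribute them to a particular capturer.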

One area to study before we dive into APIs is what happens when the capturee navigates to a new document (or navigates back to a previous document using the B/F cache). Maybe this should somehow be modelled in the API. MessagePorts are not great at detecting that the party you are talking to is actually gone.

jan-ivar commented 1 year ago

The first broadcast API has some benefits:

Great to hear!

It also has some downsides:

It is a replacement to an existing API, no new functionality really.

Being able to pass real messages, even objects (like cropTargets), would be new functionality (solving #11 and #68).

BroadcastChannel.postMessage shows that StructuredDeserialize can fail, firing a messageerror event. Don't we need to solve that here? Design principle § 2.4 ("Be consistent") urges us not to reinvent what has already been solved.
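For reference, structured-clone messaging has two distinct failure points: serialization on the sending side, where postMessage throws a "DataCloneError" DOMException synchronously, and deserialization on the receiving side, which is what fires messageerror. The sketch below only exercises the first, using the global `structuredClone` (available in modern browsers and Node.js 17+) as a stand-in for the serialization step; `tryClone` is a hypothetical helper. It also shows the kind of rich values ("real messages, even objects") that survive cloning.

```javascript
// Hypothetical helper: report whether a value survives structured cloning,
// the serialization step behind postMessage.
function tryClone(value) {
  try {
    return { ok: true, copy: structuredClone(value) };
  } catch (err) {
    // Uncloneable values such as functions throw a DOMException named "DataCloneError".
    return { ok: false, error: err.name };
  }
}

const good = tryClone({
  id: "deck-1",
  updated: new Date(0),              // Dates are cloned as Dates
  slides: new Map([["current", 3]]), // Maps are cloned as Maps
});
const bad = tryClone({ onUpdate: () => {} }); // functions are not cloneable
```

Here `good.ok` is true and the copy keeps its Date and Map types, while `bad.ok` is false with `bad.error === "DataCloneError"`.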

The implicit message to cache on capturee side so that new capturer can get the event is not really aligned with postMessage and seems surprising/a bit awkward.

Every shape has tradeoffs, but a broadcast postMessage with a delay seems less awkward than a getter + event that can be abused as postMessage using setTimeout().

Recall that BroadcastChannel already sends messages into the ether, with no guarantee of recipients, time of delivery, or response (users open and close documents).

In that light, having the last burst delivered to new listeners added since the burst (while certainly new, distinct behavior we'd define because it's useful) doesn't seem model-breaking, because it doesn't seem to violate any guarantees one might rely on. IOW, where it differs, it's useful.

There are already 3 postMessage functions whose functionality varies significantly, because that is useful. Is this difference larger than those?

There is no way to understand/ensure that two successive messages are coming from the same capturer. This might be error prone (although rare in practice).

Check e.origin? This should be a known pattern, an advantage of using a familiar postMessage shape.

Another approach to look at is something like HTML History state (which is using structured clone, like we could do for capture handle). This seems closer to what we are trying to model.

Happy to consider other proposals! Maybe in a separate issue?

One area to study before we try to dive into APIs is what happens when capturee navigates to a new document

The user agent could post a blank message on navigation, replacing capturehandlechange.

eladalon1983 commented 1 year ago

We also need to stop reinventing postMessage in https://github.com/w3c/mediacapture-handle/issues/11 and https://github.com/w3c/mediacapture-handle/issues/68.

"Need to stop" - strong wording there! But an explanation is missing for why we "need" this. Some browsers have shipped Capture Handle and some Web developers have started using it productively. It is unclear to me why we "need" to break the Web through cosmetic changes that confer no added functionality.

It's also unclear to me why this issue is in the mediacapture-handle repo. Seems like it belongs in mediacapture-screen-share, given that you're not extending Capture Handle or building on top of it. Rather, you are proposing to replace Capture Handle altogether by your competing proposal. So I repeat - we should move Capture Handle back to the WICG, given that we have no consensus for it, and given that you don't intend to implement it, and are trying to convince all involved that it should be deprecated in favor of your own proposals.

As for the viability of this mechanism as a replacement - I have my own proposal to extend Capture Handle with a MessagePort. If you check out the Challenges portion of that proposal, I hope you will see why I have a different proposal than yours. (Also, your proposal does not work for Conditional Focus, because the cached message you propose will only be readable after the window of opportunity closes.)

jan-ivar commented 1 year ago

(Also, your proposal does not work for Conditional Focus, because the cached message you propose will only be readable after the window of opportunity closes.)

Thanks for asking about that: capturers would register captureController.onmessage ahead of calling getDisplayMedia, so the initial event would fire within the window of opportunity (on gDM's queued success task, similar to how we do it in sRD). So this should work with Conditional Focus, which is an important goal. I'm sure we can work out those details.

We also need to stop reinventing postMessage in https://github.com/w3c/mediacapture-handle/issues/11 and https://github.com/w3c/mediacapture-handle/issues/68.

"Need to stop" - strong wording there! But an explanation is missing for why we "need" this.

Please see https://github.com/w3c/mediacapture-handle/issues/11 and https://github.com/w3c/mediacapture-handle/issues/68 for explanations. If they're insufficient, let's discuss it there.

It is unclear to me why we "need" to break the Web through cosmetic changes that confer no added functionality.

Again, "Being able to pass real messages, even objects (like cropTargets), would be new functionality (solving https://github.com/w3c/mediacapture-handle/issues/11 and https://github.com/w3c/mediacapture-handle/issues/68)." so I don't think it's accurate to say "no added functionality". Also "Web" = "Chrome" here.

As for the viability of this mechanism as a replacement - I have my own proposal to extend Capture Handle with a MessagePort. If you check out the Challenges portion of that proposal, I hope you will see why I have a different proposal than yours.

I see that was opened an hour ago; thanks, I'll be happy to take a look.

eladalon1983 commented 1 year ago

It is unclear to me why we "need" to break the Web through cosmetic changes that confer no added functionality.

Also "Web" = "Chrome" here.

If Mozilla does not think that Capture Handle should be part of the WP, and is not committed to implementing it, then that means that the spec belongs back in the WICG.

youennf commented 1 year ago

To help move forward the discussion, it might help to split the discussion in two:

the broadcast/capture handle identity functionality
the capturee/capturer message channel.

We should be able to improve on both in parallel independently, probably as separate issues. As said earlier, for each of these issues, getting a comparison of the pros and cons of the various approaches will help drive the discussion forward.