w3c / media-source

Media Source Extensions
https://w3c.github.io/media-source/
Other
267 stars 59 forks source link

Refine srcObject's MediaSourceHandle behavior #306

Closed wolenetz closed 1 year ago

wolenetz commented 2 years ago

Current description of this PR: Overall intent for a MediaSourceHandle is that it should be [Transferable], and be restricted to "at most one" successful attachment to the underlying MediaSource by a handle. Further, if a Window MediaSource was previously attached using a legacy MSE object URL, subsequent attempts to use that MediaSource instance's MediaSourceHandle for attachment must also fail.

This refinement originates from discussion on the previous PR with @karlt (https://github.com/w3c/media-source/pull/305) and from lengthy discussion on this change itself at https://github.com/w3c/media-source/pull/306.

This work is associated with the MSE-in-Worker spec issue: https://github.com/w3c/media-source/issues/175

Original description of this PR (updated more recently per https://github.com/w3c/media-source/pull/306#issuecomment-1183638693):

Adds text intending to help prevent using one or more MediaSourceHandle for the same underlying MediaSource for simultaneous, concurrent attachments to media elements:

Note that legacy MSE object URL attachments similarly already achieve the same intent in existing implementations. This change includes a clarification for those similar to the third bullet, but for a MediaSource in the "attaching to a media element" text, since it was never the intent even for main-thread MediaSource attachments for them to be successfully attached to more than one HTMLMediaElement at a time. The switch to more clear srcObject usage for worker MSE attachment affords the spec an opportunity to be more clear about this intent.

This refinement originates from discussion on the previous PR with @karlt: https://github.com/w3c/media-source/pull/305

This work is associated with the MSE-in-Worker spec issue: https://github.com/w3c/media-source/issues/175


Preview | Diff

wolenetz commented 2 years ago

@mwatson2, please review. Also, @karlt, please take a look to ensure I've addressed the issues we discussed in #305.

Thank you!

wolenetz commented 2 years ago

The intent is to prevent making more than one usable copy of a MediaSourceHandle for each MediaSource. Transfer semantics restrictions may not be sufficient, since basic serialization without transfer might be attempted.

I propose trying something like this (adding to my current rough prototype in Chrome):

Hopefully this adds clarity to the intent. I'll now try updating the Chrome prototype to do this and see how it goes and respond further here. @karlt and others, if you see any blockers or concerns, please report them here ASAP. Thanks!

wolenetz commented 2 years ago

e.g., [[CurrentlyAttached]] to be true during setting srcObject to the handle (including resetting that slot to be false first for the previous srcObject if that object were also a handle.

Note the HTMLMediaElement.srcObject setter doesn't have option to throw exception, but we need to synchronously let the handle know it's in the process of starting a load even before the asynchronous load operation starts. We also need to be careful to protect against the following situation:

// On main thread, handle is initially an unattached MediaSourceHandle
// just received from the worker that owns the underlying MediaSource.

media_element1.srcObject = handle;

// Would throw exception now, since handle is CurrentlyAttached:
// worker.postMessage(handle); 

// But this cannot throw exception. Rather, the mediasource attachment
// steps should later run the failure steps for media_element2:
media_element2.srcObject = handle;

// The following demonstrates that [[CurrentlyAttached]] cannot be a
// boolean since if it were, it would have been doubly set on the
// assignment of |handle| to media_element2, above:
media_element2.srcObject = null;
worker.postMessage(handle);  // If boolean, this would succeed, but shouldn't.

Based on that example, I propose changing [[CurrentlyAttached]] to be a counter that is incremented when the handle is set to a srcObject on an HTMLMediaElement, and decremented when the handle is explicitly unset from a srcObject on an HTMLMediaElement. The attachment steps can then just fail if this value is not precisely 1, and the serialization steps could throw exception if this value is not precisely 0.

karlt commented 2 years ago

The intent is to prevent making more than one usable copy of a MediaSourceHandle for each MediaSource. Transfer semantics restrictions may not be sufficient, since basic serialization without transfer might be attempted.

I propose trying something like this (adding to my current rough prototype in Chrome):

* Change MediaSourceHandle to be just "Serializable" (https://html.spec.whatwg.org/multipage/structured-data.html#serializable-objects)

While I assume it is possible to specify a serialize-once behavior, I always thought of such a behavior where the object becomes unusable after message passing to fit better with the [Transferable] concept rather than [Serializable].

"Transferring is effectively recreating the object while sharing a reference to the underlying data and then detaching the object being transferred."

Serialization "allows them to be stored on disk and later restored".

Was there a benefit to making the object [Serializable] rather than [Transferable]?

wolenetz commented 2 years ago

Was there a benefit to making the object [Serializable] rather than [Transferable]?

Mostly this was due to the existing Chrome transferable infrastructure is clunky and might not do the required transfer steps in a completely interoperable order (from my brief reading). Secondly, this simplifies postMessage to be just serialization, not making developers also put the object in a transfer list. The conditional failure if "forStorage" part of the serialization steps allows prevention of usage for other purposes.

wolenetz commented 2 years ago

Regarding the specific postMessage exception: If [[CurrentlyAttached]] > 0 or if [[AlreadySerialized]] is true, then a postMessage attempt including that handle should throw exception. If this condition is found first among other items being posted, then the exception should be a DataCloneError.

wolenetz commented 2 years ago

I've updated the chromium prototype change for this (circa patch set 28):

I've tested the basic scenarios manually and it seems to be working.

wolenetz commented 2 years ago

@jernoble @karlt - I plan to proceed with getting the prototype in Chromium landed in parallel with updating this PR per the details as of https://github.com/w3c/media-source/pull/306#issuecomment-1144180822 Please raise any concerns you might have ASAP to prevent further delay on this highly-desired feature. Thanks!

wolenetz commented 2 years ago

@jernoble @karlt - I plan to proceed with getting the prototype in Chromium landed in parallel with updating this PR per the details as of #306 (comment) Please raise any concerns you might have ASAP to prevent further delay on this highly-desired feature. Thanks!

Thanks to @sandersdan I am now aware of potential further complications: It may not be interoperable to prevent postMessage(handle) from creating more than one handle, since this might actually be implicitly a broadcast on some implementations or on implementations where extensions might get a copy of messages. Achieving true move semantics is not possible. As a result, the window thread might eventually have multiple distinct handles referencing internally the same underlying worker-thread-owned MediaSource. It falls to the implementation to prevent simultaneous attachments (during the asynchronous media element load algorithm) from proceeding successfully, allowing at most 1 of them to attach successfully. tldr: Keeping the [[IsAlreadySerialized]] proposed internal slot is not sufficient to prevent copies, but does help early detection of some of those scenarios. The media element load algorithm must fail if the same underlying MediaSource is already not 'closed'. The latter isn't a change to the spec. Overall, Serializable is simpler ergonomically than Transferable, especially when there is potential for underlying broadcast or lack of true move semantics.

karlt commented 2 years ago

Browser implementations are free to use different approaches if they can produce the same observable behavior. Extensions can modify behavior in any way the browser lets them, if they are prepared to expect consequences, so browsers can't expect to maintain interoperable behavior in the presence of arbitrary extensions.

But what behavior would content see from passing a MediaSourceHandle to BroadcastChannel#postMessage(), if it were serializable?

While merely transferable and not serializable, MediaSourceHandle cannot be transferred through BroadcastChannel, which is consistent with transfer of MessagePort. "transferring does not make sense in multi-destination situations." MessagePort is similar to MediaSourceHandle in that it is a transferable object used for two-directional message-passing communication between threads and processes.

sandersdan commented 2 years ago

browsers can't expect to maintain interoperable behavior in the presence of arbitrary extensions.

As I understand it, all major browser implement the same behavior for message posting, which is broadcasting to all content script realms.

In WebCodecs we wanted to be able to have move semantics for VideoFrame but this detail made it impossible to implement in practice.

The situation with MediaSourceHandle is simpler because the resources it holds are less scarce, but I do want to recommend against inventing a new model for object ownership. If you don't follow the JS-like path (cloneable objects that are GC'd), there are surprising roadblocks.

But what behavior would content see from passing a MediaSourceHandle to BroadcastChannel#postMessage(), if it were serializable?

I would recommend that each receiver gets a valid handle, and that all valid handles are invalidated when one is attached.

While merely transferable and not serializable, MediaSourceHandle cannot be transferred through BroadcastChannel, which is consistent with transfer of MessagePort.

I don't think this is an option. Transferable isn't its own concept (or even an IDL attribute any more), it's a detail of the structured clone algorithm. If you can't serialize it, you can't transfer it either.

If there is a way to accomplish this combination, note that it would also prevent passing these handles through a TransferableStream.

wolenetz commented 2 years ago

I would recommend that each receiver gets a valid handle, and that all valid handles are invalidated when one is attached.

I agree. Further, for a particular handle instance, if it has ever been assigned to an HTMLMediaElement srcObject (even if the subsequent asynchronous load was interrupted and the underlying MediaSource hadn't ever been attached successfully), then that handle instance can never be successfully serialized again.

However, I think instantaneous serialization prevention of any handle instance in any realm with the same underlying MediaSource (such multiple instances can happen if extensions/broadcasts cause clones of the original single handle retrieved from MediaSource.getHandle()) upon any of those handles ever being assigned as srcObject to an HTMLMediaElement would risk security (could enable malicious high-resolution timer attacks).

So I think the following may be the best approach. If we find later that these restrictions can be relaxed, we can update the spec and implementations further:

karlt commented 2 years ago

As I understand it, all major browser implement the same behavior for message posting, which is broadcasting to all content script realms.

That sounds very inefficient, especially for something large like a VideoFrame, at least unless there's shared memory access involved. But efficiency is much less an issue for transferring MediaSourceHandle.

In WebCodecs we wanted to be able to have move semantics for VideoFrame but this detail made it impossible to implement in practice.

I would guess that ReadableStream.tee() support requires a Serializable object. We don't need that for MediaSourceHandle.

I do want to recommend against inventing a new model for object ownership. If you don't follow the JS-like path (cloneable objects that are GC'd), there are surprising roadblocks.

There is no new object ownership model proposed here. MediaSourceHandle can be GC'ed without any need to be clonable/Serializable.

While merely transferable and not serializable, MediaSourceHandle cannot be transferred through BroadcastChannel, which is consistent with transfer of MessagePort.

I don't think this is an option. Transferable isn't its own concept (or even an IDL attribute any more), it's a detail of the structured clone algorithm. If you can't serialize it, you can't transfer it either.

Transferable objects already exist without any need to be Serializable: MessagePort, ReadableStream, WritableStream, and TransformStream are all examples of objects that are Transferable, but not Serializable. If browsers can implement these objects, then I don't see any reason why a Transferable and not Serializable MediaSourceHandle would be a problem.

All these objects serve a similar purpose to MediaSourceHandle in that they provide potentially cross-realm communication with at most one associated other object. (tee() is an explicit extension of this involving additional objects.)

VideoFrame is very different. VideoFrames are data, not communication portals (end-points).

A sensible way to implement and even spec MediaSourceHandle would be that it hold a MessagePort created when MediaSource.getHandle() is called, to be used as [[port to worker]] on HTMLMediaElement.

If there is a way to accomplish this combination, note that it would also prevent passing these handles through a TransferableStream.

What do you mean by TransferableStream? If you can edit and link to a spec, that might be helpful, please?

We don't have any need to pass MediaSourceHandles through ReadableStream, etc. MediaSourceHandles are not "created, processed, and consumed in an incremental fashion", they are not transferred in large numbers, and they do not need backpressure.

sandersdan commented 2 years ago

There is no new object ownership model proposed here. MediaSourceHandle can be GC'ed without any need to be clonable/Serializable.

I'm saying that there is a specific behavior that JS objects have, and that adding new objects that behave differently is fraught. In particular, not being serializable is incompatible with every API except for postMessage(), and is less ergonomic for postMessage().

I agree that there is no specific relationship with GC, but there is a conflict with "transferable only"; the agent will need to keep a copy for each destination realm, and so the serialized message itself will likely be GC'd. If you "transfer out" of the serialized message then it becomes "first recipient wins", which isn't necessarily going to be the intended context.

Ultimately this is a question of style. I say that since move semantics can't be done right, it's better not to add the complications. It's reasonable to prefer the hacks required to partially simulate move semantics, though.

Transferable objects already exist without any need to be Serializable: MessagePort, ReadableStream, WritableStream, and TransformStream are all examples of objects that are Transferable, but not Serializable. If browsers can implement these objects, then I don't see any reason why a Transferable and not Serializable MediaSourceHandle would be a problem.

It's not serializable only in the sense that the serializer checks that it's being transferred and throws an error if not. It's still being serialized, just with an extra hack on top.

I'm not actually sure that browsers can correctly implement these. As an example, Chrome's ReadableStream deserialization implementation will create and entangle multiple copies of the destination port, and I'm not really sure what that even means. One implication seems to be that a transferred MessagePort will duplicate objects in the transfer list (which I guess is exactly what I'm already saying is possible but via a different mechanism).

VideoFrame is very different. VideoFrames are data, not communication portals (end-points).

A VideoFrame is a handle to an underlying resource. They are data-like, but the expectation is that they come from limited buffer pools.

What do you mean by TransferableStream? If you can edit and link to a spec, that might be helpful, please?

I just mean a ReadableStream that has been transferred.

We don't have any need to pass MediaSourceHandles through ReadableStream, etc. MediaSourceHandles are not "created, processed, and consumed in an incremental fashion", they are not transferred in large numbers, and they do not need backpressure.

That's true, although I'm not sure that it's enough reason to actively prevent using such APIs. It seems like there could be cases where some developers would find streams to be convenient, even if they are not always necessary.

wolenetz commented 2 years ago

During yesterday's Media Workgroup meeting, @jernoble attended and provided the following input after some discussion:

I don't have a strong opinion. From a high level point of view, Transferable makes sense as the semantics match … If it's unimplementable, then fine. I'm a little worried that we'll face resistance if we need to change it later, if handles need to be both Transferable and Serializable … I could pull in some of the JavaScript people for input

@karlt : I've landed in Chromium a "Serializable-only" implementation of this, following the precedent and advice from @sandersdan and the issues he encountered when trying to implement true move-semantics using Transferable for VideoFrame. Given those issues' details earlier in this thread, it seems a Transferable-only MediaSourceHandle is impractical to implement: the unfortunate underlying mechanisms for Transferable don't appear to support move semantics in all cases without hacks, and even then the mechanisms still perform serialization. With simple internal state, such as [[has_ever_been_used_as_srcObject]] and [[has_already_been_serialized]], the proposed serialization steps for MediaSourceHandle can throw exception on attempts to serialize a MediaSourceHandle that has previously or currently been assigned as srcObject on an HTMLMediaElement, or if the app attempts to serialize the handle instance more than once. Would you be ok with Chrome shipping this kind of MSE-in-Workers using Serializable MediaSourceHandle that is not Transferable (and, per previous discussion, that doesn't support worker MediaSource attachments via an object URL)?

karlt commented 2 years ago

I agree that there is no specific relationship with GC, but there is a conflict with "transferable only"; the agent will need to keep a copy for each destination realm, and so the serialized message itself will likely be GC'd. If you "transfer out" of the serialized message then it becomes "first recipient wins", which isn't necessarily going to be the intended context.

That's fine when there is only one recipient. i.e the Transferable-only case.

Ultimately this is a question of style. I say that since move semantics can't be done right, it's better not to add the complications.

Move semantics can be done right with Transferable without Serializable.

It's reasonable to prefer the hacks required to partially simulate move semantics, though.

The proposal here appears to be for move semantics on serialization, but no move semantics on deserialization. Why would it be reasonable to have hacks for move semantics that are not done right, when it is not reasonable to have move semantics done right with Transferable-only?

The only argument for this appears to be the calling convention of postMessage(), but that doesn't justify introducing a broken sometimes-move system. A wrapper function around postMessage() would be simple to implement if a simpler calling convention is desired.

Transferable objects already exist without any need to be Serializable: MessagePort, ReadableStream, WritableStream, and TransformStream are all examples of objects that are Transferable, but not Serializable. If browsers can implement these objects, then I don't see any reason why a Transferable and not Serializable MediaSourceHandle would be a problem.

I'm not actually sure that browsers can correctly implement these. As an example, Chrome's ReadableStream deserialization implementation will create and entangle multiple copies of the destination port, and I'm not really sure what that even means. One implication seems to be that a transferred MessagePort will duplicate objects in the transfer list (which I guess is exactly what I'm already saying is possible but via a different mechanism).

MessagePort transfer is fundamental, and I assume any browser's IPC system would have been designed to support that. ReadableStream just builds on MessagePort.

Are you claiming that a MessagePort A, entangled with one other MessagePort B at one point in time, can somehow become entangled with more than one other MessagePort through cloning of MessagePort B? Can you give an example of a situation where this can be observed?

VideoFrame is very different. VideoFrames are data, not communication portals (end-points).

A VideoFrame is a handle to an underlying resource. They are data-like, but the expectation is that they come from limited buffer pools.

Right, but VideoFrame would not be used as a model for MediaSourceHandle, which is not data-like.

That's true, although I'm not sure that it's enough reason to actively prevent using such APIs. It seems like there could be cases where some developers would find streams to be convenient, even if they are not always necessary.

The intended audience is expected to have a postMessage() available. ReadableStream is not so much actively prevented, but ruled out because it is unsuitable.

karlt commented 2 years ago

With simple internal state, such as [[has_ever_been_used_as_srcObject]] and [[has_already_been_serialized]], the proposed serialization steps for MediaSourceHandle can throw exception on attempts to serialize a MediaSourceHandle that has previously or currently been assigned as srcObject on an HTMLMediaElement, or if the app attempts to serialize the handle instance more than once.

Right, but as I think you are acknowledging, that doesn't prevent multiple MediaSourceHandles, each identifying the same MediaSource, each having [[has_already_been_serialized]] false, and being attached to different HTMLMediaElements, from successfully running the "Attaching to a media element" steps concurrently while MediaSource#readyState is "closed".

The motivation for Transferable is that it provides a clear way to ensure that there is at most one HTMLMediaElement attached to a MediaSource. Without that, I don't know that incomplete move semantics from [[has_already_been_serialized]] etc. are much value. You are right that the exceptions thrown are easier to remove later than add in, but they don't support moving to Transferable in the future because the postMessage() calling convention is different.

Without MediaSourceHandle providing the single HTMLMediaElement-MediaSource entanglement, another mechanism is required.

A single MediaSource can receive two tasks from the "Attaching to a media element" steps to set [[port to main]]. Is the second rejected with an error sent back to the HTMLMediaElement that lost the race?

Would you be ok with Chrome shipping this kind of MSE-in-Workers using Serializable MediaSourceHandle that is not Transferable (and, per previous discussion, that doesn't support worker MediaSource attachments via an object URL)?

I wouldn't choose to ship this design myself in another browser given the reasons presented here. The reasons given why Transferable is difficult in Chrome are vague. The appearance is that there has been a lot of effort put into a system that doesn't work because the one that does work has not be investigated fully.

However, even a Serializable MediaSourceHandle is much better than object URLs. Thank you for all your work to put this in place.

I'd like to be sure that implementation is possible without serious inefficiencies (such as broadcasting to every realm) or leaks.

While a serialization of a MediaSourceHandle exists and may yet be deserialized, the identified MediaSource must be marked in-use (not garbage). This ownership pattern of serializations of clonable handles that keep a referenced object alive across realms is similar to that of SharedArrayBuffer, but SharedArrayBuffer is limited to a single agent cluster. Is the intention that MediaSourceHandle use is limited to a single agent cluster? If not, can you comment on how you addressed the ownership/lifetime of the MediaSource in your implementation, please?

I expect a design limited to a single agent cluster would cover the important use cases, but this Serializable MediaSourceHandle design appears awkward to extend to multiple clusters. (I guess it would be something like sending ref-counting messages on each deserialization etc.) Do you think we can rule out the need for multiple agent clusters or have you planned for that?

Does Chrome's VideoFrame implementation implement reference counting across agent clusters (and processes), or does it just copy the readonly data?

sandersdan commented 2 years ago

That's fine when there is only one recipient. i.e the Transferable-only case.

Move semantics can be done right with Transferable without Serializable.

This is exactly what I am saying is not the case.

There can be more than one recipient (in practice frequently are more, but most of the receivers will ignore the clones), and the implementation has to handle that (literally Chrome's implementation copies GC pointers because it can't use unique ownership).

There are a few different hacks available to simulate Transfer-only, but in the corner cases you have to pick some specific behavior. It's not good enough to handle only the simplest path, even if that's what most developers will think is the only path.

Are you claiming that a MessagePort A, entangled with one other MessagePort B at one point in time, can somehow become entangled with more than one other MessagePort through cloning of MessagePort B? Can you give an example of a situation where this can be observed?

Yes, that's exactly what I am saying.

I created a demo using an extension: https://drive.google.com/file/d/1nsyawXUBFGt10BX3Gv0OAfWO4PzgVLF2/view?usp=sharing.

I tested in both Chrome and Firefox with the same result.

TL;DR: (1) A postMessage() transfers a MessagePort to an iframe. (2) The iframe and the extension content script both receive the port, in different contexts, and use it to send a message. (3) The main page logs both messages, "hello from iframe" and "hello from extension".

[Edit: in Chrome I'm confident that this goes through entangling more than once. I am not familiar with the Firefox implementation, it's conceivable that it instead has one C++ object with multiple JS wrappers.]

The intended audience is expected to have a postMessage() available. ReadableStream is not so much actively prevented, but ruled out because it is unsuitable.

I'm not sure I follow why it is unsuitable, other than a preference for Transfer-only behavior.

Does Chrome's VideoFrame implementation implement reference counting across agent clusters (and processes), or does it just copy the readonly data?

The implementation is locked to the agent cluster, it will not cross processes. If there were demand for that feature, we would implement it with copying, like ImageBitmap does. I don't think there is demand for that, but it would also enable storing them in IndexedDB which might be useful for VideoFrame.

wolenetz commented 2 years ago

With simple internal state, such as [[has_ever_been_used_as_srcObject]] and [[has_already_been_serialized]], the proposed serialization steps for MediaSourceHandle can throw exception on attempts to serialize a MediaSourceHandle that has previously or currently been assigned as srcObject on an HTMLMediaElement, or if the app attempts to serialize the handle instance more than once.

Right, but as I think you are acknowledging, that doesn't prevent multiple MediaSourceHandles, each identifying the same MediaSource, each having [[has_already_been_serialized]] false, and being attached to different HTMLMediaElements, from successfully running the "Attaching to a media element" steps concurrently while MediaSource#readyState is "closed".

The motivation for Transferable is that it provides a clear way to ensure that there is at most one HTMLMediaElement attached to a MediaSource. Without that, I don't know that incomplete move semantics from [[has_already_been_serialized]] etc. are much value. You are right that the exceptions thrown are easier to remove later than add in, but they don't support moving to Transferable in the future because the postMessage() calling convention is different.

[[has_ever_been_used_as_srcObject]] is very useful to specify the behavior that should happen if a particular MediaSourceHandle instance has been assigned ever to srcObject on an HTMLMediaElement and the app then tries to serialize it later. Even if transferable were attempted instead of serializable, this would still be my preferred approach, since it would immediately prevent the app from taking the handle away from a media element's execution context where the handle (and the underlying MediaSource in the worker) are either still pending possible asynchronous attachment start, are still attached, or were previously attached.

[[has_already_been_serialized]] is a helpful, but not a fully constraining bit of information that helps reduce (but cannot fully prevent) duplications on dispatch (e.g. via an implementation using broadcast internally for MessageChannels, or an implementation duplicating messages to send them to extensions). I'm fine with dropping it if it is the sticking point on this API, but I do believe it has some value in signalling the intent to applications that explicitly repeating attempts to serialize a handle are not expected to be allowed (or even sending a message with the same handle in it more than once).

Without MediaSourceHandle providing the single HTMLMediaElement-MediaSource entanglement, another mechanism is required.

A single MediaSource can receive two tasks from the "Attaching to a media element" steps to set [[port to main]]. Is the second rejected with an error sent back to the HTMLMediaElement that lost the race?

This is a good implementation point: in Chromium implementation, since there is reliance on the constraint that a DedicatedWorker is in the same process (or more generally, in the same agent cluster) as the dedicated worker or window that created it, the implementation of spec draft for attachment steps implicitly takes a lock in the "attaching to a media element" steps in the worker MediaSource to see if it is closed and to attempt to open it. As such, the attachment attempt that loses the race to obtain that lock will see that the MediaSource is either already open (and loses), or was already opened and later closed but the MediaSource attachment abstraction in the implementation prevents even the latter case from succeeding since it enforces maximum of one successful attachment per MediaSource when that MediaSource was created in a dedicated worker.

The spec draft therefore could use some improvement to clarify: Once a worker-owned MediaSource has ever been attached successfully, a re-attachment (e.g. re-load() on HTMLMediaElement with same handle as before, or perhaps a duplicated-via extension/broadcast/etc handle is used) of the MediaSource should fail with steps similar to the 'If readyState is NOT set to "closed"' attaching step.

Is the intention that MediaSourceHandle use is limited to a single agent cluster? ... this Serializable MediaSourceHandle design appears awkward to extend to multiple clusters.

@karlt (and also @jernoble since he had a question about expected behavior around this during this week's Media WG meeting): Yes, the intention is that MediaSourceHandle use is limited to a single agent cluster. The handle references a data abstraction (MediaSource) that, even with an objectURL approach, would still have been in the single agent cluster. PostMessage-ing a MediaSourceHandle to a SharedWorker should fail similarly to attempt to do the same with a SharedArrayBuffer, and I've locally verified the experimental Chrome implementation of MediaSourceHandle behaves this way. Internally, the mechanism in Chromium to signal this is during serialization, mark the record as locked to agent cluster, and MediaSourceHandle serialization does this (as does VideoFrame already and others).

Do you think we can rule out the need for multiple agent clusters or have you planned for that?

Thanks to this discussion and @jernoble asking similarly about SharedWorker and MediaSourceHandle, yes, we need to rule out the support for communicating and using a MediaSourceHandle via or in multiple agent clusters.

Does Chrome's VideoFrame implementation implement reference counting across agent clusters (and processes), or does it just copy the readonly data?

@sandersdan can assist with this answer, I believe. My understanding is the serialization of a VideoFrame in Chrome also locks the message to the same agent cluster as the transmitter, and the underlying representation is held in a scoped_refptr<> that only works within a process, not cross-process, though there may be some further complication due to cases where a VideoFrame might refer to an out-of-process backing store, such as that in gpu via another indirection like a handle.

Notably, the internal handle representation for the result of MediaSource.getHandle() (even if subsequently potentially shared across multiple MediaSourceHandle instances resulting from extension or broadcast duplication) is a refcounted object (scoped_refptr<>) which persistently keeps the MediaSource from being GC'ed while the handle as the reference. Upon attachment start (e.g. the asynchronous task in HTMLMediaElement that follows from assigning the handle to srcObject), the media element takes the handle instance's internal refcounted attachment object, if it still has it (and it clears that field from the handle instance). If the media element finds the internal refcounted attachment object is not longer valid (e.g. serialized away already, where the internal refcounted object is moved during serialization, or if this handle has previously already begun attachment), then media load fails per the spec draft. Otherwise, the usual attachment start process begins, which itself detects races in attempted simultaneous attachment via the same internal refcounted attachment object to the underlying MediaSource instance, as described above. If attachment is successful, then the scoped_refptr<> attachment object keeps persistent references to both the media element (in window context) and the mediasource object (in worker context) to prevent them from being GC'ed. Later detachment drops the media element's reference to the internal attachment object, and if there are no other references, this lets cleanup happen to the detached objects. Even in a Transferable implementation with true move semantics, most/all of this would remain the same - I'm just trying to provide helpful implementation notes around how I solved this in Chrome.

Would you be ok with Chrome shipping this kind of MSE-in-Workers using Serializable MediaSourceHandle that is not Transferable (and, per previous discussion, that doesn't support worker MediaSource attachments via an object URL)?

I wouldn't choose to ship this design myself in another browser given the reasons presented here. The reasons given why Transferable is difficult in Chrome are vague. The appearance is that there has been a lot of effort put into a system that doesn't work because the one that does work has not be investigated fully.

I'm not sure I understand the last sentence fully. The Serializable-only system does work, from what I can tell. The Transferable-only doesn't provide true move-only semantics from what I understand from @sandersdan. I do appreciate all of your thoughtful input - I'm trying to find the best way to navigate the advice from @sandersdan around complications that arose due to lack of true move-only Transferable semantics while enabling this highly-sought-after MSE-in-Workers API with explainable semantics with a minimum of shocking surprises to API users.

karlt commented 2 years ago

Are you claiming that a MessagePort A, entangled with one other MessagePort B at one point in time, can somehow become entangled with more than one other MessagePort through cloning of MessagePort B? Can you give an example of a situation where this can be observed?

Yes, that's exactly what I am saying.

I created a demo using an extension: https://drive.google.com/file/d/1nsyawXUBFGt10BX3Gv0OAfWO4PzgVLF2/view?usp=sharing.

I tested in both Chrome and Firefox with the same result.

TL;DR: (1) A postMessage() transfers a MessagePort to an iframe. (2) The iframe and the extension content script both receive the port, in different contexts, and use it to send a message. (3) The main page logs both messages, "hello from iframe" and "hello from extension".

[Edit: in Chrome I'm confident that this goes through entangling more than once. I am not familiar with the Firefox implementation, it's conceivable that it instead has one C++ object with multiple JS wrappers.]

Thank you for going to the trouble to create a demo. The clone_message demo shows the behavior expected when the iframe and extension content script each see the same MessagePort and postMessage() on that MessagePort.

I've extended clone_message to demonstrate that the extension content script sees the same MessagePort as the iframe: https://drive.google.com/file/d/1Xhd8hZTdwNWBvgj_nUh_GU-mTo5lgIm1/view Only the extension content script calls start() but the iframe receives a message when using only addEventListener(). The behavior is the same in Firefox and Chrome.

There is apparently only one MessagePort entangled with channel.port1 of index.html.

karlt commented 2 years ago

I don't think this is a sticking point, just an observation: BroadcastChannel and ReadableStream.tee() would behave differently should one of them be used to produce multiple clones of a Serializable MediaSourceHandle:

karlt commented 2 years ago

In an attempt to summarize my reasons for preferring Transferable without Serializable:

  1. Transferable better captures the intended purpose of MediaSourceHandle. This appears to be generally agreed given attempts to add move-semantics where possible to Serializable.
  2. Transferable clearly avoids the chance of a race to attach a single HTMLMediaElement to a MediaSource.

The restriction to a single agent cluster, however, is also sufficient to avoid the race. HTMLMediaElement is exposed only on Window and there is "only one similar-origin window agent per browsing context agent cluster". Therefore all attempts to set a MediaSourceHandle identifying the same MediaSource will run on the same thread. The attachment that succeeds is the first attempted, which is clearly defined (and there is no need for a lock while checking whether the MediaSource has previously been attached).

The single agent cluster also simplifies MediaSource lifetime management, thank you for the details.

The arguments for a Serializable are

  1. There are fewer parameters required in postMessage() calls.
  2. The handle can be sent via ReadableStream, etc. and BroadcastChannel. BroadcastChannels in different agent clusters will receive a "messageerror" event, I assume. Streams transferred across agent clusters also have error handling for deserialization exceptions, but I'm not so familiar with that.
  3. There are fears that Chrome may not handle unidentified edge cases with Transferable appropriately.

My preference would still be for Transferable, but my key concern with Serializable was with the race, which has now been resolved by the limit to single agent cluster. I recognize that others have different preferences. I thank you for listening to my views and trust you to proceed with the direction you feel is best.

wolenetz commented 2 years ago

@karlt thank you for your continued clear analyses of this. With the implementation working in Chrome well now, using Serialization, and with the benefits of Transferable's move semantic clarity for web authors perhaps not being as reliable as expected (since duplication can occur), I think inclusion of non-normative notes describing usage of serializable handle and implementation guidance like race prevention (as we already need to do with multiple main-thread MediaSource objectURLs racing attachment for same MediaSource instance) would be sufficient to obtain good interop and not be surprising. Further, inclusion of single-agent-cluster requirement will need to be in the spec (along with informative text describing why), to help reduce further confusion.

While serializing a handle to a stream might work to communicate it, the expected usage would be via postMessage, not via streams. I'll see what the Chrome implementation does with such serialization today and update here.

jan-ivar commented 2 years ago

(I'll ignore claims that Transferable isn't implementable as those seem out of scope for this WG, and should probably be up-leveled to WHATWG)

I think a precedent to follow here is VideoTrackGenerator, which has a generator.track member that is Transferable.

The generator remains in the worker and its track can be transfered to main for use as a srcObject (sound familiar?):

    // worker.html
    const generator = new VideoTrackGenerator();
    parent.postMessage({track: generator.track}, [generator.track]);

    // main.html
    const {data} = await new Promise(r => worker.onmessage);
    video.srcObject = new MediaStream([data.track]);

This example would work in Chrome once it implements transferable MST (crbug 1288839).

This would seem to match the needs of MediaSource quite well, and ensure consistency in the web platform.

With that model in mind, I propose these changes to the PR:

  1. Instead of throwing, consider [SameObject]
  2. Instead of a getHandle method, consider a getter:
    partial interface MediaSource {
       [SameObject] readonly attribute MediaSourceHandle handle;
    };
  3. Instead of the unusual semantics of locking the source in the getter, consider doing what VideoTrackGenerator does, which is rely on the track/handle's [[detached]] state, which transitions once transfered with postMessage(,[]).

This should give more flexible, desirable, and less surprising behavior. Thoughts?

wolenetz commented 2 years ago

While serializing a handle to a stream might work to communicate it, the expected usage would be via postMessage, not via streams. I'll see what the Chrome implementation does with such serialization today and update here.

I tried serialization of a MediaSourceHandle across a transferred (from worker to main) ReadableStream: that worked. Then I added a further serialization on the main thread of that handle via a WritableStream whose sink then initiated the attachment: that also worked. I didn't encounter any surprises. Notably, this method of communicating a handle would not work if the MediaSourceHandle were Transferable-only.

@jan-ivar - while I certainly prefer consistency, I am wondering if this lack of a Transferable-only version of handle's communicability via a stream makes it worthy of being just Serializable.

wolenetz commented 2 years ago

@jan-ivar - while I certainly prefer consistency, I am wondering if this lack of a Transferable-only version of handle's communicability via a stream makes it worthy of being just Serializable.

This use case (communicating a handle via a transferred Stream) is not supported by Transferable-only. I tend to think supporting this use case is more important than remaining consistent with a precedent that does not support this use case.

wolenetz commented 2 years ago

2. The handle can be sent via ReadableStream, etc. and BroadcastChannel. BroadcastChannels in different agent clusters will receive a "messageerror" event, I assume. Streams transferred across agent clusters also have error handling for deserialization exceptions, but I'm not so familiar with that.

Correct, from what I can tell: A MediaSourceHandle enqueued in a ReadableStream that is itself transferred to a SharedWorker causes a DataCloneError when the SharedWorker attempts to read, e.g. const {done, value} = await reader.read(). In Chrome, the exception message is "chunk could not be cloned". This appears to be expected per Streams API SetUpCrossRealmTransformReadable's step 4 (messageerror handler setup) along with the HTML spec definition of the messageerror event: "Fired at an object when it receives a message that cannot be deserialized"

jan-ivar commented 2 years ago

Why is communicating a handle via a transferred stream important?

wolenetz commented 2 years ago

Why is communicating a handle via a transferred stream important?

Transferable streams are a basic communication mechanism in the web platform, giving web authors the utility of building ordered workflows on top of stream semantics. I could imagine a scenario where application control messages are queued into a cross-realm stream separately from the existing MessageChannel mechanism, including things like "setup media element to begin playing using this handle". Stream idioms, like cancellation/abort/adaptation on stream read speed/etc may better match application developer's needs. Of course, I don't expect streams to be the exclusive way to communicate handles - I suspect many cross-realm apps already may be structured around basic messaging for their workflow control - but supporting communication of a handle via a transferred stream does give them flexibility.

jan-ivar commented 2 years ago

What about a scenario of sending another transferable stream in such an application control message? Catch 22.

I'm a big fan of streams, but getting them to parity with postMessage as a goal seems like something that should be up-leveled to WHATWG again.

If Transferable objects are no longer to be used, I'd like to hear that from higher up. Until then, I prefer applying the right design idioms we have today for new APIs. postMessage > blob URL.

A MediaSourceHandle enqueued in a ReadableStream that is itself transferred to a SharedWorker causes a DataCloneError when the SharedWorker attempts to read, e.g. const {done, value} = await reader.read(). In Chrome, the exception message is "chunk could not be cloned". This appears to be expected per Streams API SetUpCrossRealmTransformReadable's step 4 (messageerror handler setup)

This thread is getting long. Maybe file an issue with WHATWG streams to discuss it there? Is there any inherent reason why it couldn't work? Maybe it can be fixed.

sandersdan commented 2 years ago

If Transferable objects are no longer to be used

Just to clarity, the concept of Transferable is totally fine. My only claim is that it doesn't guarantee single-instance (move) semantics. There are a few specs that have variously tried to retrofit that, but it just doesn't work well with the underlying JS machinery. I don't think the issue has ever been properly reviewed at a higher level.

(I have a further opinion that requiring Transfer is unnecessarily limiting, but it's not the core thing I want to make sure you are aware of when writing the MediaSource spec.)

The WebCodecs VideoFrame is in fact transferable; it operates on top of the clone semantics by also closing the source frame. From the sender's perspective it can be an ergonomic way to move a frame, but from the spec side we acknowledge that there can end up being multiple clones and define the semantics when that happens (in our case, closing one of those clones closes all of them).

jan-ivar commented 2 years ago

I don't think the issue has ever been properly reviewed at a higher level.

Then it sounds like an issue at a higher level should be opened, and discussed there.

There seems to be two questions here 1) does it work, and 2) do we want it to work that way. Let's uplevel the first, and discuss the second.

The API design question is whether it's desirable semantics to "transfer" (move) ownership of a MediaSource with postMessage.

My limited understanding is that a MediaSource differs from VideoFrame in that the former can only be the source of a single sink (at a time, or ever?), whereas a VideoFrame can be shared and used by multiple sinks. Happy to be enlightened if this is not the case.

sandersdan commented 2 years ago

My limited understanding is that a MediaSource differs from VideoFrame in that the former can only be the source of a single sink (at a time, or ever?), whereas a VideoFrame can be shared and used by multiple sinks. Happy to be enlightened if this is not the case.

This is correct; the MediaSource situation is simpler that the WebCodecs one. The observable behavior of the objects is quite different, but the ownership situation ends up similar simply because VideoFrames are a scarce resource.

wolenetz commented 2 years ago

I have no strong reason why not Transferable for MediaSourceHandle. I was following the technical advice around issues related to assumed (but not actual) move semantics for Transferable objects.

Certainly in Chrome implementation, I can try a proof of concept that allows MediaSourceHandle to be transferable (and it would then be a simple change, though unfortunate for users currently desiring this ship ASAP, to require handle transfer. See [note] below.)

Note that I don't think the following would work though: postMessage(null, [handle]), since there is no field in the received MessageEvent for handles (or video frames etc, except ports. Instead, the following would work: postMessage(handle, [handle])

Going that route, would Transferable-only be consistent with precedent? Or should the following also work: postMessage(handle)

(Note that the item being transferred or serialized is a handle, not the actual MediaSource. Conceptually, the spec can ensure that at most one handle can succeed in attaching to a worker-owned MediaSource, and that once attached successfully, implementations will need to be able to prevent leakage of the cross-realm internal links from potential handle copies. This iis already in the Chrome prototype of serializable handle, and would still be needed in a Transferable version - regardless of whether transfer were required or just allowed.)

[note] If both a serializable and a transferable version would be acceptable, then that would allow more rapid use of the current MSE-in-Workers Chrome implementation, provided they only use serializable initially. Later expansion to also support transfer could be done. But if instead, transfer of the handle is required, then that would unfortunately delay initial use of the Chrome implementation.

jan-ivar commented 2 years ago

Thanks for considering this in spite of implementation challenges and deadlines. I think it's important we get the API right.

I think the two meaningful semantic models on the table for this WG are:

  1. Multiple handles — serializable references whose use in srcObject are first come, first serve
  2. Single handlenot serializable, and can only be explicitly handed off through postMessage(x, [x])

Transfer seems redundant with multiple handles. The feature of single handle is its restriction.

The argument for restriction would be a single handle fits MediaSource being single-use. Giving JS only a single handle to manage would be KISS and lead to fewer bugs (with less power comes less responsibility). E.g. thread races become impossible.

An argument for multiple handles being KISS is it defers hand-off to srcObject assignment time, which has some (same thread) precedent, but at the cost of allowing thread races.

Going that route, would Transferable-only be consistent with precedent?

That's a bit complicated to answer, since both VideoFrame and MediaStreamTrack need to relinquish (hardware) resources, which is why they allow both transfer and serialization (whose side-effect is requiring JS to close each clone to relinquish the underlying resource).

wolenetz commented 2 years ago

Thanks for considering this in spite of implementation challenges and deadlines. I think it's important we get the API right.

I think the two meaningful semantic models on the table for this WG are:

  1. Multiple handles — serializable references whose use in srcObject are first come, first serve
  2. Single handlenot serializable, and can only be explicitly handed off through postMessage(x, [x])

Transfer seems redundant with multiple handles. The feature of single handle is its restriction.

The argument for restriction would be a single handle fits MediaSource being single-use. Giving JS only a single handle to manage would be KISS and lead to fewer bugs (with less power comes less responsibility). E.g. thread races become impossible.

An argument for multiple handles being KISS is it defers hand-off to srcObject assignment time, which has some (same thread) precedent, but at the cost of allowing thread races.

Going that route, would Transferable-only be consistent with precedent?

That's a bit complicated to answer, since both VideoFrame and MediaStreamTrack need to relinquish (hardware) resources, which is why they allow both transfer and serialization (whose side-effect is requiring JS to close each clone to relinquish the underlying resource).

There is another dimension to the semantic models: handle reuse. In the Chrome prototype and proposals earlier in this PR, essentially only a single handle could ever be provided by the MediaSource and whichever (if any) media element begins attachment first with it wins. Re-loading with that same handle would not work, even sequentially on same element. This is very similar to lack of reuse of an autorevoked object URL in same-thread MSE model, except that only a single handle per worker MediaSource would be provided. In my mind, this allows for simpler implementation without burdening or complicating the API. It also doesn't lock the spec or API into always preventing serial re-use of a handle (relaxing that shouldn't necessarily be a breaking change later, if that is desired); similarly, it doesn't lock the spec or API into always requiring a single handle per worker MediaSource instance (likewise, such relaxation shouldn't necessarily break apps later.)

I think Single handle, not reusable serially or concurrently to (re)load/(re)attach to a media element, makes sense for now. Apps can easily create more MediaSource instances and obtain their handles as needed.

I'll proceed with the Transferable-only prototype for a handle and see how implementable it is. If you feel this is the wrong direction, please respond ASAP to prevent further delays.

wolenetz commented 2 years ago

@karlt @jernoble @jan-ivar - I think I have a working prototype of Transfer-only, Single-handle, Single-use MSE-in-Workers that will land hopefully soon in Chrome 105 milestone behind experimental flags: --enable-blink-features=MediaSourceInWorkers,MediaSourceInWorkersUsingHandle (or just --enable-experimental-web-platform-features). Change that switches from Serializable-only to Transferable-only is in review now at https://chromium-review.googlesource.com/c/chromium/src/+/3733664.

Assuming this route has no objections, I'll be updating this spec PR soon.

wolenetz commented 1 year ago

I'm working on updating this PR now that Transferable-only MediaSourceHandle (+tests) and [SameObject] handle getter (+tests) are landed (former) or landing soon (latter) in Chromium's experimental implementation.

Notable changes that are needed to this PR:

-edited to add last list item

wolenetz commented 1 year ago

@mwatson2 @karlt @jernoble (and @jan-ivar) Please take a look at this updated PR. I've made significant changes to the original refinement PR, per our conversations. Experimental implementation of this is in Chromium M105 (as of 105.0.5180.0), and associated web-platform-tests have been added/updated.

Thank you all for your input!

wolenetz commented 1 year ago

@mwatson2 please take a look. @karlt @jan-ivar @jernoble please also take a look.

I'd like to merge this ASAP as I believe it addresses the concerns and the use case. Hopefully any further feedback is editorial in nature and could be addressed in later PRs.

wolenetz commented 1 year ago

I believe I have addressed all feedback. I'll squash and merge this PR shortly. Please file an issue if you see anything further needing correction once that merge has happened. Thank you, everyone, for your comments.