Closed wolenetz closed 1 year ago
@mwatson2, please review. Also, @karlt, please take a look to ensure I've addressed the issues we discussed in #305.
Thank you!
The intent is to prevent making more than one usable copy of a MediaSourceHandle for each MediaSource. Transfer semantics restrictions may not be sufficient, since basic serialization without transfer might be attempted.
I propose trying something like this (adding to my current rough prototype in Chrome):
Hopefully this adds clarity to the intent. I'll now try updating the Chrome prototype to do this and see how it goes and respond further here. @karlt and others, if you see any blockers or concerns, please report them here ASAP. Thanks!
e.g., [[CurrentlyAttached]] to be true during setting srcObject to the handle (including resetting that slot to be false first for the previous srcObject if that object were also a handle.
Note the HTMLMediaElement.srcObject setter doesn't have option to throw exception, but we need to synchronously let the handle know it's in the process of starting a load even before the asynchronous load operation starts. We also need to be careful to protect against the following situation:
// On main thread, handle is initially an unattached MediaSourceHandle
// just received from the worker that owns the underlying MediaSource.
media_element1.srcObject = handle;
// Would throw exception now, since handle is CurrentlyAttached:
// worker.postMessage(handle);
// But this cannot throw exception. Rather, the mediasource attachment
// steps should later run the failure steps for media_element2:
media_element2.srcObject = handle;
// The following demonstrates that [[CurrentlyAttached]] cannot be a
// boolean since if it were, it would have been doubly set on the
// assignment of |handle| to media_element2, above:
media_element2.srcObject = null;
worker.postMessage(handle); // If boolean, this would succeed, but shouldn't.
Based on that example, I propose changing [[CurrentlyAttached]] to be a counter that is incremented when the handle is set to a srcObject on an HTMLMediaElement, and decremented when the handle is explicitly unset from a srcObject on an HTMLMediaElement. The attachment steps can then just fail if this value is not precisely 1, and the serialization steps could throw exception if this value is not precisely 0.
The intent is to prevent making more than one usable copy of a MediaSourceHandle for each MediaSource. Transfer semantics restrictions may not be sufficient, since basic serialization without transfer might be attempted.
I propose trying something like this (adding to my current rough prototype in Chrome):
* Change MediaSourceHandle to be just "Serializable" (https://html.spec.whatwg.org/multipage/structured-data.html#serializable-objects)
While I assume it is possible to specify a serialize-once behavior, I always thought of such a behavior where the object becomes unusable after message passing to fit better with the [Transferable] concept rather than [Serializable].
"Transferring is effectively recreating the object while sharing a reference to the underlying data and then detaching the object being transferred."
Serialization "allows them to be stored on disk and later restored".
Was there a benefit to making the object [Serializable] rather than [Transferable]?
Was there a benefit to making the object [Serializable] rather than [Transferable]?
Mostly this was due to the existing Chrome transferable infrastructure is clunky and might not do the required transfer steps in a completely interoperable order (from my brief reading). Secondly, this simplifies postMessage to be just serialization, not making developers also put the object in a transfer list. The conditional failure if "forStorage" part of the serialization steps allows prevention of usage for other purposes.
Regarding the specific postMessage exception: If [[CurrentlyAttached]] > 0 or if [[AlreadySerialized]] is true, then a postMessage attempt including that handle should throw exception. If this condition is found first among other items being posted, then the exception should be a DataCloneError.
I've updated the chromium prototype change for this (circa patch set 28):
I've tested the basic scenarios manually and it seems to be working.
@jernoble @karlt - I plan to proceed with getting the prototype in Chromium landed in parallel with updating this PR per the details as of https://github.com/w3c/media-source/pull/306#issuecomment-1144180822 Please raise any concerns you might have ASAP to prevent further delay on this highly-desired feature. Thanks!
@jernoble @karlt - I plan to proceed with getting the prototype in Chromium landed in parallel with updating this PR per the details as of #306 (comment) Please raise any concerns you might have ASAP to prevent further delay on this highly-desired feature. Thanks!
Thanks to @sandersdan I am now aware of potential further complications: It may not be interoperable to prevent postMessage(handle) from creating more than one handle, since this might actually be implicitly a broadcast on some implementations or on implementations where extensions might get a copy of messages. Achieving true move semantics is not possible. As a result, the window thread might eventually have multiple distinct handles referencing internally the same underlying worker-thread-owned MediaSource. It falls to the implementation to prevent simultaneous attachments (during the asynchronous media element load algorithm) from proceeding successfully, allowing at most 1 of them to attach successfully. tldr: Keeping the [[IsAlreadySerialized]] proposed internal slot is not sufficient to prevent copies, but does help early detection of some of those scenarios. The media element load algorithm must fail if the same underlying MediaSource is already not 'closed'. The latter isn't a change to the spec. Overall, Serializable is simpler ergonomically than Transferable, especially when there is potential for underlying broadcast or lack of true move semantics.
Browser implementations are free to use different approaches if they can produce the same observable behavior. Extensions can modify behavior in any way the browser lets them, if they are prepared to expect consequences, so browsers can't expect to maintain interoperable behavior in the presence of arbitrary extensions.
But what behavior would content see from passing a MediaSourceHandle
to BroadcastChannel#postMessage()
, if it were serializable?
While merely transferable and not serializable, MediaSourceHandle
cannot be transferred through BroadcastChannel
, which is consistent with transfer of MessagePort
. "transferring does not make sense in multi-destination situations." MessagePort
is similar to MediaSourceHandle
in that it is a transferable object used for two-directional message-passing communication between threads and processes.
browsers can't expect to maintain interoperable behavior in the presence of arbitrary extensions.
As I understand it, all major browser implement the same behavior for message posting, which is broadcasting to all content script realms.
In WebCodecs we wanted to be able to have move semantics for VideoFrame
but this detail made it impossible to implement in practice.
The situation with MediaSourceHandle
is simpler because the resources it holds are less scarce, but I do want to recommend against inventing a new model for object ownership. If you don't follow the JS-like path (cloneable objects that are GC'd), there are surprising roadblocks.
But what behavior would content see from passing a MediaSourceHandle to BroadcastChannel#postMessage(), if it were serializable?
I would recommend that each receiver gets a valid handle, and that all valid handles are invalidated when one is attached.
While merely transferable and not serializable, MediaSourceHandle cannot be transferred through BroadcastChannel, which is consistent with transfer of MessagePort.
I don't think this is an option. Transferable
isn't its own concept (or even an IDL attribute any more), it's a detail of the structured clone algorithm. If you can't serialize it, you can't transfer it either.
If there is a way to accomplish this combination, note that it would also prevent passing these handles through a TransferableStream
.
I would recommend that each receiver gets a valid handle, and that all valid handles are invalidated when one is attached.
I agree. Further, for a particular handle instance, if it has ever been assigned to an HTMLMediaElement srcObject (even if the subsequent asynchronous load was interrupted and the underlying MediaSource hadn't ever been attached successfully), then that handle instance can never be successfully serialized again.
However, I think instantaneous serialization prevention of any handle instance in any realm with the same underlying MediaSource (such multiple instances can happen if extensions/broadcasts cause clones of the original single handle retrieved from MediaSource.getHandle()) upon any of those handles ever being assigned as srcObject to an HTMLMediaElement would risk security (could enable malicious high-resolution timer attacks).
So I think the following may be the best approach. If we find later that these restrictions can be relaxed, we can update the spec and implementations further:
As I understand it, all major browser implement the same behavior for message posting, which is broadcasting to all content script realms.
That sounds very inefficient, especially for something large like a VideoFrame
, at least unless there's shared memory access involved. But efficiency is much less an issue for transferring MediaSourceHandle
.
In WebCodecs we wanted to be able to have move semantics for
VideoFrame
but this detail made it impossible to implement in practice.
I would guess that ReadableStream.tee()
support requires a Serializable object. We don't need that for MediaSourceHandle
.
I do want to recommend against inventing a new model for object ownership. If you don't follow the JS-like path (cloneable objects that are GC'd), there are surprising roadblocks.
There is no new object ownership model proposed here.
MediaSourceHandle
can be GC'ed without any need to be clonable/Serializable.
While merely transferable and not serializable, MediaSourceHandle cannot be transferred through BroadcastChannel, which is consistent with transfer of MessagePort.
I don't think this is an option.
Transferable
isn't its own concept (or even an IDL attribute any more), it's a detail of the structured clone algorithm. If you can't serialize it, you can't transfer it either.
Transferable objects already exist without any need to be Serializable: MessagePort
, ReadableStream
, WritableStream
, and TransformStream
are all examples of objects that are Transferable, but not Serializable. If browsers can implement these objects, then I don't see any reason why a Transferable and not Serializable MediaSourceHandle
would be a problem.
All these objects serve a similar purpose to MediaSourceHandle
in that they provide potentially cross-realm communication with at most one associated other object. (tee()
is an explicit extension of this involving additional objects.)
VideoFrame
is very different. VideoFrame
s are data, not communication portals (end-points).
A sensible way to implement and even spec MediaSourceHandle
would be that it hold a MessagePort
created when MediaSource.getHandle()
is called, to be used as [[port to worker]] on HTMLMediaElement
.
If there is a way to accomplish this combination, note that it would also prevent passing these handles through a
TransferableStream
.
What do you mean by TransferableStream
? If you can edit and link to a spec, that might be helpful, please?
We don't have any need to pass MediaSourceHandle
s through ReadableStream
, etc.
MediaSourceHandle
s are not "created, processed, and consumed in an incremental fashion", they are not transferred in large numbers, and they do not need backpressure.
There is no new object ownership model proposed here.
MediaSourceHandle
can be GC'ed without any need to be clonable/Serializable.
I'm saying that there is a specific behavior that JS objects have, and that adding new objects that behave differently is fraught. In particular, not being serializable is incompatible with every API except for postMessage()
, and is less ergonomic for postMessage()
.
I agree that there is no specific relationship with GC, but there is a conflict with "transferable only"; the agent will need to keep a copy for each destination realm, and so the serialized message itself will likely be GC'd. If you "transfer out" of the serialized message then it becomes "first recipient wins", which isn't necessarily going to be the intended context.
Ultimately this is a question of style. I say that since move semantics can't be done right, it's better not to add the complications. It's reasonable to prefer the hacks required to partially simulate move semantics, though.
Transferable objects already exist without any need to be Serializable:
MessagePort
,ReadableStream
,WritableStream
, andTransformStream
are all examples of objects that are Transferable, but not Serializable. If browsers can implement these objects, then I don't see any reason why a Transferable and not SerializableMediaSourceHandle
would be a problem.
It's not serializable only in the sense that the serializer checks that it's being transferred and throws an error if not. It's still being serialized, just with an extra hack on top.
I'm not actually sure that browsers can correctly implement these. As an example, Chrome's ReadableStream
deserialization implementation will create and entangle multiple copies of the destination port, and I'm not really sure what that even means. One implication seems to be that a transferred MessagePort
will duplicate objects in the transfer list (which I guess is exactly what I'm already saying is possible but via a different mechanism).
VideoFrame
is very different.VideoFrame
s are data, not communication portals (end-points).
A VideoFrame
is a handle to an underlying resource. They are data-like, but the expectation is that they come from limited buffer pools.
What do you mean by
TransferableStream
? If you can edit and link to a spec, that might be helpful, please?
I just mean a ReadableStream
that has been transferred.
We don't have any need to pass
MediaSourceHandle
s throughReadableStream
, etc.MediaSourceHandle
s are not "created, processed, and consumed in an incremental fashion", they are not transferred in large numbers, and they do not need backpressure.
That's true, although I'm not sure that it's enough reason to actively prevent using such APIs. It seems like there could be cases where some developers would find streams to be convenient, even if they are not always necessary.
During yesterday's Media Workgroup meeting, @jernoble attended and provided the following input after some discussion:
I don't have a strong opinion. From a high level point of view, Transferable makes sense as the semantics match … If it's unimplementable, then fine. I'm a little worried that we'll face resistance if we need to change it later, if handles need to be both Transferable and Serializable … I could pull in some of the JavaScript people for input
@karlt : I've landed in Chromium a "Serializable-only" implementation of this, following the precedent and advice from @sandersdan and the issues he encountered when trying to implement true move-semantics using Transferable for VideoFrame. Given those issues' details earlier in this thread, it seems a Transferable-only MediaSourceHandle is impractical to implement: the unfortunate underlying mechanisms for Transferable don't appear to support move semantics in all cases without hacks, and even then the mechanisms still perform serialization. With simple internal state, such as [[has_ever_been_used_as_srcObject]] and [[has_already_been_serialized]], the proposed serialization steps for MediaSourceHandle can throw exception on attempts to serialize a MediaSourceHandle that has previously or currently been assigned as srcObject on an HTMLMediaElement, or if the app attempts to serialize the handle instance more than once. Would you be ok with Chrome shipping this kind of MSE-in-Workers using Serializable MediaSourceHandle that is not Transferable (and, per previous discussion, that doesn't support worker MediaSource attachments via an object URL)?
I agree that there is no specific relationship with GC, but there is a conflict with "transferable only"; the agent will need to keep a copy for each destination realm, and so the serialized message itself will likely be GC'd. If you "transfer out" of the serialized message then it becomes "first recipient wins", which isn't necessarily going to be the intended context.
That's fine when there is only one recipient. i.e the Transferable-only case.
Ultimately this is a question of style. I say that since move semantics can't be done right, it's better not to add the complications.
Move semantics can be done right with Transferable without Serializable.
It's reasonable to prefer the hacks required to partially simulate move semantics, though.
The proposal here appears to be for move semantics on serialization, but no move semantics on deserialization. Why would it be reasonable to have hacks for move semantics that are not done right, when it is not reasonable to have move semantics done right with Transferable-only?
The only argument for this appears to be the calling convention of postMessage()
, but that doesn't justify introducing a broken sometimes-move system. A wrapper function around postMessage()
would be simple to implement if a simpler calling convention is desired.
Transferable objects already exist without any need to be Serializable:
MessagePort
,ReadableStream
,WritableStream
, andTransformStream
are all examples of objects that are Transferable, but not Serializable. If browsers can implement these objects, then I don't see any reason why a Transferable and not SerializableMediaSourceHandle
would be a problem.I'm not actually sure that browsers can correctly implement these. As an example, Chrome's
ReadableStream
deserialization implementation will create and entangle multiple copies of the destination port, and I'm not really sure what that even means. One implication seems to be that a transferredMessagePort
will duplicate objects in the transfer list (which I guess is exactly what I'm already saying is possible but via a different mechanism).
MessagePort
transfer is fundamental, and I assume any browser's IPC system would have been designed to support that. ReadableStream
just builds on MessagePort
.
Are you claiming that a MessagePort
A, entangled with one other MessagePort
B at one point in time, can somehow become entangled with more than one other MessagePort
through cloning of MessagePort B? Can you give an example of a situation where this can be observed?
VideoFrame
is very different.VideoFrame
s are data, not communication portals (end-points).A
VideoFrame
is a handle to an underlying resource. They are data-like, but the expectation is that they come from limited buffer pools.
Right, but VideoFrame
would not be used as a model for MediaSourceHandle
, which is not data-like.
That's true, although I'm not sure that it's enough reason to actively prevent using such APIs. It seems like there could be cases where some developers would find streams to be convenient, even if they are not always necessary.
The intended audience is expected to have a postMessage()
available. ReadableStream
is not so much actively prevented, but ruled out because it is unsuitable.
With simple internal state, such as [[has_ever_been_used_as_srcObject]] and [[has_already_been_serialized]], the proposed serialization steps for MediaSourceHandle can throw exception on attempts to serialize a MediaSourceHandle that has previously or currently been assigned as srcObject on an HTMLMediaElement, or if the app attempts to serialize the handle instance more than once.
Right, but as I think you are acknowledging, that doesn't prevent multiple MediaSourceHandle
s, each identifying the same MediaSource
, each having [[has_already_been_serialized]] false, and being attached to different HTMLMediaElement
s, from successfully running the "Attaching to a media element" steps concurrently while MediaSource#readyState
is "closed".
The motivation for Transferable is that it provides a clear way to ensure that there is at most one HTMLMediaElement
attached to a MediaSource
. Without that, I don't know that incomplete move semantics from [[has_already_been_serialized]] etc. are much value. You are right that the exceptions thrown are easier to remove later than add in, but they don't support moving to Transferable in the future because the postMessage()
calling convention is different.
Without MediaSourceHandle
providing the single HTMLMediaElement
-MediaSource
entanglement, another mechanism is required.
A single MediaSource
can receive two tasks from the "Attaching to a media element" steps to set [[port to main]]. Is the second rejected with an error sent back to the HTMLMediaElement
that lost the race?
Would you be ok with Chrome shipping this kind of MSE-in-Workers using Serializable MediaSourceHandle that is not Transferable (and, per previous discussion, that doesn't support worker MediaSource attachments via an object URL)?
I wouldn't choose to ship this design myself in another browser given the reasons presented here. The reasons given why Transferable is difficult in Chrome are vague. The appearance is that there has been a lot of effort put into a system that doesn't work because the one that does work has not be investigated fully.
However, even a Serializable MediaSourceHandle
is much better than object URLs. Thank you for all your work to put this in place.
I'd like to be sure that implementation is possible without serious inefficiencies (such as broadcasting to every realm) or leaks.
While a serialization of a MediaSourceHandle
exists and may yet be deserialized, the identified MediaSource
must be marked in-use (not garbage).
This ownership pattern of serializations of clonable handles that keep a referenced object alive across realms is similar to that of SharedArrayBuffer
, but SharedArrayBuffer
is limited to a single agent cluster.
Is the intention that MediaSourceHandle
use is limited to a single agent cluster? If not, can you comment on how you addressed the ownership/lifetime of the MediaSource
in your implementation, please?
I expect a design limited to a single agent cluster would cover the important use cases, but this Serializable MediaSourceHandle
design appears awkward to extend to multiple clusters. (I guess it would be something like sending ref-counting messages on each deserialization etc.) Do you think we can rule out the need for multiple agent clusters or have you planned for that?
Does Chrome's VideoFrame
implementation implement reference counting across agent clusters (and processes), or does it just copy the readonly data?
That's fine when there is only one recipient. i.e the Transferable-only case.
Move semantics can be done right with Transferable without Serializable.
This is exactly what I am saying is not the case.
There can be more than one recipient (in practice frequently are more, but most of the receivers will ignore the clones), and the implementation has to handle that (literally Chrome's implementation copies GC pointers because it can't use unique ownership).
There are a few different hacks available to simulate Transfer-only, but in the corner cases you have to pick some specific behavior. It's not good enough to handle only the simplest path, even if that's what most developers will think is the only path.
Are you claiming that a
MessagePort
A, entangled with one otherMessagePort
B at one point in time, can somehow become entangled with more than one otherMessagePort
through cloning ofMessagePort
B? Can you give an example of a situation where this can be observed?
Yes, that's exactly what I am saying.
I created a demo using an extension: https://drive.google.com/file/d/1nsyawXUBFGt10BX3Gv0OAfWO4PzgVLF2/view?usp=sharing.
I tested in both Chrome and Firefox with the same result.
TL;DR: (1) A postMessage()
transfers a MessagePort
to an iframe
. (2) The iframe
and the extension content script both receive the port, in different contexts, and use it to send a message. (3) The main page logs both messages, "hello from iframe" and "hello from extension".
[Edit: in Chrome I'm confident that this goes through entangling more than once. I am not familiar with the Firefox implementation, it's conceivable that it instead has one C++ object with multiple JS wrappers.]
The intended audience is expected to have a postMessage() available. ReadableStream is not so much actively prevented, but ruled out because it is unsuitable.
I'm not sure I follow why it is unsuitable, other than a preference for Transfer-only behavior.
Does Chrome's
VideoFrame
implementation implement reference counting across agent clusters (and processes), or does it just copy the readonly data?
The implementation is locked to the agent cluster, it will not cross processes. If there were demand for that feature, we would implement it with copying, like ImageBitmap
does. I don't think there is demand for that, but it would also enable storing them in IndexedDB
which might be useful for VideoFrame
.
With simple internal state, such as [[has_ever_been_used_as_srcObject]] and [[has_already_been_serialized]], the proposed serialization steps for MediaSourceHandle can throw exception on attempts to serialize a MediaSourceHandle that has previously or currently been assigned as srcObject on an HTMLMediaElement, or if the app attempts to serialize the handle instance more than once.
Right, but as I think you are acknowledging, that doesn't prevent multiple MediaSourceHandles, each identifying the same MediaSource, each having [[has_already_been_serialized]] false, and being attached to different HTMLMediaElements, from successfully running the "Attaching to a media element" steps concurrently while MediaSource#readyState is "closed".
The motivation for Transferable is that it provides a clear way to ensure that there is at most one HTMLMediaElement attached to a MediaSource. Without that, I don't know that incomplete move semantics from [[has_already_been_serialized]] etc. are much value. You are right that the exceptions thrown are easier to remove later than add in, but they don't support moving to Transferable in the future because the postMessage() calling convention is different.
[[has_ever_been_used_as_srcObject]] is very useful to specify the behavior that should happen if a particular MediaSourceHandle
instance has been assigned ever to srcObject
on an HTMLMediaElement
and the app then tries to serialize it later. Even if transferable were attempted instead of serializable, this would still be my preferred approach, since it would immediately prevent the app from taking the handle away from a media element's execution context where the handle (and the underlying MediaSource in the worker) are either still pending possible asynchronous attachment start, are still attached, or were previously attached.
[[has_already_been_serialized]] is a helpful, but not a fully constraining bit of information that helps reduce (but cannot fully prevent) duplications on dispatch (e.g. via an implementation using broadcast internally for MessageChannels, or an implementation duplicating messages to send them to extensions). I'm fine with dropping it if it is the sticking point on this API, but I do believe it has some value in signalling the intent to applications that explicitly repeating attempts to serialize a handle are not expected to be allowed (or even sending a message with the same handle in it more than once).
Without MediaSourceHandle providing the single HTMLMediaElement-MediaSource entanglement, another mechanism is required.
A single MediaSource can receive two tasks from the "Attaching to a media element" steps to set [[port to main]]. Is the second rejected with an error sent back to the HTMLMediaElement that lost the race?
This is a good implementation point: in Chromium implementation, since there is reliance on the constraint that a DedicatedWorker is in the same process (or more generally, in the same agent cluster) as the dedicated worker or window that created it, the implementation of spec draft for attachment steps implicitly takes a lock in the "attaching to a media element" steps in the worker MediaSource to see if it is closed and to attempt to open it. As such, the attachment attempt that loses the race to obtain that lock will see that the MediaSource is either already open (and loses), or was already opened and later closed but the MediaSource
attachment abstraction in the implementation prevents even the latter case from succeeding since it enforces maximum of one successful attachment per MediaSource
when that MediaSource
was created in a dedicated worker.
The spec draft therefore could use some improvement to clarify: Once a worker-owned MediaSource has ever been attached successfully, a re-attachment (e.g. re-load() on HTMLMediaElement with same handle as before, or perhaps a duplicated-via extension/broadcast/etc handle is used) of the MediaSource should fail with steps similar to the 'If readyState is NOT set to "closed"' attaching step.
Is the intention that MediaSourceHandle use is limited to a single agent cluster? ... this Serializable MediaSourceHandle design appears awkward to extend to multiple clusters.
@karlt (and also @jernoble since he had a question about expected behavior around this during this week's Media WG meeting): Yes, the intention is that MediaSourceHandle
use is limited to a single agent cluster. The handle references a data abstraction (MediaSource
) that, even with an objectURL approach, would still have been in the single agent cluster. PostMessage-ing a MediaSourceHandle
to a SharedWorker should fail similarly to attempt to do the same with a SharedArrayBuffer, and I've locally verified the experimental Chrome implementation of MediaSourceHandle
behaves this way. Internally, the mechanism in Chromium to signal this is during serialization, mark the record as locked to agent cluster
, and MediaSourceHandle
serialization does this (as does VideoFrame
already and others).
Do you think we can rule out the need for multiple agent clusters or have you planned for that?
Thanks to this discussion and @jernoble asking similarly about SharedWorker and MediaSourceHandle
, yes, we need to rule out the support for communicating and using a MediaSourceHandle
via or in multiple agent clusters.
Does Chrome's VideoFrame implementation implement reference counting across agent clusters (and processes), or does it just copy the readonly data?
@sandersdan can assist with this answer, I believe. My understanding is the serialization of a VideoFrame
in Chrome also locks the message to the same agent cluster as the transmitter, and the underlying representation is held in a scoped_refptr<> that only works within a process, not cross-process, though there may be some further complication due to cases where a VideoFrame might refer to an out-of-process backing store, such as that in gpu via another indirection like a handle.
Notably, the internal handle representation for the result of MediaSource.getHandle()
(even if subsequently potentially shared across multiple MediaSourceHandle
instances resulting from extension or broadcast duplication) is a refcounted object (scoped_refptr<>) which persistently keeps the MediaSource
from being GC'ed while the handle as the reference. Upon attachment start (e.g. the asynchronous task in HTMLMediaElement that follows from assigning the handle to srcObject), the media element takes the handle instance's internal refcounted attachment object, if it still has it (and it clears that field from the handle instance). If the media element finds the internal refcounted attachment object is not longer valid (e.g. serialized away already, where the internal refcounted object is moved during serialization, or if this handle has previously already begun attachment), then media load fails per the spec draft. Otherwise, the usual attachment start process begins, which itself detects races in attempted simultaneous attachment via the same internal refcounted attachment object to the underlying MediaSource
instance, as described above. If attachment is successful, then the scoped_refptr<> attachment object keeps persistent references to both the media element (in window context) and the mediasource object (in worker context) to prevent them from being GC'ed. Later detachment drops the media element's reference to the internal attachment object, and if there are no other references, this lets cleanup happen to the detached objects. Even in a Transferable implementation with true move semantics, most/all of this would remain the same - I'm just trying to provide helpful implementation notes around how I solved this in Chrome.
Would you be ok with Chrome shipping this kind of MSE-in-Workers using Serializable MediaSourceHandle that is not Transferable (and, per previous discussion, that doesn't support worker MediaSource attachments via an object URL)?
I wouldn't choose to ship this design myself in another browser given the reasons presented here. The reasons given why Transferable is difficult in Chrome are vague. The appearance is that there has been a lot of effort put into a system that doesn't work because the one that does work has not be investigated fully.
I'm not sure I understand the last sentence fully. The Serializable-only system does work, from what I can tell. The Transferable-only doesn't provide true move-only semantics from what I understand from @sandersdan. I do appreciate all of your thoughtful input - I'm trying to find the best way to navigate the advice from @sandersdan around complications that arose due to lack of true move-only Transferable semantics while enabling this highly-sought-after MSE-in-Workers API with explainable semantics with a minimum of shocking surprises to API users.
Are you claiming that a
MessagePort
A, entangled with one otherMessagePort
B at one point in time, can somehow become entangled with more than one otherMessagePort
through cloning ofMessagePort
B? Can you give an example of a situation where this can be observed?Yes, that's exactly what I am saying.
I created a demo using an extension: https://drive.google.com/file/d/1nsyawXUBFGt10BX3Gv0OAfWO4PzgVLF2/view?usp=sharing.
I tested in both Chrome and Firefox with the same result.
TL;DR: (1) A
postMessage()
transfers aMessagePort
to aniframe
. (2) Theiframe
and the extension content script both receive the port, in different contexts, and use it to send a message. (3) The main page logs both messages, "hello from iframe" and "hello from extension".[Edit: in Chrome I'm confident that this goes through entangling more than once. I am not familiar with the Firefox implementation, it's conceivable that it instead has one C++ object with multiple JS wrappers.]
Thank you for going to the trouble to create a demo.
The clone_message demo shows the behavior expected when the iframe and extension content script each see the same MessagePort
and postMessage()
on that MessagePort
.
I've extended clone_message to demonstrate that the extension content script sees the same MessagePort
as the iframe:
https://drive.google.com/file/d/1Xhd8hZTdwNWBvgj_nUh_GU-mTo5lgIm1/view
Only the extension content script calls start()
but the iframe receives a message when using only addEventListener()
. The behavior is the same in Firefox and Chrome.
There is apparently only one MessagePort
entangled with channel.port1
of index.html.
I don't think this is a sticking point, just an observation: BroadcastChannel
and ReadableStream.tee()
would behave differently should one of them be used to produce multiple clones of a Serializable MediaSourceHandle
:
ReadableStream
maintains a one-to-one mapping of serializations and deserializations, so only one MediaSourceHandle
would be usable for HTMLMediaElement
attachment, even if tee()
outputs are transferred or cloneForBranch2
is set for ReadableStreamTee()
.In an attempt to summarize my reasons for preferring Transferable without Serializable:
MediaSourceHandle
.
This appears to be generally agreed given attempts to add move-semantics where possible to Serializable.HTMLMediaElement
to a MediaSource
.The restriction to a single agent cluster, however, is also sufficient to avoid the race. HTMLMediaElement
is exposed only on Window
and there is "only one similar-origin window agent per browsing context agent cluster". Therefore all attempts to set a MediaSourceHandle
identifying the same MediaSource
will run on the same thread. The attachment that succeeds is the first attempted, which is clearly defined (and there is no need for a lock while checking whether the MediaSource
has previously been attached).
The single agent cluster also simplifies MediaSource
lifetime management, thank you for the details.
The arguments for a Serializable are
postMessage()
calls.ReadableStream
, etc. and BroadcastChannel
. BroadcastChannel
s in different agent clusters will receive a "messageerror" event, I assume. Streams transferred across agent clusters also have error handling for deserialization exceptions, but I'm not so familiar with that.My preference would still be for Transferable, but my key concern with Serializable was with the race, which has now been resolved by the limit to single agent cluster. I recognize that others have different preferences. I thank you for listening to my views and trust you to proceed with the direction you feel is best.
@karlt thank you for your continued clear analyses of this. With the implementation working in Chrome well now, using Serialization, and with the benefits of Transferable's move semantic clarity for web authors perhaps not being as reliable as expected (since duplication can occur), I think inclusion of non-normative notes describing usage of serializable handle and implementation guidance like race prevention (as we already need to do with multiple main-thread MediaSource objectURLs racing attachment for same MediaSource instance) would be sufficient to obtain good interop and not be surprising. Further, inclusion of single-agent-cluster requirement will need to be in the spec (along with informative text describing why), to help reduce further confusion.
While serializing a handle to a stream might work to communicate it, the expected usage would be via postMessage, not via streams. I'll see what the Chrome implementation does with such serialization today and update here.
(I'll ignore claims that Transferable isn't implementable as those seem out of scope for this WG, and should probably be up-leveled to WHATWG)
I think a precedent to follow here is VideoTrackGenerator, which has a generator.track
member that is Transferable.
The generator remains in the worker and its track
can be transfered to main for use as a srcObject
(sound familiar?):
// worker.html
const generator = new VideoTrackGenerator();
parent.postMessage({track: generator.track}, [generator.track]);
// main.html
const {data} = await new Promise(r => worker.onmessage);
video.srcObject = new MediaStream([data.track]);
This example would work in Chrome once it implements transferable MST (crbug 1288839).
This would seem to match the needs of MediaSource quite well, and ensure consistency in the web platform.
With that model in mind, I propose these changes to the PR:
[SameObject]
partial interface MediaSource {
[SameObject] readonly attribute MediaSourceHandle handle;
};
track
/handle's [[detached]]
state, which transitions once transfered with postMessage(,[])
.This should give more flexible, desirable, and less surprising behavior. Thoughts?
While serializing a handle to a stream might work to communicate it, the expected usage would be via postMessage, not via streams. I'll see what the Chrome implementation does with such serialization today and update here.
I tried serialization of a MediaSourceHandle across a transferred (from worker to main) ReadableStream: that worked. Then I added a further serialization on the main thread of that handle via a WritableStream whose sink then initiated the attachment: that also worked. I didn't encounter any surprises. Notably, this method of communicating a handle would not work if the MediaSourceHandle were Transferable-only.
@jan-ivar - while I certainly prefer consistency, I am wondering if this lack of a Transferable-only version of handle's communicability via a stream makes it worthy of being just Serializable.
@jan-ivar - while I certainly prefer consistency, I am wondering if this lack of a Transferable-only version of handle's communicability via a stream makes it worthy of being just Serializable.
This use case (communicating a handle via a transferred Stream) is not supported by Transferable-only. I tend to think supporting this use case is more important than remaining consistent with a precedent that does not support this use case.
2. The handle can be sent via
ReadableStream
, etc. andBroadcastChannel
.BroadcastChannel
s in different agent clusters will receive a "messageerror" event, I assume. Streams transferred across agent clusters also have error handling for deserialization exceptions, but I'm not so familiar with that.
Correct, from what I can tell:
A MediaSourceHandle enqueued in a ReadableStream that is itself transferred to a SharedWorker causes a DataCloneError when the SharedWorker attempts to read, e.g. const {done, value} = await reader.read()
. In Chrome, the exception message is "chunk could not be cloned". This appears to be expected per Streams API SetUpCrossRealmTransformReadable's step 4 (messageerror handler setup) along with the HTML spec definition of the messageerror event: "Fired at an object when it receives a message that cannot be deserialized"
Why is communicating a handle via a transferred stream important?
Why is communicating a handle via a transferred stream important?
Transferable streams are a basic communication mechanism in the web platform, giving web authors the utility of building ordered workflows on top of stream semantics. I could imagine a scenario where application control messages are queued into a cross-realm stream separately from the existing MessageChannel mechanism, including things like "setup media element to begin playing using this handle". Stream idioms, like cancellation/abort/adaptation on stream read speed/etc may better match application developer's needs. Of course, I don't expect streams to be the exclusive way to communicate handles - I suspect many cross-realm apps already may be structured around basic messaging for their workflow control - but supporting communication of a handle via a transferred stream does give them flexibility.
What about a scenario of sending another transferable stream in such an application control message? Catch 22.
I'm a big fan of streams, but getting them to parity with postMessage as a goal seems like something that should be up-leveled to WHATWG again.
If Transferable objects are no longer to be used, I'd like to hear that from higher up. Until then, I prefer applying the right design idioms we have today for new APIs. postMessage > blob URL.
A MediaSourceHandle enqueued in a ReadableStream that is itself transferred to a SharedWorker causes a DataCloneError when the SharedWorker attempts to read, e.g. const {done, value} = await reader.read(). In Chrome, the exception message is "chunk could not be cloned". This appears to be expected per Streams API SetUpCrossRealmTransformReadable's step 4 (messageerror handler setup)
This thread is getting long. Maybe file an issue with WHATWG streams to discuss it there? Is there any inherent reason why it couldn't work? Maybe it can be fixed.
If Transferable objects are no longer to be used
Just to clarity, the concept of Transferable
is totally fine. My only claim is that it doesn't guarantee single-instance (move) semantics. There are a few specs that have variously tried to retrofit that, but it just doesn't work well with the underlying JS machinery. I don't think the issue has ever been properly reviewed at a higher level.
(I have a further opinion that requiring Transfer
is unnecessarily limiting, but it's not the core thing I want to make sure you are aware of when writing the MediaSource spec.)
The WebCodecs VideoFrame
is in fact transferable; it operates on top of the clone semantics by also closing the source frame. From the sender's perspective it can be an ergonomic way to move a frame, but from the spec side we acknowledge that there can end up being multiple clones and define the semantics when that happens (in our case, closing one of those clones closes all of them).
I don't think the issue has ever been properly reviewed at a higher level.
Then it sounds like an issue at a higher level should be opened, and discussed there.
There seems to be two questions here 1) does it work, and 2) do we want it to work that way. Let's uplevel the first, and discuss the second.
The API design question is whether it's desirable semantics to "transfer" (move) ownership of a MediaSource with postMessage.
My limited understanding is that a MediaSource differs from VideoFrame in that the former can only be the source of a single sink (at a time, or ever?), whereas a VideoFrame can be shared and used by multiple sinks. Happy to be enlightened if this is not the case.
My limited understanding is that a MediaSource differs from VideoFrame in that the former can only be the source of a single sink (at a time, or ever?), whereas a VideoFrame can be shared and used by multiple sinks. Happy to be enlightened if this is not the case.
This is correct; the MediaSource situation is simpler that the WebCodecs one. The observable behavior of the objects is quite different, but the ownership situation ends up similar simply because VideoFrames
are a scarce resource.
I have no strong reason why not Transferable for MediaSourceHandle. I was following the technical advice around issues related to assumed (but not actual) move semantics for Transferable objects.
Certainly in Chrome implementation, I can try a proof of concept that allows MediaSourceHandle to be transferable (and it would then be a simple change, though unfortunate for users currently desiring this ship ASAP, to require handle transfer. See [note] below.)
Note that I don't think the following would work though:
postMessage(null, [handle])
, since there is no field in the received MessageEvent for handles (or video frames etc, except ports. Instead, the following would work: postMessage(handle, [handle])
Going that route, would Transferable-only be consistent with precedent? Or should the following also work:
postMessage(handle)
(Note that the item being transferred or serialized is a handle, not the actual MediaSource. Conceptually, the spec can ensure that at most one handle can succeed in attaching to a worker-owned MediaSource, and that once attached successfully, implementations will need to be able to prevent leakage of the cross-realm internal links from potential handle copies. This iis already in the Chrome prototype of serializable handle, and would still be needed in a Transferable version - regardless of whether transfer were required or just allowed.)
[note] If both a serializable and a transferable version would be acceptable, then that would allow more rapid use of the current MSE-in-Workers Chrome implementation, provided they only use serializable initially. Later expansion to also support transfer could be done. But if instead, transfer of the handle is required, then that would unfortunately delay initial use of the Chrome implementation.
Thanks for considering this in spite of implementation challenges and deadlines. I think it's important we get the API right.
I think the two meaningful semantic models on the table for this WG are:
srcObject
are first come, first servepostMessage(x, [x])
Transfer seems redundant with multiple handles. The feature of single handle is its restriction.
The argument for restriction would be a single handle fits MediaSource being single-use. Giving JS only a single handle to manage would be KISS and lead to fewer bugs (with less power comes less responsibility). E.g. thread races become impossible.
An argument for multiple handles being KISS is it defers hand-off to srcObject
assignment time, which has some (same thread) precedent, but at the cost of allowing thread races.
Going that route, would Transferable-only be consistent with precedent?
That's a bit complicated to answer, since both VideoFrame and MediaStreamTrack need to relinquish (hardware) resources, which is why they allow both transfer and serialization (whose side-effect is requiring JS to close each clone to relinquish the underlying resource).
Thanks for considering this in spite of implementation challenges and deadlines. I think it's important we get the API right.
I think the two meaningful semantic models on the table for this WG are:
- Multiple handles — serializable references whose use in
srcObject
are first come, first serve- Single handle — not serializable, and can only be explicitly handed off through
postMessage(x, [x])
Transfer seems redundant with multiple handles. The feature of single handle is its restriction.
The argument for restriction would be a single handle fits MediaSource being single-use. Giving JS only a single handle to manage would be KISS and lead to fewer bugs (with less power comes less responsibility). E.g. thread races become impossible.
An argument for multiple handles being KISS is it defers hand-off to
srcObject
assignment time, which has some (same thread) precedent, but at the cost of allowing thread races.Going that route, would Transferable-only be consistent with precedent?
That's a bit complicated to answer, since both VideoFrame and MediaStreamTrack need to relinquish (hardware) resources, which is why they allow both transfer and serialization (whose side-effect is requiring JS to close each clone to relinquish the underlying resource).
There is another dimension to the semantic models: handle reuse. In the Chrome prototype and proposals earlier in this PR, essentially only a single handle could ever be provided by the MediaSource
and whichever (if any) media element begins attachment first with it wins. Re-loading with that same handle would not work, even sequentially on same element. This is very similar to lack of reuse of an autorevoked object URL in same-thread MSE model, except that only a single handle per worker MediaSource would be provided. In my mind, this allows for simpler implementation without burdening or complicating the API. It also doesn't lock the spec or API into always preventing serial re-use of a handle (relaxing that shouldn't necessarily be a breaking change later, if that is desired); similarly, it doesn't lock the spec or API into always requiring a single handle per worker MediaSource instance (likewise, such relaxation shouldn't necessarily break apps later.)
I think Single handle, not reusable serially or concurrently to (re)load/(re)attach to a media element, makes sense for now. Apps can easily create more MediaSource instances and obtain their handles as needed.
I'll proceed with the Transferable-only prototype for a handle and see how implementable it is. If you feel this is the wrong direction, please respond ASAP to prevent further delays.
@karlt @jernoble @jan-ivar - I think I have a working prototype of Transfer-only, Single-handle, Single-use MSE-in-Workers that will land hopefully soon in Chrome 105 milestone behind experimental flags: --enable-blink-features=MediaSourceInWorkers,MediaSourceInWorkersUsingHandle
(or just --enable-experimental-web-platform-features
). Change that switches from Serializable-only to Transferable-only is in review now at https://chromium-review.googlesource.com/c/chromium/src/+/3733664.
Assuming this route has no objections, I'll be updating this spec PR soon.
I'm working on updating this PR now that Transferable-only MediaSourceHandle (+tests) and [SameObject] handle getter (+tests) are landed (former) or landing soon (latter) in Chromium's experimental implementation.
Notable changes that are needed to this PR:
-edited to add last list item
@mwatson2 @karlt @jernoble (and @jan-ivar) Please take a look at this updated PR. I've made significant changes to the original refinement PR, per our conversations. Experimental implementation of this is in Chromium M105 (as of 105.0.5180.0), and associated web-platform-tests have been added/updated.
Thank you all for your input!
@mwatson2 please take a look. @karlt @jan-ivar @jernoble please also take a look.
I'd like to merge this ASAP as I believe it addresses the concerns and the use case. Hopefully any further feedback is editorial in nature and could be addressed in later PRs.
I believe I have addressed all feedback. I'll squash and merge this PR shortly. Please file an issue if you see anything further needing correction once that merge has happened. Thank you, everyone, for your comments.
Current description of this PR: Overall intent for a
MediaSourceHandle
is that it should be [Transferable], and be restricted to "at most one" successful attachment to the underlyingMediaSource
by a handle. Further, if aWindow
MediaSource
was previously attached using a legacy MSE object URL, subsequent attempts to use thatMediaSource
instance'sMediaSourceHandle
for attachment must also fail.Changes
MediaSource.getHandle()
to be a[SameObject] readonly attribute
namedhandle
.Adds HTMLMediaElement.srcObject extension subsection.
Updates MediaSourceHandle transfer and related attachment text to more clearly indicate that transferring a handle prevents re-transfer of that instance due to the [[Detached]] internal slot's logic, and that transfer is also synchronously prevented if the handle instance's [[has ever been assigned as srcobject]] internal slot is true. Also adds multiple notes to describe intent and to clarify.
Updates the "attaching to a media element" text to indicate that a
MediaSourceHandle
can be attached successfully to at most one media element ever. Note that legacy MSE object URL attachments similarly already achieve the same intent in existing implementations. This change includes a clarification for MSE object URL attachments, since it was never the intent even for main-threadMediaSource
attachments for them to be successfully attached to more than oneHTMLMediaElement
at a time. The switch to more clearsrcObject
usage for worker MSE attachment affords the spec an opportunity to be more clear about this intent.This refinement originates from discussion on the previous PR with @karlt (https://github.com/w3c/media-source/pull/305) and from lengthy discussion on this change itself at https://github.com/w3c/media-source/pull/306.
This work is associated with the MSE-in-Worker spec issue: https://github.com/w3c/media-source/issues/175
Original description of this PR (updated more recently per https://github.com/w3c/media-source/pull/306#issuecomment-1183638693):
Adds text intending to help prevent using one or more
MediaSourceHandle
for the same underlyingMediaSource
for simultaneous, concurrent attachments to media elements:MediaSource.getHandle()
to succeed at most once perMediaSource
instance.MediaSourceHandle
transfer section to more clearly indicate that transferring a handle out of a context makes that handle unavailable for transferring again out of the same context (unless it is first transferred back to that context).MediaSourceHandle
can be attached successfully to at most one media element at a time, and with the previous 2 clarifications, ensures that the underlying worker contextMediaSource
can only be attached successfully to at most one media element at a time.Note that legacy MSE object URL attachments similarly already achieve the same intent in existing implementations. This change includes a clarification for those similar to the third bullet, but for a
MediaSource
in the "attaching to a media element" text, since it was never the intent even for main-threadMediaSource
attachments for them to be successfully attached to more than oneHTMLMediaElement
at a time. The switch to more clearsrcObject
usage for worker MSE attachment affords the spec an opportunity to be more clear about this intent.This refinement originates from discussion on the previous PR with @karlt: https://github.com/w3c/media-source/pull/305
This work is associated with the MSE-in-Worker spec issue: https://github.com/w3c/media-source/issues/175
Preview | Diff