MSE-in-Workers draft feature specification

wolenetz commented 3 years ago

Specifies the exposure of MSE API additionally to DedicatedWorkerGlobalScope. Updates, when MSE is in such a worker scope, cross-context communication between a Window HTMLMediaElement and a DedicatedWorker MSE instance to make it clear to implementors what is necessary for interop on this feature. Spec feature for MSE-in-Workers is #175

Note that Chromium already has an experimental implementation and there is a demo of the feature with instructions available at https://wolenetz.github.io/mse-in-workers-demo/mse-in-workers-demo.html

Preview | Diff

wolenetz commented 3 years ago

It looks like I need to fix a few spec-prod errs (div in dl, dl in ol, div in ol, div in ol, ol in dl) per https://github.com/w3c/media-source/runs/2947876193?check_suite_focus=true Please don't let those block review.

edit - it looks like 6df1c5d successfully fixes those sped-prod errors.

wolenetz commented 3 years ago

Looking into it some more, I wonder what are the benefits of specifying such a detailed message passing logic between the object in the worker and its mirror in Window.

Wouldn't it be enough to mention the need for implementation to mirror information between Window and worker objects using whatever implementation means they want (as alluded to in Cross-context communication model section) and keep algorithms somewhat unchanged except perhaps to note which information needs to have been mirrored and/or where the algorithm step runs?

The notion of "mirrored" can be interpreted differently in concurrent systems like the main/worker contexts in this feature. For clarity of what information coherency/latency flexibility is allowed, I included the "Cross-context communication model" and use of it (port to worker / port to main) in specific portions of the algorithms where the communication.

For instance, considering the following steps:

If the MediaSource was constructed in a DedicatedWorkerGlobalScope : Post an internal end of stream message to port to main whose implicit handler in Window notifies the media element that it now has all of the media data. Otherwise: Notify the media element that it now has all of the media data.

Why not keep it to what it was before the change?

Notify the media element that it now has all of the media data.

There is only one media element. It is clear that it is attached to the Window context. The term "notify" is vague, allowing arbitrary complex implementations. I do realize that the worker case is going to be more complex to implement but, from a spec perspective, I'm not clear why specifying the extra step that cannot be directly tested in practice is needed.

This example is a good one for your case. The "notify" is indeed vague due to legacy media pipelines and demuxers introducing some information latencies. I will revert the convoluted text specifically for this example. In contrast, the buffered and seekable extensions need more clarity because they are synchronous (on main thread) and reflect current information immediately, so clarify what the information latency limits here with the model and its usage seemed to me to be not only a good idea but required for better interoperability.

(There are likely other algorithms where it remains useful to clarify what is to happen, for instance to mention that visible VideoTrack objects attached to a media element are going to be mirrors of the ones in the worker)

Indeed, like the seekable and buffered extensions, the initialization segment received algorithm and removeSourceBuffer's management of tracks is IMHO made much clearer for interop by the detailed spec. Implementors are free to replace internal, unexposed items in the spec like 'update track state' messages with equivalent.

tl;dr: I wanted to clarify the expectations for better implementation interop.

tidoust commented 3 years ago

tl;dr: I wanted to clarify the expectations for better implementation interop.

Right and you shouldn't consider my comments as blocking in any way! I didn't spot anything wrong in the pull request.

The algorithms can be iterated over to drop bits that won't be observable in any way. These cases will typically appear when test cases get added to the test suite to assess interop between implementations of MSE in workers.

wolenetz commented 3 years ago

The algorithms can be iterated over to drop bits that won't be observable in any way. These cases will typically appear when test cases get added to the test suite to assess interop between implementations of MSE in workers.

Sounds good to me. I've added ee8fc38 to address the one known place where there was perhaps too much verbosity added.

wolenetz commented 3 years ago

077e233 is a squashed replacement of the previous chain of commits, to ease potential rebasing.

wolenetz commented 3 years ago

@mwatson2 please take a look. Known related work that I'd like to get landed soon, separately, is adding privacy and security sections, and especially (in response to a high-res timer attack concern raised in a response to my intent-to-experiment request in chromium/blink) clarify that the implementation's choice of cross-context communication mechanism needs to be resilient against such attacks (such as providing variable delay akin to postMessage, as the model already describes, if optimizations are made) for information such as MediaSource duration, live-seekable-range (as reported to HTMLMediaElement from worker) and recent error status (as reported to worker MSE from window HTMLMediaElement).

@tidoust please help verify that the changes in response to your comments are correct.

Thank you, Matt

tidoust commented 3 years ago

@tidoust please help verify that the changes in response to your comments are correct.

That looks good to me, thanks!

I note that the IPR bot complains because Google (and other companies) needs to re-join the working group following re-chartering.

mwatson2 commented 3 years ago

@wolenetz Looks good to me. My only comments are cosmetic suggestions - not technical - and could be done later, if done at all.

wolenetz commented 3 years ago

@ https://github.com/w3c/media-source/pull/282#issuecomment-89805456 (@mwatson2) - Thank you for your review. In general, I agree with your comments. The "potentially faster" text was included to ensure that at least no more than the latency of a postMessage() done by the app would be the limit. Otherwise, apps would need to take yet more care to understand the state of the attachment cross-context. In this regard, I'm also currently undoing some of the optimizations taken in the Chromium prototype of this feature, as they may enable Spectre-like exploits.

Since my AC rep hasn't yet rejoined the WG, I'll await until after my vacation (all of next week) to address comments (or land as-is and do a follow-up to address comments).

wolenetz commented 3 years ago

@https://github.com/w3c/media-source/pull/282#issuecomment-898054566 in the interests of more easily unblocking the follow-on commits that I'd been working on while pending rejoin to the WG, I'll land this as is and follow-up in later PR on your comments, above.

w3c / media-source

MSE-in-Workers draft feature specification #282