Open guidou opened 1 month ago
cc @jan-ivar @youennf
What problem is this solving? Transferring the track has other benefits like being able to apply constraints and read track stats and settings.
It solves the problem that you don't need track transferability to implement this API, which we consider a blocker, at least for the medium term. Also, the benefits of transferring a track to a worker are very limited. In fact, I would argue that the only benefit would be using this API. You can call applyConstraints and read stats/settings on Window, where the main application is.
This proposal uses a pattern that we are already using in encoded transform and should easily allow us to have interoperable implementations.
One use case for transferring media stream track is to create a track (via VideoTrackGenerator) and send it to sinks like RTCRtpSender or MediaRecorder.
The proposal is that createVideoTrackGenerator() is called from Window and returns a promise with a MediaStreamTrack on Window, where the RTCRtpSender or MediaRecorder are. The generator (which no longer has a track field) is created in the worker (the application gets it via an event, just like an RTCRtpScriptTransformer).
This removes the need to transfer the track. You only needed to transfer the track from the worker to window because the current spec creates the track on the worker, where it is largely useless, as all the track APIs are on Window.
Another benefit of this API surface is that it allows feature detection on main without creating a worker.
This can be feature detected on main like this:
function isMstTransferable() {
  try {
    const [track] = document.createElement('canvas').captureStream().getVideoTracks();
    new MessageChannel().port1.postMessage(track, [track]);
    return true;
  } catch (e) {
    if (e.name != "DataCloneError") throw e;
    return false;
  }
}
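For anyone who wants to try the same check outside a browser, here is a minimal sketch of the same try/catch technique with the object and port passed in as parameters. The name `canTransfer` is hypothetical and not part of any spec. It assumes, as the snippet above does, that a failed transfer surfaces as an exception named "DataCloneError".

```javascript
// Hypothetical helper (not from any spec): same detection technique as above,
// but with the candidate object and the port injected by the caller.
function canTransfer(object, port) {
  try {
    // Listing the object in the transfer list is what triggers the check.
    port.postMessage(object, [object]);
    return true;
  } catch (e) {
    // Only a DataCloneError means "not transferable"; anything else is a real error.
    if (e.name !== "DataCloneError") throw e;
    return false;
  }
}
```

In a page this would be called as `canTransfer(track, new MessageChannel().port1)`. A successful call really does hand the object over, so it should be given a throwaway track rather than one the page still needs.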
It solves the problem that you don't need track transferability to implement this API, which we consider a blocker, at least for the medium term.
@guidou why is transfer a blocker? What do you mean by medium term? Safari has already shipped this and it works.
If you explain the problem, perhaps their engineers can help?
It's a blocker for Chromium to ship it in the short term since Chromium doesn't implement track transferability and will not have it for quite some time.
I don't expect Chromium to have track transferability in the short term, so I guess we won't have an interoperable API for a long time.
This can be feature detected on main like this:
function isMstTransferable() { try { const [track] = document.createElement('canvas').captureStream().getVideoTracks(); new MessageChannel().port1.postMessage(track, [track]); return true; } catch (e) { if (e.name != "DataCloneError") throw e; return false; } }
This feature-detects track transferability, not mediacapture-transform. With the current API you need to create a worker to feature-detect, which is costly and unergonomic.
I doubt that's needed. As you said, "tracks are useless on workers except for this API". If MST transfer is detected, it seems reasonable to assume some purpose awaits these tracks in the worker.
This works in the only current implementation: "WebKit for Safari 18 beta adds support for MediaStreamTrack processing in a dedicated worker."
This seems like a property worth emulating. I've added a note to Firefox's implementation bug to do the same. Thanks for bringing attention to this!
... Chromium doesn't implement track transferability and will not have it for quite some time.
If there's some difficulty or problem with the spec's transfer steps as specified, please bring it to our attention so we can address it.
You can call applyConstraints and read stats/settings on Window, where the main application is.
Yes, but waiting on postMessage for these measurements hardly seems ideal. In the current spec, the worker transform can inspect real-time track stats counters like deliveredFrames, discardedFrames and totalFrames synchronously, and correlate them with the VideoFrame it is currently processing.
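To illustrate the kind of correlation meant here, a rough sketch that pairs a frame's timestamp with a snapshot of those counters. `annotateFrame` and the shape of the returned record are hypothetical. The only assumption is that the frame exposes `timestamp` and that the stats object exposes the three counters named above.

```javascript
// Hypothetical helper: tags the frame currently being processed with a
// snapshot of the track stats counters, read synchronously in the same context.
function annotateFrame(frame, stats) {
  const { deliveredFrames, discardedFrames, totalFrames } = stats;
  return {
    timestamp: frame.timestamp,
    deliveredFrames,
    discardedFrames,
    totalFrames,
    // Share of all frames that were discarded; 0 when nothing was produced yet.
    discardedRatio: totalFrames > 0 ? discardedFrames / totalFrames : 0,
  };
}
```

Inside a worker transform this could be called once per frame with the transferred track's stats object. Without the track in the worker, the same numbers would have to arrive via postMessage and would no longer line up with a specific frame.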
You can call applyConstraints and read stats/settings on Window, where the main application is.
Yes, but waiting on postMessage for these measurements hardly seems ideal.
That goes the other way too if you want access to the track on Window (which is the more common case today).
In the current spec, the worker transform can inspect real-time track stats counters like deliveredFrames, discardedFrames and totalFrames synchronously, and correlate them with the VideoFrame it is currently processing.
I'm not opposed to supporting transferability. I'm opposed to making it a requirement to use mediacapture-transform, as that will have the practical consequence of delaying interoperable implementations.
We already have a pattern for adding worker support without requiring transferability of tracks or streams. This doesn't mean applications are forbidden from transferring tracks on browsers that support it if they want to. It just means that applications that don't need to transfer tracks to do processing (which are most if not all applications today) can more quickly have an interoperable API in practice.
That goes the other way too if you want access to the track on Window ...
No, because tracks can be cloned. With transfer, stats are readily available in both places. So the problem of a transformer needing a roundtrip to main to read settings and applyConstraints, for lack of transfer, would be new with this proposal.
I'm not opposed to supporting transferability.
Great! Since you said tracks are useless on workers except for the worker API, does this mean you support the worker API?
I'm opposed to making it a requirement to use mediacapture-transform, ...
It already is a requirement.
... as that will have the practical consequence of delaying interoperable implementations.
I doubt attempting to standardize a third new API and waiting for three implementations will get us to interop quicker.
Safari has shipped, and Firefox is working on it. 1½ < 3 + one WG. I've filed https://github.com/w3c/mediacapture-extensions/issues/158 to help.
Creating a permanent web API to solve one implementer's short-term scheduling seems against § 1.7. Add new capabilities with care and § 1.9. Leave the web better than you found it.
This doesn't mean applications are forbidden from transferring tracks on browsers that support it if they want to.
Having web developers navigate between 3 instead of 2 different APIs to do the same thing sounds worse, not better.
I'm not opposed to supporting transferability.
Great! Since you said tracks are useless on workers except for the worker API, does this mean you support the worker API?
What is the Worker API? I said tracks are useless on workers except for this API (i.e., mediacapture-transform) which artificially requires them. So, to clarify, there is nothing I support about mediacapture-transform in its current form.
I'm opposed to making it a requirement to use mediacapture-transform, ...
It already is a requirement.
An artificial requirement. It would be very easy to have a spec that does not require track transferability for worker support. That also applies to new implementations (or updating existing ones), since the proposed approach is based on pre-existing patterns already implemented by all major browser engines.
... as that will have the practical consequence of delaying interoperable implementations.
I doubt attempting to standardize a third new API and waiting for three implementations will get us to interop quicker.
I also doubt an API that ignores developer requirements and concerns by at least one implementor will get us to interop.
Safari has shipped, and Firefox is working on it. 1½ < 3 + one WG. I've filed w3c/mediacapture-extensions#158 to help.
https://github.com/w3c/mediacapture-extensions/issues/158 does not address this issue.
Creating a permanent web API to solve one implementer's short-term scheduling seems against § 1.7. Add new capabilities with care and § 1.9. Leave the web better than you found it.
The specific change proposed in this issue is not about "short-term" scheduling. It is to make the API better. If a use case can be solved appropriately without introducing a dependency on another feature, then it is better to solve it without introducing that dependency. The fact that it results in an API that is easier to implement is a consequence of that design being better. Another benefit is that it allows easier feature detection on Window, which is more in line with § 2.5. New features should be detectable than the current version of the API is.
Ignoring the needs of web page authors and at least one user agent implementor, which the current API does overall, is directly against 1.1. Put user needs first (Priority of Constituencies). Ignoring concerns of user agent implementors also goes against 1.1. Put user needs first (Priority of Constituencies).
User needs come before the needs of web page authors, which come before the needs of user agent implementors, which come before the needs of specification writers, which come before theoretical purity.
The track transferability requirement is IMO the opposite of § 1.7. Add new capabilities with care. That principle refers to adding "new capabilities to the web with consideration of existing functionality and content". Adding a feature that requires a dependency on another feature is not better than adding the feature following existing patterns that don't require such dependency.
This doesn't mean applications are forbidden from transferring tracks on browsers that support it if they want to.
Having web developers navigate between 3 instead of 2 different APIs to do the same thing sounds worse, not better.
What 3 different APIs?
Are you referring to the requirement of using AudioWorklet for audio processing, which is a different API that, in addition, is not suitable for all types of processing?
I said tracks are useless on workers except for this API (i.e., mediacapture-transform) which artificially requires them.
This is not artificial, transferring a track to a worker has real benefits compared to the approach you mention. Let's take the example of a web application wanting to do background blur on a camera feed via a MediaStreamTrackProcessor and a VideoTrackGenerator.
First, lifetime management is easier.
When the VideoTrackGenerator track gets stopped, its WritableStream will be closed. The web application can listen to this via its closed promise and call stop on the getUserMedia track.
Also, stopping the worker will kill both VideoTrackGenerator and getUserMedia track, housekeeping is simpler :)
This is less convenient when the WritableStream lives in a different context than the track, web developer will need to post message.
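A minimal sketch of that lifetime hookup when everything lives in one context. `stopSourceWhenSinkCloses` is a hypothetical name. It assumes `writer` is the writer obtained from the generator's WritableStream (so it has a `closed` promise) and that `sourceTrack` is the getUserMedia track.

```javascript
// Hypothetical helper: stop the camera track once the generator's
// WritableStream is closed (or errors), using the writer's closed promise.
function stopSourceWhenSinkCloses(writer, sourceTrack) {
  const stopSource = () => sourceTrack.stop();
  // Run the same cleanup whether the stream closed cleanly or failed.
  writer.closed.then(stopSource, stopSource);
}
```

If the writer and the track were in different contexts, the callback would instead have to post a message to whichever side owns the track.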
Second, configuration management. If the getUserMedia track is muted, the web app will likely want to mute the VideoTrackGenerator. Ditto when getUserMedia track is unmuted. If the getUserMedia track is in the same context as VideoTrackGenerator, it is very easy to implement for the web developer. Otherwise, web app has to postMessage.
This has real user consequences: a few frames will likely be missed by VideoTrackGenerator when getUserMedia track gets unmuted if the web app has to postMessage. With the worker approach, missing frames would be a bug in the UA implementation.
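For reference, the same-context version of this mute forwarding could look roughly like the sketch below. `mirrorMuteState` is a hypothetical name. It assumes the source track fires `mute` and `unmute` events and has a `muted` attribute, and that the generator has a writable `muted` attribute.

```javascript
// Hypothetical helper: keep generator.muted in step with the source track.
// Returns a function that removes the listeners again.
function mirrorMuteState(sourceTrack, generator) {
  // Start from the track's current state, then follow its events.
  generator.muted = sourceTrack.muted;
  const onMute = () => { generator.muted = true; };
  const onUnmute = () => { generator.muted = false; };
  sourceTrack.addEventListener("mute", onMute);
  sourceTrack.addEventListener("unmute", onUnmute);
  return () => {
    sourceTrack.removeEventListener("mute", onMute);
    sourceTrack.removeEventListener("unmute", onUnmute);
  };
}
```

When the track and the generator sit in different contexts, each of these handlers turns into a postMessage plus a handler on the other side, which is where the extra latency comes from.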
The same principle applies to configurationchange, getSettings, applyConstraints.
It is much easier for VideoTrackGenerator, MediaStreamTrackGenerator and getUserMedia track to be all in the same context to make use of these APIs.
Finally, we introduced MediaStreamTrack transferability as a way to cover some longer term use cases (grabbing camera in an iframe but do rendering/processing in another iframe). The current spec is more future-proof from that point of view as well.
User needs come before the needs of web page authors, which come before the needs of user agent implementors, which come before the needs of specification writers, which come before theoretical purity.
Right, I think user needs will likely be better served with the current API, as described above. I tend to agree that track transferability requires more work from UA implementors, but these costs are outweighed by user and web developer benefits.
I also doubt an API that ignores developer requirements and concerns by at least one implementor will get us to interop.
What are the developer requirements that have been ignored? So far, the developer feedback we received is that MSTP and VTG are working fine in Safari.
I said tracks are useless on workers except for this API (i.e., mediacapture-transform) which artificially requires them.
This is not artificial, transferring a track to a worker has real benefits compared to the approach you mention. Let's take the example of a web application wanting to do background blur on a camera feed via a MediaStreamTrackProcessor and a VideoTrackGenerator.
Yes, it is an artificial requirement.
If you have a use case where having the track in the worker is useful, then that can be very valid, but it doesn't justify making transferability a requirement for mediacapture-transform. I didn't say track transferability is an artificial feature. I'm just saying it is an artificial requirement for mediacapture-transform that, in addition, is often detrimental.
First, lifetime management is easier. When the VideoTrackGenerator track gets stopped, its WritableStream will be closed. The web application can listen to this via its closed promise and call stop on the getUserMedia track. Also, stopping the worker will kill both VideoTrackGenerator and getUserMedia track, housekeeping is simpler :) This is less convenient when the WritableStream lives in a different context than the track, web developer will need to post message.
You don't need transferability as a requirement to support this use case. UAs that support track transferability can perfectly support this use case even if transferability is not a requirement for mediacapture-transform. BTW, I have never heard of this use case from actual developers, but I am not opposed to it.
Second, configuration management. If the getUserMedia track is muted, the web app will likely want to mute the VideoTrackGenerator. Ditto when getUserMedia track is unmuted. If the getUserMedia track is in the same context as VideoTrackGenerator, it is very easy to implement for the web developer. Otherwise, web app has to postMessage.
This has real user consequences: a few frames will likely be missed by VideoTrackGenerator when getUserMedia track gets unmuted if the web app has to postMessage. With the worker approach, missing frames would be a bug in the UA implementation.
I don't think this is an actual problem because if the getUserMedia track is muted, it will produce no frames and the VideoTrackGenerator will see no frames. Still, if you want to support this use case in this manner, there is nothing in this proposal preventing it. Like I said, you don't need transferability as a requirement to support this. You just need transferability, which no one is opposing.
The same principle applies to configurationchange, getSettings, applyConstraints. It is much easier for VideoTrackGenerator, MediaStreamTrackGenerator and getUserMedia track to be all in the same context to make use of these APIs.
The same applies. I haven't heard developers request this, but even if it's a useful use case, you don't need transferability as a requirement to support this. You just need transferability.
Finally, we introduced MediaStreamTrack transferability as a way to cover some longer term use cases (grabbing camera in an iframe but do rendering/processing in another iframe). The current spec is more future-proof from that point of view as well.
That is an actual use case for which I have seen developer demand and IMO is the main value track transferability can provide. This is completely independent of having track transferability as a requirement for mediacapture-transform.
User needs come before the needs of web page authors, which come before the needs of user agent implementors, which come before the needs of specification writers, which come before theoretical purity.
Right, I think user needs will likely be better served with the current API, as described above.
User needs are not better served by having transferability as a requirement. Eliminating the requirement of transferability does not prevent any of the use cases mentioned before. For that you just need transferability, not transferability as a requirement for mediacapture-transform.
On the other hand, requiring transferability does make things difficult for some common use cases. The most obvious is playing both the gUM and VTG tracks on a video element in a before/after effects view. In this case you no longer have the gUM track on Window and therefore can't play it on an element. The same applies if you want to use any other track sink available only on Window.
I tend to agree that track transferability requires more work from UA implementors, but these costs are outweighed by user and web developer benefits.
All the arguments I've seen so far are benefits of transferability as a standalone feature. None of these benefits are derived from that transferability being a requirement for mediacapture-transform. Moreover, I've presented use cases where transferability as a requirement makes it more difficult to support common use cases.
So, transferability as a standalone feature supports both the use cases you presented and the ones I presented, but transferability as a mediacapture-transform requirement only supports the use cases you presented and fails to properly support the ones I presented. It is clear to me that removing the transferability as a requirement for mediacapture-transform better serves the needs of developers.
I also doubt an API that ignores developer requirements and concerns by at least one implementor will get us to interop.
What are the developer requirements that have been ignored? So far, the developer feedback we received is that MSTP and VTG are working fine in Safari.
Here are some developer requirements that are well known to us and which are ignored by the current version of the spec (not all of these are related to the issue we're discussing which is track transferability as a mediacapture-transform requirement):
Here are some developer requirements that are well known to us and which are ignored by the current version of the spec (not all of these are related to the issue we're discussing which is track transferability as a mediacapture-transform requirement):
Let's only talk about the requirements that are relevant to this particular issue (audio support and processing on window are out of scope).
- Keeping the gUM track on Window while processing on Worker (before/after view and other use cases that require track sinks available only on Window)
The before/after view can be implemented by transferring a clone of the track instead of transferring the track itself.
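A rough sketch of that clone-then-transfer step, assuming a browser that supports MediaStreamTrack transfer. `sendCloneToWorker` and the `{ track }` message shape are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: keep the original track on Window for the "before"
// view and hand a clone to the worker for processing.
function sendCloneToWorker(track, worker) {
  const clone = track.clone();
  // The clone is listed in the transfer list, so it moves to the worker.
  worker.postMessage({ track: clone }, [clone]);
  // The original is untouched and can still be attached to a video element.
  return track;
}
```

The trade-off raised elsewhere in this thread still applies: there are now two tracks to manage, one in each context.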
- Easy feature detection (without requiring the creation of a Worker)
@jan-ivar provided a feature detection approach that works in Safari (and will likely work in Firefox). This is good enough in practice, feature detection does not have to be technically pure.
I am sympathetic to the needs of browser implementors. So far though, I haven't seen new information that warrants revisiting the design of this API. Also, this API shape has remained mostly untouched for several years, probably since the spec's First Public Working Draft, and it reached consensus within the WebRTC WG at the time. This API has shipped in one UA and is being implemented in another.
The specific change proposed in this issue is not about "short-term" scheduling. It is to make the API better. If a use case can be solved appropriately without introducing a dependency on another feature, then it is better to solve it without introducing that dependency. The fact that it results in an API that is easier to implement is a consequence of that design being better.
I think this API is shortsighted. It's tightly coupled, artificially tied to main thread, and it reinvents postMessage.
Our goal is to enable MediaStreamTrack processing in dedicated workers. This might include MediaStreamTracks originating in the worker someday, e.g. from an OffscreenCanvas.captureStream() or other sources already exposed in the worker. Or an RTCDataChannel in a worker feeding a VideoTrackGenerator created there.
Since we all agree MediaStreamTracks will exist in workers eventually, the simplest API is the one that accepts them there.
The idiomatic way to get data to workers is with postMessage, using transferable objects if needed.
So I disagree we shouldn't depend on other web platform features. It's doing it all ourselves that's the mistake. At least that's how I read § 1.7 Add new capabilities with care.
I thought the plan was to summarize our positions in a separate issue and ask TAG for their opinion, but here's my reply.
The specific change proposed in this issue is not about "short-term" scheduling. It is to make the API better. If a use case can be solved appropriately without introducing a dependency on another feature, then it is better to solve it without introducing that dependency. The fact that it results in an API that is easier to implement is a consequence of that design being better.
I think this API is shortsighted. It's tightly coupled, artificially tied to main thread,
This API is not tightly coupled with anything. If a developer wants to transfer a track to a Worker and manage all its state and lifetime there, there is nothing in the proposed API preventing it. Just like nothing prevents developers from managing the track on Window if that is what they prefer.
The one that forces developers to use track transferability even if they'd rather not use it is the one that tightly couples two features that should be independent of each other.
and it reinvents postMessage.
This API does not reinvent postMessage any more than encoded transform does. If this is such a bad thing, should I file an issue in encoded transform to eliminate the same pattern there and require that RTCRtpSender and RTCRtpReceiver (or some other object) be transferable too?
Our goal is to enable MediaStreamTrack processing in dedicated workers. This might include MediaStreamTracks originating in the worker someday, e.g. from an OffscreenCanvas.captureStream() or other sources already exposed in the worker. Or an RTCDataChannel in a worker feeding a VideoTrackGenerator created there.
All this can be supported without tightly coupling mediacapture-transform with track transferability. I don't think it is possible to find a single use case where forcing the user to use track transferability is better than allowing it, but without forcing it.
Since we all agree MediaStreamTracks will exist in workers eventually, the simplest API is the one that accepts them there.
No, it's not the simplest API. It is a lot more complex to tightly couple two features that should be independent. It is not only more complex for UA implementors, which cannot develop the features independently; but, more importantly, it is more complex for Web developers, who are forced to use an unnecessary feature and complex workarounds to solve otherwise nonexistent problems. Now Web developers are forced to clone a track, transfer one of them, and introduce track management logic in two separate realms even if their preference would be to do all track management on Window.
Even more importantly, the proposed API does not even need to be a replacement for the existing one. Since this API removes the tight coupling between both features, it isn't really much of a problem to provide the constructors in the existing API as a convenience for hypothetical applications that would prefer to do all track management in the worker.
The idiomatic way to get data to workers is with postMessage, using transferable objects if needed.
Again I ask, why is this a problem? Is encoded transform non-idiomatic? Should we eliminate the RTCRtpScriptTransform constructor and introduce a new transferable object there to be used with postMessage, or make senders and receivers transferable? Or is it a problem here, but not there?
So I disagree we shouldn't depend on other web platform features. It's doing it all ourselves that's the mistake.
The mistake is to force a dependency on another feature that should be independent.
I'd like to see a single use case where this dependency provides a benefit for web developers compared to having the features be orthogonal.
At least that's how I read § 1.7 Add new capabilities with care.
We read it very differently. Adding dependencies between features that should be orthogonal and forcing developers to use complex workarounds to deal with those unnecessary dependencies is, in my view, the opposite of adding capabilities with care.
@guidou I appreciate your efforts to simplify the API, but I believe your proposal introduces more complexity rather than reducing it. It seems unclear whether your proposed API is intended to replace the existing MediaCapture Transform API or to coexist alongside it.
If it's meant to coexist, then we're asking developers to navigate between multiple APIs that achieve similar goals, which can lead to confusion and fragmentation. This also increases the burden on browser implementers to support multiple APIs, delaying interoperability.
If it's meant to replace the existing API, it disregards the implementations already shipped in Safari and in progress in Firefox, which would fragment the ecosystem further and negate the developer feedback we've already received.
Moreover, your proposal doesn't seem to stand on its own because it doesn't cover all the use cases the current API does — particularly future scenarios where tracks originate in workers or need to be fully managed within a worker context.
Requiring track transferability isn't an unnecessary dependency; it's a design choice that provides significant benefits to developers, such as simplified lifetime and configuration management, as well as access to track stats and settings directly within the worker.
Adding another API also goes against the web platform design principles of keeping the platform consistent and avoiding unnecessary complexity.
I believe it's better for us to focus on implementing the existing API consistently across browsers and addressing any implementation challenges together, rather than introducing an alternative that could fragment the ecosystem.
@guidou I appreciate your efforts to simplify the API, but I believe your proposal introduces more complexity rather than reducing it.
Can you elaborate on how it is more complex? Especially for Web developers. One way would be to compare how intended use cases are implemented with each API.
It seems unclear whether your proposed API is intended to replace the existing MediaCapture Transform API or to coexist alongside it.
It can be both. I would prefer replace.
If it's meant to coexist, then we're asking developers to navigate between multiple APIs that achieve similar goals, which can lead to confusion and fragmentation.
It would be better to replace, but since there is one implementation, coexist seems acceptable.
This also increases the burden on browser implementers to support multiple APIs, delaying interoperability.
It's not uncommon for an API to have multiple constructors or factory methods to serve different use cases.
In this case, I'm proposing factory methods that provide the following benefits:
- It properly supports all use cases that have been presented so far. In particular, it is better at supporting the common use case of an application wanting to manage the track on Window and media processing on Worker.
- It removes the tight coupling between two separate APIs. This has the advantage that they can be developed independently by UA implementers.
If it's meant to replace the existing API, it disregards the implementations already shipped in Safari and in progress in Firefox, which would fragment the ecosystem further and negate the developer feedback we've already received.
For this reason, coexist would be acceptable.
Moreover, your proposal doesn't seem to stand on its own because it doesn't cover all the use cases the current API does — particularly future scenarios where tracks originate in workers or need to be fully managed within a worker context.
I'm talking about real use cases deployed in production right now, not hypothetical ones that might never exist. I believe the former should have more weight than the latter in the design of the WG's APIs.
Requiring track transferability isn't an unnecessary dependency;
Requiring track transferability for use cases that don't need it is indeed an unnecessary dependency.
it's a design choice that provides significant benefits to developers, such as simplified lifetime and configuration management, as well as access to track stats and settings directly within the worker.
There is no simplified lifetime and configuration management. Any use case that prefers to manage track lifetime on Window (basically all use cases deployed today) requires much more complex lifetime management with the existing API.
Adding another API also goes against the web platform design principles of keeping the platform consistent and avoiding unnecessary complexity.
Requiring track transferability for use cases that don't need it is precisely adding unnecessary complexity.
I believe it's better for us to focus on implementing the existing API consistently across browsers and addressing any implementation challenges together, rather than introducing an alternative that could fragment the ecosystem.
The ecosystem is already fragmented. This proposal might have the side effect of making it easier to reduce that fragmentation as it makes it possible for UA implementors to develop two features independently using patterns that are already implemented and tested. Forcing two features that can (and should) be orthogonal to have a dependency such that one has to be implemented before the other does nothing to help reduce the already existing fragmentation.
Finally, I think we have reached the point in which we are just repeating the same arguments without achieving consensus. Shouldn't we go ahead with the plan to file our positions in separate issues and get TAG's input?
Again I ask, why is this a problem? Is encoded transform non-idiomatic?
Encoded transform is bespoke.
Should we eliminate the RTCRtpScriptTransform constructor and introduce a new transferable object there to be used with postMessage, or make senders and receivers transferable?
No, because unique tradeoffs were involved, and that FPWD has already shipped in two browsers.
Or is it a problem here, but not there?
I believe it's on the person filing the issue to produce a convincing problem that needs fixing. Otherwise I see no new information since FPWD that warrants revisiting the design of this API.
Usage of the spec API seems fine, as seen in this blog.
The current version of the API requires track transferability, but this shouldn't be necessary. Currently, tracks are useless on workers except for this API, so we shouldn't add that as a requirement.
A way to keep the API worker-first, which has several benefits, is to follow the postMessage-like approach of webrtc-encoded-transform.
Something (subject to discussion) like:
For MediaStreamTrackProcessor:
For VideoTrackGenerator: