API shape for creating a MediaStreamTrack from a WHATWG Stream

alvestrand commented 3 years ago

Two API shapes have been proposed:

MediaStreamTrackGenerator, which is a subclass of MediaStreamTrack: https://alvestrand.github.io/mediacapture-transform/#generator
VideoTrackSource, which is an object that has a MediaStreamTrack as an attribute: https://jan-ivar.github.io/mediacapture-transform/#video-track-source

Arguments on which to base a decision on which API shape to choose are needed.

jan-ivar commented 3 years ago

Note this issue is solely over shape: A subclass of track with a writable attribute, vs. a new interface object with a writable attribute and a track attribute:

const track = new MediaStreamTrackGenerator({kind: 'video'});
await readable.pipeThrough(transformer).pipeTo(track.writable);

vs.

const {writable, track} = new VideoTrackSource();
await readable.pipeThrough(transformer).pipeTo(writable);

Naming and exposure ([Exposed=DedicatedWorker] only, to reflect consensus) are orthogonal to this discussion.

jan-ivar commented 3 years ago

Audio exposure is also orthogonal (video-only for now to reflect consensus).

alvestrand commented 3 years ago

The pattern in the rest of the platform is "we generate a stream from a source":

getUserMedia() and getDisplayMedia() generates a stream from a hidden source object
HTMLMediaElmeent.captureStream() and HTMLCanvasElement() generates a stream from an explicit source object
AudioContext.createMediaStreamDestination() generates an object with a stream attribute from an explicit source object

I would like @guidou to comment, but I think the separation of the stream from the source object is a good pattern to be consistent about.

(Note: the name is just a name.)

(Note: [Exposed=DedidatedWorker] does not have consensus; it does have a trend - it's still not closed.)

guidou commented 3 years ago

CanvasCaptureMediaStreamTrack is precedent for using inheritance from MediaStreamTrack and having something representing the source as a field (i.e., the canvas field).

In principle, I'm OK with both approaches. One advantage of using inheritance is that it is pretty clear that the generator, which is a track, must be stopped. It is less clear that a track field represents an already live track (or is not live until it's accessed?) which must be stopped to stop the source. One advantage of having a field is that having clone() return an MST is clearer than in the inherited case. Another proposal we considered was to have a static factory function that returned two objects: the source and the track.

There isn't really a big difference between the three approaches and, in principle, I would go for the one more likely to lead to consensus. However, given that the inheritance-based MSTG has shipped on Chromium and a lot of production code has already been written and even more is currently being written against it, in my view, the bar for using something different is a lot higher now and I would personally oppose anything different unless it is shown to be categorically better.

jan-ivar commented 3 years ago

One advantage of using inheritance is that it is pretty clear that the generator, which is a track, must be stopped.

Why must it be stopped? This doesn't seem like an advantage to me.

But thanks for raising the precedent with canvas. However, following it, I think, would require fixing clone and renaming MSTG to GMST. Consider:

const [track1] = (await canvas.captureStream()).getVideoTracks();
const track2 = track1.clone();
console.log(track1.canvas === track2.canvas); // true

Semantically, track1 here isn't the source. canvas is. So following this precedent would lead me here:

const [track1] = new GeneratedMediaStreamTrack({kind: "video"});
const track2 = track1.clone();
console.log(track1.writable === track2.writable); // true

In other words, the writable is the source, not track1, and clone() returns a clone, not something else.

The cloning in the MSTG proposal seems problematic in this regard. Consider also https://github.com/w3c/mediacapture-main/issues/823.

alvestrand commented 3 years ago

The CanvasCaptureMediaStreamTrack is produced by HTMLCanvasElement.captureStream(), and contains a backreference to its canvas. There is no constructor for the CanvasCaptureMediaStreamTrack. So I don't think this is a precedent for creating subclasses of MediaStreamTrack with independent constructors.

guidou commented 3 years ago

One advantage of using inheritance is that it is pretty clear that the generator, which is a track, must be stopped.

Why must it be stopped? This doesn't seem like an advantage to me.

The track is live and must be stopped if you want it to stop using resources. Why would you not want to stop the track? You don't need to stop the source (it stops automatically when all tracks are disconnected). When the track is a field it is less clear that it is a live track that must be stopped to stop consuming resources.

But thanks for raising the precedent with canvas. However, following it, I think, would require fixing clone and renaming MSTG to GMST. Consider:
const [track1] = (await canvas.captureStream()).getVideoTracks();
const track2 = track1.clone();
console.log(track1.canvas === track2.canvas); // true
Semantically, track1 here isn't the source. canvas is. So following this precedent would lead me here:
const [track1] = new GeneratedMediaStreamTrack({kind: "video"});
const track2 = track1.clone();
console.log(track1.writable === track2.writable); // true
In other words, the writable is the source, not track1, and clone() returns a clone, not something else.

Correct, the writable is the source. I'm fine with making the clone be a MSTG where clone.writable == original.writable. I preferred the clone to be a MST instead of MSTG to avoid giving the impression that writable is a new writable, but given the CanvasCaptureMST precedent, I agree that it would be more consistent to make the clone an MSTG. We can make that change and be backwards compatible with practically all current code using MSTG.

The cloning in the MSTG proposal seems problematic in this regard. Consider also w3c/mediacapture-main#823.

As indicated above, we can easily fix it.

guidou commented 3 years ago

The CanvasCaptureMediaStreamTrack is produced by HTMLCanvasElement.captureStream(), and contains a backreference to its canvas. There is no constructor for the CanvasCaptureMediaStreamTrack. So I don't think this is a precedent for creating subclasses of MediaStreamTrack with independent constructors.

There is no precedent for creating tracks with a constructor, but CCMST is indeed a precedent for creating.a track using subclasses with a reference to its source. We could create it with a method instead of a constructor, but I don't see the advantage of doing that.

I reiterate that all these different surfaces are largely the same thing with very minor advantages and disadvantages and I would normally be fine with any of them. My point is that, since production code is written against MSTG, we should go with something different only if we see a significant advantage, and I don't see that advantage anywhere. I'm fine with making MSTG.clone() returning another MSTG to make it more consistent with CCMST since that wouldn't break any code.

Sorry, but we had 12 months to discuss these things and instead we wasted it arguing mostly about the incorrect claim that MSTP was buffering frames out of control. Now compatibility with existing production code is another variable that I think we should taken into account.

jan-ivar commented 3 years ago

The track is live and must be stopped if you want it to stop using resources.

What resources? Isn't the sole (re)source here the writable? Calling stop appears to indirectly close the writable, so if users close the writable they should be good.

Why would you not want to stop the track?

That's not the issue. I'm countering your claim that users MUST remember to call stop — and by extension countering your claim that not burying stop in a sub-object is a design advantage — or else resources somehow aren't relinquished, which doesn't appear accurate. Users bring their own source here, so as long as they abort or close that pipe, they should be good whether they've called stop or not. If they haven't yet started piping to the writable, there should be no need to do anything at all.

When the track is a field it is less clear that it is a live track that must be stopped to stop consuming resources.

Which means this doesn't matter.

guidou commented 3 years ago

The track is live and must be stopped if you want it to stop using resources.

What resources? Isn't the sole (re)source here the writable? Calling stop appears to indirectly close the writable, so if users close the writable they should be good.

Why would you not want to stop the track?

That's not the issue. I'm countering your claim that users MUST remember to call stop — and by extension countering your claim that not burying stop in a sub-object is a design advantage — or else resources somehow aren't relinquished, which doesn't appear accurate. Users bring their own source here, so as long as they abort or close that pipe, they should be good whether they've called stop or not. If they haven't yet started piping to the writable, there should be no need to do anything at all.

My point is that it's not clear that users need remember to stop something (either the track or the source). With the field approach it's no so clear if you have a live source and/or a live track once you create the object. If I create a VideoTrackSource, do I need to close its writable if I never access it? (e.g., if there is some other error in the application) If the writable is live without accessing it, is the track live as well? (I agree that this question is less important, but as mentioned above, it's not so clear that the source is live already).

With the inheritance approach, it is very clear that you get a live track and therefore a live source, and if you stop that track you also automatically stop the source (provided you didn't clone it). Very similar to what happens with a camera and the video track returned by getUserMedia().

When the track is a field it is less clear that it is a live track that must be stopped to stop consuming resources.

Which means this doesn't matter.

It matters in the sense that if you don't know if you have a live track you also don't know if you have a live source that needs to be closed.

This doesn't mean I think the field approach is bad, or even inferior. IMO, there is no actual functional tradeoff here, since the lower clarity I refer to is solved via documentation and behavior is the same provided the VideoTrackSource is live upon creation. This is a case where the decision comes down to preference as there is no real tradeoff.

In principle, I'm OK with both approaches, but since we have a lot of production code being written against the inheritance-based option, that makes me lean towards it.

guidou commented 3 years ago

The CanvasCaptureMediaStreamTrack is produced by HTMLCanvasElement.captureStream(), and contains a backreference to its canvas. There is no constructor for the CanvasCaptureMediaStreamTrack. So I don't think this is a precedent for creating subclasses of MediaStreamTrack with independent constructors.

Both the field and inheritance approaches result in a returned track upon calling a constructor. Is there a precedent for a new track returned as a field from an object created by calling a constructor? If not, which I think is the case, there is no precedent for either approach. Still I think a constructor is OK for both approaches.

alvestrand commented 2 years ago

The consensus documented is that we have a VideoTrackGenerator (and an AudioTrackGenerator if we decide to add audio), which has a MediaStreamTrack attribute. Fixed in #66.

w3c / mediacapture-transform

API shape for creating a MediaStreamTrack from a WHATWG Stream #70