whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

TextTrack provides no immutable list of changed cues when processing the event for TextTrack.oncuechange #3137

Open jimklo opened 7 years ago

jimklo commented 7 years ago

My use case is that I'm using a metadata TextTrack to coordinate the playback of graphical annotations for a video.

One of the challenges I've found is that synchronizing the rendering of the annotations during video playback is impossible to do at the TextTrack level using the TextTrack.oncuechange event handler. Not only does the event not provide a reference to the cues that triggered it, but the currently active cues, TextTrack.activeCues, are exposed as a live, mutable collection of some sort. As a result, one cannot make a frozen deep-copy snapshot without a race condition causing cues to possibly fall out of TextTrack.activeCues before it can be fully enumerated into a copy.

I recognize that an immutable list wouldn't actually solve my issue, and I will acknowledge that I found the solution for my specific use case by using the TextTrackCue.onenter and TextTrackCue.onexit event handlers, which provide a direct reference to the TextTrackCue that has just become active or inactive.

In my use case, and I suspect in others as well, handling the cue enter and exit events would be more useful at the track level: there are likely few tracks and perhaps a few ways to handle cue events, but potentially hundreds if not thousands of cues, each requiring its own listener. I'm not sure of the memory impact, but having to attach a handler to every individual cue seems like it could add up quickly.

I crudely monkey-patched TextTrack to work the way I needed, in a fashion similar to the code below. However, I wonder how efficient it is in terms of maintaining synchronization.

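// Keep the native addCue so the patched version can delegate to it.
TextTrack.prototype.origAddCue = TextTrack.prototype.addCue;
// No-op defaults; assign track.oncueenter / track.oncueexit to handle cue events centrally.
TextTrack.prototype.oncueenter = function(event) { };
TextTrack.prototype.oncueexit = function(event) { };
// Wrap addCue so each new cue forwards its enter/exit events to the owning track.
// Note: this replaces any onenter/onexit handler already set on the cue, and cues
// the browser loads from a <track> element's src never pass through addCue().
TextTrack.prototype.addCue = function(cue) {
  cue.onenter = (evt) => {
    this.oncueenter(evt);
  };
  cue.onexit = (evt) => {
    this.oncueexit(evt);
  };
  this.origAddCue(cue);
};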
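A variation on the patch above that avoids touching TextTrack.prototype, sketched under a few assumptions: the names CueEvent, forwardToTrack and forwardCueEvents, and the "cueenter"/"cueexit" event types, are all hypothetical and not part of any standard. It re-dispatches each cue's enter/exit event on the cue's owning track (TextTrackCue.track) and uses one shared listener function for every cue, which speaks to the per-cue listener concern above. Because it walks track.cues instead of wrapping addCue(), it should also cover cues loaded from a <track> element, presumably once that element's load event has fired.

```javascript
// Hypothetical event type carrying the cue; "cueenter"/"cueexit" are made-up
// names, not events any browser fires on its own.
class CueEvent extends Event {
  constructor(type, cue) {
    super(type);
    this.cue = cue; // the TextTrackCue that just became active or inactive
  }
}

// One shared listener for every cue, so no per-cue closure is allocated.
// evt.target is the cue that fired "enter"/"exit"; cue.track is its owning track.
function forwardToTrack(evt) {
  const cue = evt.target;
  if (cue.track) cue.track.dispatchEvent(new CueEvent("cue" + evt.type, cue));
}

// Attach the shared listener to every cue currently in the track and return
// how many cues were covered. Per the DOM spec, registering the same function
// twice for the same type is a no-op, so calling this again after adding
// cues does not double-forward events for cues already covered.
function forwardCueEvents(track) {
  // track.cues is null while the track's mode is "disabled".
  const cues = track.cues ? Array.from(track.cues) : [];
  for (const cue of cues) {
    cue.addEventListener("enter", forwardToTrack);
    cue.addEventListener("exit", forwardToTrack);
  }
  return cues.length;
}
```

In a page this might look like `track.mode = "hidden"; forwardCueEvents(track); track.addEventListener("cueenter", (e) => draw(e.cue));`, where `draw` stands in for your own rendering code. Cues added later with addCue() need another call to forwardCueEvents.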

The other possible solution would be event bubbling. Unfortunately, the enter and exit events do not bubble up. Attaching an event listener such as the following never receives an event:

// track is a TextTrack instance, e.g. video.textTracks[0]
track.addEventListener('enter', (evt) => { console.log("Enter", evt); });

Proposal:

I'd like to see the TextTrackCue enter and exit events bubble up the DOM so that they can be handled centrally. Currently this is either not a feature or a bug in the implementation. Bubbling would let cue events be handled in one place without adding complexity.

Alternatively, TextTrack could be extended with methods to get an immutable 'snapshot' of the currently active cues. This is certainly not the best solution, but it would be a next-best step, considering there is currently no mechanism to do this.

annevk commented 6 years ago

cc @whatwg/media

foolip commented 6 years ago

Also cc @fsoder

Thanks for filing the issue, @jimklo!

One of the problems is, not only does it not provide a pointer to the cues that triggered the event, access to the current active cues, TextTrack.activeCues, is a live mutable collection of some sort, in that one cannot make a frozen deep copy snapshot without a race condition occurring causing cues to possibly fall out of the TextTrack.activeCues collection before it can be fully enumerated into a copy.

Is this raciness something you have observed in practice? As defined and from what I know of the implementations in Chromium and WebKit, it should be impossible; the activeCues collection is only manipulated on the main thread, and any script that's running (to copy it) will block that.

So, I think that this should be a reliable way to get a copy of the active cues:

var track = video.textTracks[0];
var activeCuesCopy = [].slice.call(track.activeCues);
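If an immutable result is wanted, the copy above can also be frozen, which is roughly a userland analogue of the 'snapshot' proposed in the issue. A minimal sketch; snapshotActiveCues is a hypothetical helper name. The freeze is shallow: the cue objects are shared with the track, so properties such as startTime or endTime can still change.

```javascript
// Hypothetical helper: copy the live activeCues list synchronously and freeze
// the copy. Only the array is frozen; the cue objects inside are shared.
function snapshotActiveCues(track) {
  // activeCues is null while the track's mode is "disabled".
  return Object.freeze(Array.from(track.activeCues ?? []));
}
```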

I'd like to see the TextTrackCue enter and exit events bubble up the DOM such that they can be handled centrally. Currently this is either not a feature or it's a bug in the implementation.

It doesn't work both because the event that's fired isn't a bubbling event and because the owning TextTrack isn't included in the event path.

To make this work, we should define https://dom.spec.whatwg.org/#get-the-parent for TextTrackCue, so that the cue's TextTrack is considered its parent. Then we could either make the events bubble, or just say you have to use a capturing event handler.
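To illustrate the two options, here is a toy model of event dispatch (not browser code, and much simpler than the DOM's real algorithm) in which a cue's parent is its track, as the get-the-parent proposal would make it. ToyTarget and toyDispatch are hypothetical names. A capturing listener on the track sees every cue's event even if it doesn't bubble; a non-capturing listener on the track sees it only if the event bubbles.

```javascript
// Toy model of DOM event dispatch, NOT browser code: it only shows what
// defining "get the parent" for a cue (returning its TextTrack) would change.
class ToyTarget {
  constructor(name, parent = null) {
    this.name = name;
    this.parent = parent; // stand-in for the result of DOM's "get the parent"
    this.listeners = [];
  }
  addListener(type, callback, capture = false) {
    this.listeners.push({ type, callback, capture });
  }
}

function toyDispatch(target, type, bubbles = false) {
  // Event path: the target followed by each ancestor reported by "get the parent".
  const path = [];
  for (let node = target; node !== null; node = node.parent) path.push(node);

  const invoke = (node, phase) => {
    for (const { type: t, callback, capture } of node.listeners) {
      if (t !== type) continue;
      if (phase === "capture" && !capture) continue;
      if (phase === "bubble" && capture) continue;
      callback({ type, target, currentTarget: node, phase });
    }
  };

  // Capture phase: outermost ancestor down to the target's parent.
  for (let i = path.length - 1; i > 0; i--) invoke(path[i], "capture");
  // At target: capturing and non-capturing listeners both run (this toy keeps
  // registration order instead of the DOM's capture-first ordering).
  invoke(target, "target");
  // Bubble phase: ancestors' non-capturing listeners run only if the event bubbles.
  if (bubbles) for (let i = 1; i < path.length; i++) invoke(path[i], "bubble");
}
```

With today's behavior a cue has no parent, so in this model only listeners on the cue itself would ever run.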

Implementation-wise I think this would be pretty straightforward, and being able to listen to the same event on all cues seems like a nice convenience. Not sure about whether to bubble or not bubble though.

Any implementer interest?

@michaelchampion, we don't have anyone for EdgeHTML in @whatwg/media, do you know who might be able to react to this issue?

foolip commented 6 years ago

Implementation-wise I think this would be pretty straightforward

After taking a quick look at where event paths are created in Chromium, I'm not so sure anymore. The only existing things with event paths are indeed Nodes. Is this the case in all engines?

fsoder commented 6 years ago

I guess the "race" would be that "time marches on" can run between when the tasks for enter, exit and cuechange are queued and when the events are actually dispatched. Teaching Blink (and likely other engines) about more general event paths is probably doable, but likely a bit of a yak shave. Delivering a list (FrozenArray?) of cues with oncuechange would likely be quite simple to implement, but of course comes with certain costs as well.
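Until something like that exists, a rough userland approximation of a "list of changed cues" is to diff successive copies of track.activeCues inside a cuechange listener. A sketch only; makeCueChangeDiffer is a hypothetical name. Since it reads activeCues when the handler runs, not when the event was queued, it is subject to exactly the gap described above: a cue that enters and exits between two handler runs appears in neither list.

```javascript
// Hypothetical helper: returns a function that, given the current activeCues,
// reports which cues entered or exited since the previous call.
function makeCueChangeDiffer() {
  let previous = new Set();
  return function diff(activeCues) {
    // activeCues may be null while the track's mode is "disabled".
    const current = new Set(Array.from(activeCues ?? []));
    const entered = [...current].filter((cue) => !previous.has(cue));
    const exited = [...previous].filter((cue) => !current.has(cue));
    previous = current;
    return { entered, exited };
  };
}
```

Usage might be `const diffCues = makeCueChangeDiffer(); track.addEventListener("cuechange", () => { const { entered, exited } = diffCues(track.activeCues); /* render */ });`.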

jimklo commented 5 years ago

Apologies for taking so long to respond... All my GH notifications were going off into the ether somewhere. And my project that brought about this issue ran out of funding 😢.

Is this raciness something you have observed in practice? As defined and from what I know of the implementations in Chromium and WebKit, it should be impossible; the activeCues collection is only manipulated on the main thread, and any script that's running (to copy it) will block that.

So, I think that this should be a reliable way to get a copy of the active cues:

var track = video.textTracks[0];
var activeCuesCopy = [].slice.call(track.activeCues);

Yes, the raciness is something I observed in practice, as I was dropping frames. Basically, I was building a video annotation tool akin to what your Monday Night Football sportscaster might use to annotate a play. When rendering in response to track.activeCues, I found events being dropped (a command to clear annotations would get dropped, so the annotations would just pile up). This is actually how I determined there was some kind of race condition. Copying wasn't reliable either: while I could slice track.activeCues, the result was not repeatable. I could run the same set of tests and the length of the copy would differ in what seemed to be a workload-dependent way: if there were more background tasks going on, the copy would be shorter than if the system was more idle. I could exacerbate the problem by using setInterval(...) to read track.activeCues and watch the array change contents while processing, frequently getting out-of-bounds errors.

The reality, after much thought, was that reacting to an immutable list doesn't work, since I have no way to synchronize with the actual start time of each cue in the list. Hence the monkey patch I provided, which enabled me to handle each event from the TextTrack as it occurs.

jimklo commented 5 years ago

I guess the "race" would be that "time marches on" can run between the tasks are queued for enter, exit and cuechange and the actual event dispatch happens. Teaching Blink (and likely other engines) about more general event paths is probably doable but probably a bit of yak shave. Delivering a list (FrozenArray?) of cues with oncuechange would likely be quite simple to implement, but of course comes with certain costs as well.

Agreed. After much thought, I concluded that an immutable list (FrozenArray) wouldn't solve my specific problem. However, there might be use cases that need to perform an action that isn't synchronized to the cue, only taken in response to it, where having a list that doesn't change would be valuable.