Handle switching between sets of text tracks

heff commented 9 years ago

When switching to a new source (or reloading the same source) we need a solution for loading/reloading the set of text tracks. We may already have method we suggest so let me know if that's the case.

@dsparacio brought this to my attention. He's running into an issue with the integration between dash.js and video.js. Dash.js adds tracks to the video element using videoEl.addTextTrack. There is no videoEl.removeTextTrack, so when you reload the source or add a new source dash.js disables the existing tracks and creates new ones for the new tracks.

Video.js shows all tracks in the track menu, including disabled tracks, which in this case means showing tracks that no longer apply. It appears as multiple tracks for the same language, even though only the last is meant to be used. b2f5dc7a-4b1f-423a-9dbf-5277f13b0cb5

(side note, we need to make it more obvious that you can scroll down in menus like this)

He suggested ignoring disabled tracks. Here's my response and the rest of our discussion.

We can't really ignore disabled tracks in the menu because disabled is the only state where cues aren't firing. If a track had to be at least 'hidden' to show in the menu, then all tracks would be firing all cues as you scrubbed along the timeline (even if they're not being displayed), and the performance of that would get pretty terrible. Hopefully it doesn't come to that.

Ah this makes sense and now I see why you set track to disabled when switching from one to another in the menu. Thanks for explanation.

I could see a few other options... The first and probably easiest would be to de-dupe languages in the captions menu, and just show the last one loaded.

This would not work in the following case. Source one has three lang text track in it, source two only has two of the three previous source languages. Resulting in a “rogue" text track still being presented in source 2 that does nothing.

The second would be to have dash.js use a track element to add captions info, in which case it could be removed. Though I can't remember if you can modify the cues of a textTrack added via track element.

Currently we are using the addTextTrack call on the video element. Are you saying to manually create a TextTrackList, add textTrack(s) to that list then add List as child to video so that I can call removeChild later on?

Thirdly would be to figure out some mechanism for reusing cleared out tracks.

So I already tried this but unfortunately you can not set the language property on a text track once added. The only way to set the lang is as an arg of addTextTrack. This was my first approach until I found that lang was only read not read/write

...

Maybe I can mark tracks that are “garbage” with some value and videojs could ignore …. I am interested in the second option you purposed. I am going to look at this as well.

...

We have another use case that may cause more issues. Fragmented Text tracks. Currently with text tracks that externally load xml, we load all in MPD create a new textTrack per language, parse and add cues per lang and add the list to video element. This way the player knows of all the lang choices. But with fragmented text this is not the case. We need to add API in dash.js to return a list of available language tracks to display in a given player’s menu.

@gkatsev have you already figured this out? Otherwise can you add/remove cues from tracks that were created with track elements and vtt source files? That seems the most promising solution to me.

(other people who may be able to add to this: @imbcmdth @OwenEdwards)

gkatsev commented 9 years ago

Heh, @dmlap just opened #2367 today.

gkatsev commented 9 years ago

Actually, #2367 is a separate issue, though, tangentially related.

gkatsev commented 9 years ago

Cues can be removed, so, maybe we would just clear all the cues and if it has no cues and it is disabled, we ignore it from the menu? Using a track element to back the custom might not be a bad idea. It would mean that the tracks can actually be removed but also language and other values can be changed on the fly (by changing the track element's attributes). It would be really nice if we had a removeTextTrack method to go along with addTextTrack. I think we have more than enough usecases to take to the spec committees.

dsparacio commented 9 years ago

We currently remove all cues on tear down, just the texttrack itself can not be removed. If mode is showing the textTrackCue list will be empty on these text tracts. You can not set the Lang on text track once created. It is read only and can only be set with addTextTrack method. my first thought was to reuse the tracks and just reset attributes but not possible with current API.

dsparacio commented 9 years ago

One other challenge I need to solve is fragmented text tracks. I can populate the video element with empty cues in a textTrack to get video.js menu to display all the available lang options, but unlike external xml cc where all the cues are added upfront and video.js can handle the track switches from the UI menu, with fragmented text we need to change within dash.js so we need an API call or event from videojs when the track menu item changes. Is that currently available with Video.js. We need a solution that will work with all custom HTML5 players so event driven on change will be best.

gkatsev commented 9 years ago

It's possible that it will just be easier to create a track element and use its track object rather than using videoEl.addTextTrack().

When a text track is chosen from the menu, it just changes the mode of the associated text track to showing and changes the others of the same kind to disabled. You should be able to listen to the change event on the TextTrackList to know about this change.

heff commented 9 years ago

Yeah, in a previous issue I mentioned why I think removeTextTrack is excluded. Not that it wouldn't still be valuable, but just why it would be complicated for browsers to include it.

I also mentioned why I think the track element is the primary track API, instead of addTextTrack. I checked and you can add/remove cues from tracks that were created with track elements and vtt source files. So I agree with @gkatsev, I really think using track elements is the right approach here.

dsparacio commented 9 years ago

"You should be able to listen to the change event on the TextTrackList to know about this change."

I was hoping there was one but did not see it in w3c api list... I will try to do that now to see if it works. If there is an event when change or mode or texttrack that is all i need. I can do everything internally for fragmented text. Are you sure there is an event or just assuming?

gkatsev commented 9 years ago

videoEl.textTracks.addEventListener('change', ...)

@heff I just find it weird that they would give us the addTextTrack API which seems like what we should be using for things like the embedded text tracks but then not give us tools to clear it out between videos.

dsparacio commented 9 years ago

@gkatsev thanks for the link...

heff commented 9 years ago

Yeah, it seems like it would be cleaner if they just killed addTextTrack, so you just knew to use a track element for everything. Is there something I'm missing about what addTextTrack brings, besides just side stepping creating an element?

gkatsev commented 9 years ago

It's a js-only API. I'm not sure what exactly is the reasoning about having it and making these tracks being non-removable.

dsparacio commented 9 years ago

when using addTextTrack and TextTrack object is used and label and lang are read only. I think this is what the addTextTrack api brings. Is this the same element that you are saying I should manual create ?

readonly attribute DOMString label; readonly attribute DOMString language;

gkatsev commented 9 years ago

These properties are always read-only on the TextTrack object. However, if this object is associated with a TextTrack HTML Element, these values could be changed through that element. I really do think the main reason for it is to have a purely programmatic interface.

Also, seems like removeTextTrack did exist at some point: https://codereview.chromium.org/92763002 Looking into figuring out why it was removed.

gkatsev commented 9 years ago

I found the best explanation as to why it doesn't exist right now. https://www.w3.org/Bugs/Public/show_bug.cgi?id=22601

If we made in-band text tracks readonly, we could probably get away with cleaning them up when they fall into the time before the earliest possible position and are known to be over and done with, but this isn't something that's come up yet.

Basically, they didn't have a huge usecase for it and the native in-band metadata tracks are read-only/non-removable and addTextTrack behaves a bit similarly to the in-band metadata tracks.

heff commented 9 years ago

I think some additional clarification is needed. When we say use the track element, the process would look like:

var trackEl = document.createElement('track');
trackEl.label = 'English';
trackEl.srclang = 'en';
trackEl.kind = 'captions';
trackEl.src = '...'; // optional

videoEl.appendChild(track);
var track = trackEl.track;
track.addCue(...);
// etc.

// when unloading the source
videoEl.removeChild(trackEl);

Basically, they didn't have a huge usecase for it and the native in-band metadata tracks are read-only/non-removable and addTextTrack behaves a bit similarly to the in-band metadata tracks.

So that makes it too limited now to be useful for all kinds of tracks. Really anything outside of metadata tracks if you're planning on ever changing the source.

gkatsev commented 9 years ago

Thanks, @heff, it is good to clarify it. The fact that they're both called TextTracks makes it confusing.

Also, elaborating my last statement, they're saying it's an issue because if you removeTextTrack a track you didn't create, it can cause problems. Particularly in-band metadata tracks that get updated as more of the stream is loaded and aren't necessarily updated all at once like normal remote text tracks will do.

gkatsev commented 9 years ago

With the prevalence of MSE and players that have to deal with in-band metadata and fragments text tracks, it might be a good time to bring removeTextTrack back up with the w3c/whatwg. At least now, there's a compelling use-case.

heff commented 9 years ago

it might be a good time to bring removeTextTrack back up with the w3c/whatwg

Yeah, it could be good timing too. It sounds like there's some new discussions happing in the tracks/vtt space (if I heard @silviapfeiffer correctly).

However what is the browser supposed to do if you call removeTextTrack on a track that has an associated track element? Should it modify the HTML for you? For some reason to me it feels like that's breaking a rule.

gkatsev commented 9 years ago

There was some discussion about how the API should work in the bug above. It should possible throw an error if given any track that wasn't created by an addTextTrack. Though, it is weird that it won't work for all types of track objects.

gkatsev commented 9 years ago

To summarize:

Use text track elements and their associated text track objects when creating tracks that should be removed between source changes.
Think about talking to the w3c/whatwg or whoever will listen about adding a removeTextTrack method to mirror addTextTrack. We have the usecase of in-band metadata tracks around MSE players.
Consider refactoring our addTextTrack function to work with a new removeTextTrack function. This would also be useful as a PoC in the preceding bullet.

gkatsev commented 9 years ago

I added the last bullet point as another thing to think about. As I mentioned, it could be using as a proof of concept if we do end up talking to the standards bodies about it.

heff commented 9 years ago

Sounds good. Re bullet 2 - just to be clear, you can still handle in-band metadata tracks with the track elements, i.e. the fragmented text tracks use case mentioned should also use track elements for now.

dsparacio commented 9 years ago

@gkatsev @heff - I have removed the use of addTextTrack API and manually adding track elements. I am seeing problems with the latter. Video.js code is running vjs.TextTrackDisplay.prototype.updateForTrack = function(track) when I do not use the API. If I block this method tracks render at startup and I can switch tracks ONCE and only once. When I try to switch back to a track it will not render on screen. However, If I allow this method to execute then the first track will render but I can not switch tracks at all. If I do nothing renders again....

With all the code changes I have made, I can easily switch back the addTextTrack API and everything works great.

I can also use the appendChild method with fragmented text since there are no cues at startup - they are appended in to active cue during runtime.

This leads me to believe there is something in video.js that happens when I manually append tracks that does not happen when I use the API. I am debugging video.dev.js for 4.12.... Would love to see if you guys can help with this.

It is impeding me from committing the final code to dash.js git. I will move on with just using the addTextTrack API for now until we can figure this out. We will just have the same session issue we have now until I can figure this out.

Thanks for all the help with this so far.

Dan

dsparacio commented 9 years ago

Some more info

in video.dev.js

If I change !track['activeCues'] to !track['activeCues'].length in the following block

vjs.TextTrackDisplay.prototype.updateForTrack = function(track) {
  if (typeof window['WebVTT'] !== 'function' || !track['activeCues'].length) {

And I change track['mode'] = 'disabled'; to track['mode'] = 'hidden'; in

vjs.TextTrackMenuItem.prototype.onClick = function(){

if (track === this.track) {
      track['mode'] = 'showing';
    } else {
      track['mode'] = 'hidden';
  }

Then I can switch tracks with no problems. Again this is only the case when I use appendChild as suggested above VS using the addTextTrack API. The latter no changes in the video.js file are needed and switching tracks work as expected.

PS I am not saying this is a solution because using hidden is not efficient. Trying to understand why setting a track element vs TextTrack to disabled then showing again does not work....

heff commented 9 years ago

@AkamaiDASH this goes a little deep than my understanding of how tracks work now. It does seem odd that addTextTrack would be working but the appendChild method wouldn't. I'd expect appendChild to just be doing addTextTrack behind the scenes.

I think @gkatsev will have to weight in on this.

dsparacio commented 9 years ago

@heff. I expected it to as well ;) One thing I noticed, when I appendChild and inspect the video element, I see the tags for each srcLang I appeneded. When I use the addTextTrack API I do not see these track elements.

@gkatsev - Would you have a quick second to chat about this sometime today? I am certain I am adding the tracks and cues properly.

gkatsev commented 9 years ago

@AkamaiDASH do you have an code example? Also, addRemoteTextTrack returns a either a track element or an object with a track property which points at a TextTrack object. Are you trying to add in metadata tracks or regular captions tracks?

dsparacio commented 9 years ago

I am adding caption || subtitles as KIND

mounted a version of our player using the modified video.dev.js 4.12 to get it to work. http://mediapm.edgesuite.net/dash/private/support-player/nightly/v1.5/index.html

See Multitrack urls in pull down.

Code in Dash.js:

var track = document.createElement('track'),//this.getCurrentTextTrack(), captionType = textTrackQueue[i].role ? textTrackQueue[i].role : "captions";

currentTrackIdx = i; trackElementArr.push(track);

track.kind = captionType === "subtitle" ? "subtitles" : captionType; track.label = textTrackQueue[i].lang; track.srclang = textTrackQueue[i].lang; track.default = textTrackQueue[i].defaultTrack;

this.video.appendChild(track); this.addCaptions(0, textTrackQueue[i].captionData);

This is part of a huge PR https://github.com/Dash-Industry-Forum/dash.js/pull/691/files#diff-2a1697e88e3999571bbf4ffce32551b5 FILE src/streaming/extensions/TextTrackExtensions.js

dsparacio commented 9 years ago

Hi @gkatsev - So as I mentioned above I was going to make a sample for the new feature in dash.js that does not use video.js. I have found that when I set track to showing or disabled my sample works as expected. There was no need to use "hidden" mode. So there is some other code in video.js that is causing the issue other then what is pointed out above.... This is good news, just need to find out what is interfering.

dsparacio commented 9 years ago

Hi @gkatsev - Disregard the last comment....I spoke too soon. Using disabled as a track mode and setting back to enabled causes issues outside of video.js as well. Especially for tracks that already have cues added from external xml file.

dsparacio commented 9 years ago

@heff @gkatsev Sorry for the above spam guys! So i believe I understand what is going on now.

First, to reiterate, I am dealing with two types of timed text tracks. External xml that is loaded parsed and all cues are added at startup. Second, fragmented text that continuously loads, is parsed and adds the cue(s) to the track per fragment load. Both ways use the same logic to create a track element and append to video. there will be as many tracks created as there are text adaptations in MPD.

I noticed that when I set a track mode to disabled and then showing again:

With fragmented text it starting showing cues as soon as new cues where downloaded and added. All the cues that were already downloaded when the track was set to disabled are not shown when mode changes back to showing. It takes new cues being added to get the track to show.
With non-fragmented text, where all cues are loaded at startup, will never render cues again once marked disabled and then back to showing.

Also , This is only the case when using the appendChild with track element vs the text track addTextTrack API. I believe this is because the first uses a more primitive object Track vs the latter uses TextTrack. Not sure the difference with objects yet but I believe it is to due with how cues are pushed into activeCues and managed during mode changes.

This is as fun to figure out as figuring out why they removed the removeTextTrack api....

dsparacio commented 9 years ago

@gkatsev - after talking to some folks and doing some research I found this article interesting https://developer.mozilla.org/en-US/Apps/Build/Audio_and_video_delivery/Adding_captions_and_subtitles_to_HTML5_video

They show the use of mode set to hidden or showing and do not discuss the use of disabled, which is the only current issues with video.js and how we implement in dash.js. @heff you had mentioned that when tracks are hidden they still fire but I am not seeing this and not seeing a performance issue when set to hidden. Am I missing something that you have seen before ?

gkatsev commented 9 years ago

If you take a look at the spec, you can see that mode can be set to one of three options:

disabled
hidden
showing

Both disabled and hidden do not show any of the cues on the screen, however, hidden is supposed to keep firing cuechange events when cues become active or inactive. Since you're using the native implementation, there probably aren't any performance issues with keeping the tracks hidden unless maybe you have tens or hundreds of subtitles. But it seems like disabled is the correct choice if you are not using a specific track.

dsparacio commented 9 years ago

"But it seems like disabled is the correct choice if you are not using a specific track." Not if we are going to avoid using addTextTrack API so we can cleanup the track elements. Since mode "disabled" will eliminate all active cues with the more primitive track element vs the TextTrack element which I can not seem to add manually .

Also setting the track element to default has no affect as it does when using TextTrack Elements via the addTextTrack API. Another issue that needs to be solved just so we can clean up properly . I really need the removeTextTrack api.....

@gkatsev Is there a plan to address this or are you guys opting/thinking to leave video.js the way it is now regardless of these findings?

dsparacio commented 9 years ago

@heff - Another issue I have is then the MPD has a mixture of subtitles and captions. Currently Video.js will put each type in its own icon and menu which is not the case for most other players like JW. This specifically calls for special handling just for video.js in dash.js. Before I implement this, are you aware of this and any chance to change this behavior. I suppose you technically could have CC and subtitles at the same time but just not sure if that is a real use case.

heff commented 9 years ago

@AkamaiDASH treating them as the same seems wrong to me. Subtitles are for translations and captions are for the hard of hearing and will include notes about other audio that's happening besides speech, or at least that's how I understand it. (@OwenEdwards can probably weigh in on that, or the best way to handle the two). It does look like the new YouTube player just treats them all the same too.

Does DASH handle anything around descriptions? That's third track type that has it's own menu handling as well.

dsparacio commented 9 years ago

@heff agreed there certainly should be a clarity in the UI or each type and how this is done will vary between players I am finding out. I am not sure this matters as long as the video's texttracklist is maintained as one list and not altered, besides mode, after i add tracks.

In testing, If I override what is in the mpd and label all as one kind (subtitles or captions) things work fine. If I do not override my test samples kind attribute I will get two subtitles and one caption. I would expect this to work fine as well since all I am doing is checking which track's mode is showing to determine the change. Will look at this next week again sure it is simple just want to make sure I come up with a solution that works for all of us.

Only captions and subtitles have been tested and focused on. I think descriptions then chapters and meta will follow.

OwenEdwards commented 9 years ago

Let's clarify that each track has a kind, a srclang and a label; it's the label that is displayed to the user in the menu, not the srclang.

Then @heff, your understanding is correct, and W3C made that distinction between subtitles and captions, but it's confusing because:

That distinction isn't used (at least with any clarity) outside the US and Canada
There's also the term subtitles for the deaf and hard-of-hearing (SDH), which is essentially captions. According to a 3PlayMedia's article on the difference, SDH is captions content (including sound effects and speaker identification), carried and rendered like subtitles. This is what HTML5 and the track element achieve anyway, so subtitles and SDH would be an equally valid distinction.

If you look at the W3C User Agent Accessibility Guidelines 2.0, the spec essentially dismisses the distinction. ~~This example from W3C seems to confuse the idea of there being a distinction even further, since it uses the same track as captions and subtitles with different label!~~ (EDIT: see my follow-up comment)

Without better clarification, it seems like video.js could have an option to treat subtitle and caption tracks identically, with the same [CC] button, menu and styling options for both, and if two tracks have the same label but different kind, there would need to be a way to distinguish them in the menu. Having the same track as both captions and subtitles but with different labels is confusing, but no more so if it's in two different menus than in one.

As far as allowing a user to display captions and subtitles at the same time, I don't see other players implementing that, and I'm not clear that video.js supports it fully with vtt.js anyway. That's a question for @gkatsev.

silviapfeiffer commented 9 years ago

That W3C example is indeed confusing. They should be two different files. Captions are always supposed to be a superset of subtitles, so displaying both at the same time makes no sense. Just display captions.

OwenEdwards commented 9 years ago

I see that the W3C example is correct in the HTML5 Recommendation, but that the Wiki example is incorrect/out-of-date. [EDIT: I fixed the W3C Wiki!!]

The question about showing captions and subtitles at the same time is not for the case where the captions are just a superset of the subtitles (as in the corrected example), but for the case where a user might want to select, say, French subtitles (because French captions are not available), and then also have English captions/SDH, to get the information that isn't in the subtitles. This use case is purely hypothetical, and doesn't actually work (because you really want (French subtitles) + (English captions) - (English subtitles), and that's just silly!)

For video.js, the more I think about it, it would make sense to roll the subtitles and captions together into one control menu, with some ancillary indication (icon, text) of which is subtitle and which is captions. (That indication is tricky, because of the different meanings conveyed by those terms in different parts of the world, as noted earlier). Having two controls gives the impression that both can be enabled at once.

silviapfeiffer commented 9 years ago

How about put [CC] as an indicator behind the caption track?

OwenEdwards commented 9 years ago

Wouldn't putting [CC] next to the caption tracks be confusing if the button to open the menu of subtitles and captions tracks is also marked [CC]? I can imagine people asking "well, what are the other ones then?" Maybe it would be better to mark the tracks which aren't "CC"?

We're also getting into a language issue at this point, since some English-speaking countries use 'captions' vs 'subtitles', and some use 'subtitles' vs 'SDH'. In the UK, BBC's iPlayer has an 'S' (subtitles) button, not a 'CC' button. So even these terms and icons need i10n/l18n support. (Compare to amazon.com's "cart" vs amazon.co.uk's "basket").

silviapfeiffer commented 9 years ago

Yeah, it's not simple.

heff commented 9 years ago

At least from an implementation standpoint, it's easiest to consider them separate concepts. One button for each track kind, as opposed to special-casing specific tracks.

Could we push this decision to the publisher? If the publisher wants them to be listed together they can they just say they're either all captions or all subtitles, but otherwise they can use the kinds to distinguish them.

I feel like combining the two could get more complicated as we try to improve the UI experience for each use case. For example we might be more aggressive about enabling captions by default when we think they're needed, than we would be with subtitles. But maybe not.

gkatsev commented 8 years ago

I think it's time we bring combining all the menus into one "gear" menu back up. With the multiple- audio and video track support in contrib-hls (and also in dash) and even natively, along with the new descriptions track support from @OwenEdwards, we're now going to have a lot of menus on the control bar. We can't just continue adding more menus. If you have captions, descriptions, subtitles, chapters, variable playback speed, rendition selector, and multiple audiotracks (not currently in videojs but the menu for it should definitely be), you now have 7 buttons on the control bar not including the normal default ones. On a small screen, you'll be lucky if you see half of these. And this is just native functionality, plugins could be adding more things. Also, we could do two menus instead of just one, like a tracks menu and an everything else menu. That would be a lot better than having 7 different menus. Another thing to consider is doing something like what youtube does for their caption support. There's a captions button on the control bar that toggles the default caption and the actual selection of the caption is done via the gear menu.

gkatsev commented 8 years ago

Similar to #2603.

nickygerritsen commented 8 years ago

I think a gear menu would make sense. We should still make it configurable whether you want it or not, but if you make it a component this could be not too hard.

gkatsev commented 8 years ago

Yeah, it would have to go in as a new component that ties a bunch of stuff up together. In a 6.0 release, we could think about changing it to be a default.

videojs / video.js

Handle switching between sets of text tracks #2369