Behavior with controls, particularly non-native controls, overlap

What should be the expected behavior of cues when controls overlap or obscure them?

According to the spec, cue rendering should be redone when the native controls are shown (steps 4 and 5 of the processing model). There are two corresponding tests in WPT:
https://github.com/web-platform-tests/wpt/blob/master/webvtt/rendering/cues-with-video/processing-model/enable_controls_reposition.html
https://github.com/web-platform-tests/wpt/blob/master/webvtt/rendering/cues-with-video/processing-model/disable_controls_reposition.html

This leads to two questions:

1. The browsers have different behavior around this. Should the spec define a consistent behavior?
2. A lot of video players nowadays do not use native controls, but try to rely on native caption support. How can we better support this use case?

For 1, here are the current behaviors in Chrome, Firefox, and Safari:

Chrome will move a cue up to not overlap with the control bar and then leave the cue there for the remaining duration of the cue.
Firefox will move the cue up and leave it there unless another cue is rendered, in which case the moved up cue will move back down.
Safari moves the cues up and then immediately back down when the controls collapse.

In addition, this can lead to confusing bugs: because Chrome's control bar is taller, cues may end up reordered, so the viewer sees the lines in the reverse of the expected order. https://bugs.chromium.org/p/chromium/issues/detail?id=1141592#c_ts1630084590
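To make the reordering concrete, here is a deliberately simplified one-dimensional model in TypeScript. It is not the WebVTT processing model itself, only a sketch of the "place one cue at a time, never move a placed cue, step upward on collision" behavior: lines are counted upward from the bottom of the video, and a cue whose line is taken by the controls or an earlier cue moves up. `ModelCue` and `placeInOrder` are hypothetical names, and the model ignores cue width, regions and the spec's direction switching.

```typescript
// Toy model only, not the WebVTT processing model.
// Lines are counted upward from the bottom of the video: 0 is the bottom line.
interface ModelCue {
  text: string;
  preferredLine: number; // where the cue would go with nothing in the way
}

// Places cues one at a time, in the given order. A placed cue never moves again;
// a cue whose line is taken (by the controls or an earlier cue) steps upward.
function placeInOrder(cues: ModelCue[], controlsLines: number): Map<string, number> {
  const occupied = new Set<number>();
  for (let line = 0; line < controlsLines; line++) {
    occupied.add(line); // lines covered by the control bar
  }
  const placed = new Map<string, number>();
  for (const cue of cues) {
    let line = cue.preferredLine;
    while (occupied.has(line)) {
      line++;
    }
    occupied.add(line);
    placed.set(cue.text, line);
  }
  return placed;
}

// One two-line caption authored as two single-line cues.
const firstLine: ModelCue = { text: "first line", preferredLine: 1 };
const secondLine: ModelCue = { text: "second line", preferredLine: 0 };

// No controls: first line on line 1, second line on line 0 (expected order).
console.log(placeInOrder([firstLine, secondLine], 0));
// One-line control bar: first line stays on 1, second line is pushed to 2 (reversed).
console.log(placeInOrder([firstLine, secondLine], 1));
```

With a one-line control bar, the second cue cannot use its preferred bottom line, and the first free line is above the already placed first cue, which is the reversed ordering described in the Chromium bug.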

For 2, since WebVTT limits which CSS properties are allowed, it isn't possible to target ::cue with something like bottom: 3em or transform: translateY(-3em). However, Chrome and Safari provide a pseudo-element for the text track display area that could have a transform applied to it when the controls are active. For example:

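```css
/* Blink/WebKit only: shift the whole text track display area up while the
   player's custom controls are visible. "controls-active" is a hypothetical
   class that the player toggles on a wrapper element. */
.controls-active video::-webkit-media-text-track-display {
  transform: translateY(-3em);
}
```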

Firefox does not have such a mechanism. Even where it exists, this isn't ideal, because if there are cues at the top of the video, the transform could push them out of view.

The only reliable cross-browser way of moving cues so that they aren't obstructed by the control bar is to modify their line property. This works easily for simple cues, but it can get complicated for non-simple cues, like those that are explicitly positioned or are inside regions.

Otherwise, you either accept that captions will be obstructed by the controls, or stop using native rendering of captions. Neither option is ideal.
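A minimal sketch of the line-property workaround, assuming a single showing text track (so an `auto` line behaves like line -1, the bottom line) and a line height that the player measures or estimates itself. `LineAdjustableCue`, `linesCoveredByControls` and `liftAutoCues` are hypothetical helpers; the interface only mirrors the `line`, `snapToLines` and `region` properties of a real `VTTCue`, so the sketch can run outside a browser.

```typescript
// Mirrors the VTTCue properties the workaround touches, so the sketch runs outside a browser.
interface LineAdjustableCue {
  line: number | "auto";
  snapToLines: boolean;
  region?: unknown; // a real VTTCue has VTTRegion | null here
}

/** How many cue lines a control bar of the given height covers, rounded up. */
function linesCoveredByControls(controlsHeightPx: number, lineHeightPx: number): number {
  if (lineHeightPx <= 0) {
    throw new RangeError("lineHeightPx must be positive");
  }
  return Math.ceil(Math.max(0, controlsHeightPx) / lineHeightPx);
}

/** Lifts simple auto-positioned cues above the controls; returns a function that undoes it. */
function liftAutoCues(
  cues: readonly LineAdjustableCue[],
  controlsHeightPx: number,
  lineHeightPx: number,
): () => void {
  const covered = linesCoveredByControls(controlsHeightPx, lineHeightPx);
  const lifted: LineAdjustableCue[] = [];
  for (const cue of cues) {
    // Skip explicitly positioned cues and cues in regions.
    if (cue.line === "auto" && cue.snapToLines && !cue.region) {
      // With snapToLines, negative lines count from the bottom: -1 is the bottom line,
      // so -1 - covered is the first line above the controls.
      cue.line = -1 - covered;
      lifted.push(cue);
    }
  }
  return () => {
    for (const cue of lifted) {
      cue.line = "auto";
    }
  };
}

const autoCue: LineAdjustableCue = { line: "auto", snapToLines: true };
const pinnedCue: LineAdjustableCue = { line: 0, snapToLines: true };
const restore = liftAutoCues([autoCue, pinnedCue], 48, 20); // 48px of controls covers 3 lines of 20px
console.log(autoCue.line, pinnedCue.line); // -4 0
restore(); // call when the controls hide
console.log(autoCue.line); // auto
```

In a browser the cues would come from something like `Array.from(track.cues ?? [])`, cast to `VTTCue`. The sketch deliberately leaves positioned cues and cues in regions alone, since those are the cases where this workaround stops being simple.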

+1, this is a real-world issue for captions in any format. It's also worth noting that controls nowadays are not universally positioned at the bottom of the screen; they are often large and in the vertical centre of the video area.

There's also a related concern: some players cover the entire display area in a semi-transparent black to increase contrast for the controls, which reduces the text contrast and makes captions harder to read.

Yes, spatially this is a three-dimensional problem: x and y are for horizontal and vertical positioning to avoid overlap, and z is for layering. Who defines the layer model of the controls versus the captions? There is also other UX that players might want to add, going beyond controls into more complex product features like onward journeys, watch lists, cast information, etc.

The Timed Text Working Group just discussed Behavior with controls, particularly non-native controls, overlap w3c/webvtt#503, and agreed to the following:

SUMMARY: Issue discussed and recognised, applies to all caption formats.

The full IRC log of that discussion

<nigel> Topic: Behavior with controls, particularly non-native controls, overlap w3c/webvtt#503
<nigel> Gary: I don't think we will completely cover this today but I think that's fine. It's a big topic.
<nigel> .. Background: the question arises from when there are captions at the bottom of the display area. What happens
<nigel> .. when the user interacts with the video player and the controls are shown.
<nigel> .. The controls can obscure the captions, which can be problematic from an accessibility standpoint,
<nigel> .. for those that depend on the captions.
<nigel> github: https://github.com/w3c/webvtt/issues/503
<nigel> Gary: WebVTT right now with native controls has a mechanism to say that the captions should rerender to account for the native
<nigel> .. control bar.
<nigel> .. But then how do you handle this with a non-native control bar?
<nigel> .. Also the behaviour potentially has bugs because it can cause cues to reorder,
<nigel> .. which could be confusing to the user.
<nigel> Nigel: The bug part needs to be fixed, because displaying lines out of order can't be right.
<nigel> Gary: I think it is to spec as written now.
<nigel> .. It's an issue if you have 2 cues, one for each line, instead of a 2 line cue.
<nigel> .. If only the second line gets obscured but the first can be positioned normally, then the second one gets moved and ends up above the first one.
<nigel> Nigel: That's 2 cues rather than 1 cue with a line break in it?
<atai> q+
<nigel> Gary: it's 2 cues with each line in a separate cue rather than 1 cue with a break in it.
<nigel> ack at
<nigel> Andreas: I second that this is an important issue.
<nigel> .. I encountered it with subtitles for audio only, and in some browsers the control bar never disappears.
<cyril> q+
<nigel> .. Then the WebVTT cues can be permanently obscured by the control bar.
<nigel> .. I did not investigate if that is spec conformant.
<nigel> Gary: With an audio element?
<nigel> Andreas: With a video element pointing to audio content.
<nigel> Gary: Interesting that the controls are always visible.
<nigel> Andreas: The question on the solution part is if it is for the HTML spec or for the WebVTT spec?
<nigel> Gary: I'd argue for both because there's the reordering behaviour and also can you represent non-native controls
<nigel> .. so that the captions don't overlap - that may be for the HTML spec.
<nigel> ack c
<nigel> Cyril: I don't know if this is true for all players, but some of the Netflix players reduce the size of the viewport when controls appear.
<nigel> q+
<nigel> Gary: You shrink the text area?
<nigel> Cyril: Yes, it temporarily squishes until the controls disappear.
<nigel> .. This makes the text move.
<nigel> ack n
<nigel> Nigel: Some BBC players do the same thing as what Cyril said, but...
<nigel> ... our newer UX design puts the controls in the vertical centre, so that doesn't work any more!
<nigel> .. Some time ago I suggested an API for saying where not to put captions.
<nigel> .. This is a real problem - it's not just controls, it can be other overlays too.
<atsushi> +1 on issue ;)
<nigel> SUMMARY: Issue discussed and recognised, applies to all caption formats.
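The "reduce the viewport while controls are shown" approach that Cyril and Nigel describe comes down to simple arithmetic. This sketch assumes the spec's default cue font of `5vh sans-serif`, with `vh` measured against the caption rendering area, and a control bar that only reserves a strip at the bottom; `captionArea` and `cueFontSizePx` are illustrative names. As Nigel notes, this does not help when controls sit in the vertical centre.

```typescript
// Assumption: the cue font is 5vh, with vh measured against the caption rendering area,
// and the player reserves a strip at the bottom for its controls while they are visible.
interface Viewport {
  width: number;
  height: number;
}

/** Caption rendering area while a bottom control bar of the given height is (or is not) shown. */
function captionArea(video: Viewport, controlsHeightPx: number, controlsVisible: boolean): Viewport {
  const reserved = controlsVisible ? Math.min(Math.max(0, controlsHeightPx), video.height) : 0;
  return { width: video.width, height: video.height - reserved };
}

/** Cue font size in CSS pixels: 5% of the rendering area's height. */
function cueFontSizePx(area: Viewport): number {
  return (area.height * 5) / 100;
}

const video720p: Viewport = { width: 1280, height: 720 };
console.log(cueFontSizePx(captionArea(video720p, 72, false))); // 36
console.log(cueFontSizePx(captionArea(video720p, 72, true))); // 32.4
```

Because the text is sized from the rendering area, squishing the area moves the captions up and also makes them slightly smaller for as long as the controls are visible.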

Thinking about the Chrome bug, I'm not sure how this behavior can be clarified in the spec to improve the user experience without rewriting the collision-avoidance algorithm from scratch, which is probably not an option.

The Timed Text Working Group just discussed Behavior with controls, particularly non-native controls, overlap w3c/webvtt#503, and agreed to the following:

SUMMARY: Look into the reverse rendering of cues at the same time for collision avoidance; Continue to think about handling non-native controls and overlays.

The full IRC log of that discussion

<nigel> Topic: Behavior with controls, particularly non-native controls, overlap w3c/webvtt#503
<nigel> github: https://github.com/w3c/webvtt/issues/503
<nigel> Nigel: Gary, you added a comment about rewriting the collision avoidance from scratch.
<nigel> .. I'm not familiar enough to understand the impact.
<nigel> Gary: Yes. This is about the potential for making it appear that two active cues are out of order.
<nigel> .. It's very likely not a good user experience.
<nigel> .. If the lines show up backwards it could be confusing.
<nigel> .. That comes from collision avoidance which renders a cue at a time, and
<nigel> .. when rendered, the cue should not be moved again.
<nigel> .. The first cue gets rendered, then the second cue can't render in the expected location
<nigel> .. because of controls and the first cue, so it moves up.
<nigel> Nigel: Do you have an alternative algorithm in mind?
<nigel> Gary: I don't, but conceivably you could do something smarter like not rendering one cue at a time.
<nigel> .. Might not be backwards compatible.
<nigel> .. Changing such a big thing, what's the likelihood of browsers picking up the change?
<nigel> Pierre: What do browsers do today?
<nigel> Gary: The issue is that they do things slightly differently.
<nigel> .. In terms of collision avoidance?
<nigel> Pierre: What I overheard is that the collision avoidance algorithm makes other issues more complicated to solve.
<nigel> .. If it was not there would it make things easier or harder, or is it orthogonal?
<nigel> Gary: I think it is orthogonal.
<nigel> .. Without the collision avoidance it would not be an issue, but then you'd have the issue of the controls overlaying the captions.
<nigel> Pierre: Thank you.
<nigel> .. And it is not possible to pick one implementation and capture what it does?
<nigel> Gary: They do it differently.
<nigel> Pierre: Could you pick one?
<nigel> Gary: The control size in different browsers is different so the thresholds differ.
<nigel> Pierre: Got it
<nigel> Gary: I mean to create an example to see how Safari and Firefox handle it.
<nigel> q+ to talk about writing order
<nigel> .. There are bugs across browsers where they handle it a bit differently in some cases.
<nigel> ack n
<Zakim> nigel, you wanted to talk about writing order
<nigel> Nigel: We've never found a bottom-to-top writing mode script so far, so we should
<nigel> .. probably be trying to position the lowest cue first, then the next further up, etc.
<nigel> .. So just reverse the order?
<nigel> Gary: I think that some implementations like hls.js already do something like that.
<nigel> Nigel: So there's already an implementation of this?
<nigel> Gary: It sounds like a good way to avoid a complete rewrite.
<nigel> .. They render the 2nd cue first, then the 1st cue, so that the 1st cue ends up getting pushed on top.
<nigel> Nigel: Yes, makes sense.
<nigel> Gary: But now the issue becomes: if we change the spec then those workarounds might render incorrectly.
<nigel> Nigel: Would they?
<nigel> Gary: Yes because they'd be double-reversed.
<nigel> .. If the browser hands the cues to the renderer in reverse order and hls.js reverses it then the outcome will be wrong.
<nigel> Nigel: Well something has to change!
<nigel> Gary: It might not be a blocker, but worth considering.
<nigel> Nigel: Fair enough.
<nigel> Gary: It sounds like for this part of it investigating changing it to do reverse rendering makes sense.
<nigel> .. But then there's the other part which is how to handle the overlap with non-native controls.
<nigel> Nigel: Yes, that's quite a bit harder.
<nigel> Gary: It would probably end up meaning changes to HTML as well, potentially.
<nigel> Nigel: One of my questions is how people position non-native controls.
<nigel> .. HTML doesn't allow any children of video elements so tracking layout changes is awkward
<nigel> Gary: One thought is an API on the video element to indicate where it's rendering to, so the calling code
<nigel> .. could say where to avoid drawing captions.
<nigel> .. I was talking to a friend who suggested the captions layer should always be on top of everything.
<nigel> Pierre: It'd be backwards to have the application ask where the controls are.
<nigel> .. The browser is going to put controls there, so it should make sure things are out of the way.
<nigel> Gary: Yes. In the WebVTT spec it says when the controls show create a CSS box for where the controls are
<nigel> .. and consider that box to know whether the cue render location is valid or not.
<nigel> .. The simple solution is to be able to add more boxes to consider, from outside.
<nigel> Nigel: That would make sense.
<nigel> .. It doesn't solve the wider problem of how you position anything relative to the video element.
<nigel> Pierre: Right, you might want to position other stuff, that could also be impacted by the controls.
<nigel> .. That's a bigger issue.
<nigel> Gary: Yes, if you're laying stuff out yourself then you're on the hook for avoiding overlap.
<nigel> .. But if you're relying on native caption rendering with non-native controls...
<nigel> Pierre: Right but in my simple model you'd say captions render over the related video element,
<nigel> .. the entire thing. If the browser wants to show controls over the video then it needs to scale that region
<nigel> .. to make it so that there's no overlap, or use transparency, or move controls somewhere else.
<nigel> .. In my mind the timed text rendering algorithm should not try to guess what weird shapes the controls take.
<atai> q+
<nigel> Gary: Yes. I'm not suggesting it should guess automatically.
<nigel> .. This is about native caption rendering and custom controls.
<nigel> Pierre: From a caption standpoint we should keep it simple.
<nigel> .. Someone authors captions with the expectation that they fill the related video.
<nigel> .. If the browser or the application wants to take over part of the related video element it's their responsibility
<nigel> .. to rescale captions, move them off screen etc.
<nigel> .. Trying to anticipate that in the caption and subtitle specs is the wrong way around.
<nigel> .. They're not authored that way.
<nigel> .. Position is important.
<nigel> Andreas: For control bars it is also very important to be accessible.
<nigel> .. If you say that captions always render on top, which happened in an implementation we did,
<nigel> .. you may end up being unable to interact with the control bar.
<nigel> ack at
<nigel> Gary: That is why I'm not sure it is necessarily the best direction.
<nigel> .. To Pierre's point, I'm not proposing changing something in WebVTT necessarily.
<nigel> .. I think the question is right now, with native controls and native captioning,
<nigel> .. it handles it so that captions don't overlap the control bar.
<nigel> .. At least in the standard captions if you're not positioning them especially.
<nigel> Pierre: By default in WebVTT if you do just times and text like SRT the browser will be helpful
<nigel> .. and will use the absence of explicit positioning to set its own position to avoid the clash.
<nigel> Gary: Right. The question is if we can help people using native captioning but non-native controls to do the same thing.
<nigel> Pierre: Maybe the reason I'm having difficulty is it is not clear if there is good semantics in the DOM and HTML and CSS to
<nigel> .. position things on top of video in a reliable way?
<nigel> Nigel: My conclusion is there is not - I tried to do this and it is super difficult.
<nigel> Pierre: If there were a way then you could scale the caption element to avoid overlapping the control element.
<atai> q+
<nigel> Gary: Chrome and Safari provide a pseudo element for the text track display.
<nigel> .. That could be a solution, to codify that as a thing browsers should expose, so that applications
<nigel> .. could apply styles to it.
<nigel> Pierre: Then you could start exposing ancillary content, like TTML if not supported, or other things, reliably on top of video elements.
<nigel> ack at
<nigel> Andreas: There is the bigger question that could be resolved, but the control bar
<nigel> .. one is the most prominent because customising control bars is very common.
<nigel> .. I see an issue that the HTML spec puts controls out of scope, but I think at least it needs to be a first class
<nigel> .. citizen, because it will be there.
<nigel> Pierre: I like the idea of trying to standardise the way browsers allow explicit control over what is overlaid on the video,
<nigel> .. maybe just starting with the control bar.
<nigel> .. Then the custom controls can do style changes to resolve the overlap problem.
<nigel> Gary: Right, that's what we do in video.js, we squish the text track display area when controls are present.
<nigel> Pierre: I can see other use cases, like insertion of ads during credits, or other things,
<nigel> .. that need to be on the video area without killing the captions.
<nigel> Gary: Yes. We're out of time for today.
<nigel> SUMMARY: Look into the reverse rendering of cues at the same time for collision avoidance; Continue to think about handling non-native controls and overlays.
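The reverse-rendering idea from this discussion can be tried in the same kind of simplified model (one dimension, lines counted upward from the bottom, placed cues never move). `place`, `bottomUp` and `reversed` are hypothetical names; `bottomUp` sorts cues by their preferred line so that the lowest one is positioned first, as Nigel suggested, and the last two cases model the double-reversal concern Gary raised about player-side workarounds.

```typescript
// Toy model: lines counted upward from the bottom (0 = bottom line),
// cues placed one at a time, placed cues never move, collisions step upward.
interface ModelCue {
  text: string;
  preferredLine: number;
}

function place(order: ModelCue[], controlsLines: number): Map<string, number> {
  const occupied = new Set<number>();
  for (let line = 0; line < controlsLines; line++) occupied.add(line);
  const placed = new Map<string, number>();
  for (const cue of order) {
    let line = cue.preferredLine;
    while (occupied.has(line)) line++;
    occupied.add(line);
    placed.set(cue.text, line);
  }
  return placed;
}

/** Position the lowest cue first, then work upward (a stable sort keeps ties in input order). */
function bottomUp(cues: ModelCue[]): ModelCue[] {
  return [...cues].sort((a, b) => a.preferredLine - b.preferredLine);
}

function reversed<T>(items: T[]): T[] {
  return [...items].reverse();
}

const caption: ModelCue[] = [
  { text: "first line", preferredLine: 1 },
  { text: "second line", preferredLine: 0 },
];

// Cue order with a one-line control bar: the second line ends up above the first.
const current = place(caption, 1);
// Bottom-up: the second line takes line 1 and the first line is pushed on top, to line 2.
const fixedOrder = place(bottomUp(caption), 1);
// A player that already reverses its cues (Gary mentions hls.js doing something like this),
// fed to a browser that also reverses them: back to the broken order.
const doubleReversed = place(reversed(reversed(caption)), 1);
// Sorting by position instead gives the same result whether or not the player reversed.
const sortedAfterPlayer = place(bottomUp(reversed(caption)), 1);

console.log(current, fixedOrder, doubleReversed, sortedAfterPlayer);
```

In this toy model, sorting by position rather than blindly reversing the list is unaffected by a player's own reversal, except when two cues want the same line; whether that holds for the real algorithm would need checking against the spec.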

Confirmed that all browsers push the second line above the first line if the cues are positioned low enough that the control bar would obscure the captions. You can see an example at https://simple-video.netlify.app/ by selecting the "single cue per line" caption track.

Expected ordering: [screenshot: Oceans trailer video with the first caption showing the expected two-line ordering]
Unexpected ordering: [screenshot]

Mentioned in the AOB of the Media WG meeting at https://www.w3.org/2022/04/12-mediawg-minutes.html#t08 to ask whether general video element handling (for non-native controls etc.) is in their remit. A joint call with them and the OpenUI CG could be helpful.

I thought we had specified that the rendering area for captions changes when controls are shown, so the Safari behavior seems most conformant. All calculations stay the same, except that with a smaller rendering area the line heights are reduced.

I did not check the spec, but that's what I remember.

Hope that's helpful.

(Sent by email in reply to Nigel Megitt's comment of Tue, Apr 12, 2022, 11:12 PM, above.)

> I thought we had specified that the rendering area for captions changes when controls are shown

I haven't checked either, but if we did specify that, then it wouldn't work with the current fashion to position controls in the vertical centre. It may be time to revisit this and come up with a more general solution.

The spec says that when controls are shown, the areas the controls cover are made into transparent CSS boxes, so that when the cue CSS boxes are created they can be placed where the controls won't be. In theory, this should also help with vertically centered controls. However, it only helps with native captions and native controls, and it still leaves us open to the bug where cues are reordered.
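A rough two-dimensional sketch of that box-avoidance idea, in TypeScript with CSS-style coordinates (y grows downward). It is a simplification, not the spec's algorithm: a cue box is stepped upward, then downward, until it fits inside the video area and overlaps none of the boxes to avoid; otherwise it keeps its original position. `Box`, `overlaps`, `inside` and `placeAvoiding` are hypothetical names, and the avoid list is where author-supplied boxes for custom controls could be appended, as suggested in the earlier call.

```typescript
// CSS-style coordinates in pixels: y grows downward.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

function overlaps(a: Box, b: Box): boolean {
  return a.x < b.x + b.width && b.x < a.x + a.width && a.y < b.y + b.height && b.y < a.y + a.height;
}

function inside(inner: Box, outer: Box): boolean {
  return (
    inner.x >= outer.x &&
    inner.y >= outer.y &&
    inner.x + inner.width <= outer.x + outer.width &&
    inner.y + inner.height <= outer.y + outer.height
  );
}

/**
 * Steps a cue box upward, then downward, by `step` pixels until it lies inside `area`
 * and overlaps none of `avoid`; keeps the original position if nothing fits.
 */
function placeAvoiding(cue: Box, avoid: Box[], area: Box, step: number): Box {
  if (step <= 0) throw new RangeError("step must be positive");
  const fits = (b: Box): boolean => inside(b, area) && avoid.every((o) => !overlaps(b, o));
  if (fits(cue)) return cue;
  for (const direction of [-1, 1]) {
    let y = cue.y + direction * step;
    while (y >= area.y && y + cue.height <= area.y + area.height) {
      const candidate: Box = { ...cue, y };
      if (fits(candidate)) return candidate;
      y += direction * step;
    }
  }
  return cue;
}

const videoArea: Box = { x: 0, y: 0, width: 1280, height: 720 };
const controlBar: Box = { x: 0, y: 660, width: 1280, height: 60 };
const centerButton: Box = { x: 590, y: 310, width: 100, height: 100 }; // vertically centered control
const avoidList = [controlBar, centerButton]; // author-supplied boxes could be appended here

console.log(placeAvoiding({ x: 240, y: 640, width: 800, height: 40 }, avoidList, videoArea, 40).y); // 600
console.log(placeAvoiding({ x: 240, y: 330, width: 800, height: 40 }, avoidList, videoArea, 40).y); // 250
```

Note that this kind of placement still works one cue at a time, so on its own it does nothing about the reordering bug.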

While the spec also says that the processing model for rendering cues should be re-run whenever any of the cue settings change, it doesn't explicitly say that it must be re-run whenever the user agent starts or stops showing its user interface.

At first, I thought that both should be discussed in one issue for ease of use, but I think it may be best to split this issue into two for greater clarity and more focused discussion. Thoughts?

1. How to handle native captions with non-native controls and other overlapping content
2. How to improve the user experience when cues are repositioned as part of controls showing

I spent some time thinking about this, and I wonder if we can adopt a new CSS property, something like controls-mask, whose values could be similar to clip-path. Instead of a single value, though, it could be a list of multiple shapes. Originally, I was thinking that it could be an xywh value like the spatial media fragments, but I think the clip-path direction is more flexible.
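For the simpler xywh variant, here is a sketch that resolves a list of Media Fragments-style spatial values (`xywh=x,y,w,h`, with an optional `pixel:` or `percent:` unit) to pixel rectangles. The `controls-mask` property, and the idea that it would accept such a list, are only the proposal above; `PixelRect`, `parseXywh` and `parseControlsMask` are hypothetical.

```typescript
interface PixelRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Media Fragments URI spatial syntax: xywh=[pixel:|percent:]x,y,w,h with non-negative integers.
const XYWH_PATTERN = /^xywh=(?:(pixel|percent):)?(\d+),(\d+),(\d+),(\d+)$/;

/** Resolves one xywh value to a pixel rectangle for a video of the given size. */
function parseXywh(value: string, videoWidth: number, videoHeight: number): PixelRect {
  const match = XYWH_PATTERN.exec(value.trim());
  if (match === null) {
    throw new SyntaxError(`not an xywh value: ${value}`);
  }
  const x = Number(match[2]);
  const y = Number(match[3]);
  const w = Number(match[4]);
  const h = Number(match[5]);
  if (match[1] === "percent") {
    return {
      x: (x * videoWidth) / 100,
      y: (y * videoHeight) / 100,
      width: (w * videoWidth) / 100,
      height: (h * videoHeight) / 100,
    };
  }
  return { x, y, width: w, height: h }; // "pixel" is the default unit
}

/** Hypothetical controls-mask value: a list of shapes, e.g. a bottom bar plus a centered button. */
function parseControlsMask(values: string[], videoWidth: number, videoHeight: number): PixelRect[] {
  return values.map((value) => parseXywh(value, videoWidth, videoHeight));
}

const mask = parseControlsMask(["xywh=percent:0,90,100,10", "xywh=590,310,100,100"], 1280, 720);
// mask[0] is { x: 0, y: 648, width: 1280, height: 72 }, mask[1] is { x: 590, y: 310, width: 100, height: 100 }
console.log(mask);
```

A clip-path-like syntax would also allow non-rectangular shapes, at the cost of more complex overlap tests when cues are placed.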

The Timed Text Working Group just discussed Behaviour with controls (with Media WG), and agreed to the following:

SUMMARY: Bring this topic to the attention of the relevant stakeholders to gauge interest

The full IRC log of that discussion

<nigel> Subtopic: Behaviour with controls (with Media WG)
<gkatsev> github: https://github.com/w3c/webvtt/issues/503
<cpn> Gary: There's a Chrome issue around how collision detection should work
<cpn> ... What I want to talk about is, if you use native text tracks and making sure captions aren't obscured
<cpn> ... The spec says that when the native controls show, auto-positioned cues should move out of the way of the native controls
<cpn> ... Is it possible to extend that to keep captions out of the way of the area where controls are shown
<cpn> ... In a previous Media WG, Eric mentioned if doing something in CSS would be useful
<cpn> ... I had the idea to have a clip path property on the video element to define regions
<cpn> ... That could potentially allow you to specify a number of regions, and have the user agent try to render captions outside those positions
<cpn> Eric: Does it need to be that complex? The more complex it is, the harder to implement, and possible to not get quite right
<cpn> ... I was thinking of something like a safe area inset
<cpn> ... Haven't thought through the details though
<cpn> Gary: Something like that might be fine. It would mean you only have control of the top or bottom of the media element, not for centered controls
<cpn> Nigel: Does it matter if you specify a positive area where it's OK to render captions, or a negative area where it's not OK?
<cpn> Eric: I don't think so
<cpn> ... Presumably one is just the inverse of the other
<cpn> Nigel: I thought it might affect computational complexity
<cpn> Eric: I don't think so
<cpn> Gary: Converting from one to the other shouldn't be hard
<cpn> Nigel: For a real world use case, you'd need more complexity than a single rectangle.
<cpn> ... Thinking of a typical player with scrub bar and settings button at the top somewhere. Players have some visual complexity in the UI, which constrains where you'd put other content in a less than simple way
<cpn> ... Is it worth thinking about how the data flows? It's a two-way problem, there are two independent things, either could be rendered natively or not natively: captions and controls
<cpn> ... Come up with a design that handles all permutations?
<cpn> ... Will be controversial if we try to say where player controls go. Is there an elegant way to do this?
<cpn> Gary: Don't think it's a big issue for custom rendered captions and video control bar
<cpn> Nigel: Really?
<cpn> Eric: Can be done with the web inspector, but not at runtime
<cpn> Nigel: So you'd have to design for each browser based on inspection and how it behaves. That's not very friendly
<cpn> Gary: This can be done, though, whereas the opposite is a lot more complicated
<cpn> ... I've done it, my current method is changing the line property of auto-positioned cues, and heuristics
<cyril_> q+
<atai> q+
<cpn> ... It works across browsers, and there are bugs. Needing to do that is a pain
<nigel> ack n
<cpn> Cyril: Why is it a problem, why do we care about position of the captions while the controls are on. That's where the users' focus is
<cpn> Eric: Controls are shown for some amount of time, mouse moves. That interval is where there's a problem
<cpn> Cyril: Couldn't browsers reduce the viewport size?
<cpn> Nigel: That's discussed in the issue. Native controls can do that, but for custom controls it's common to have a non-rectangular area
<cpn> Gary: For live, the progress bar is gone, so theoretically captions don't need to move even with controls shown. There are complex permutations
<cpn> Eric: Two issues: Native controls with custom cues, and native cues with custom controls. Neither knows about the other
<cpn> ... We want one solution that works for both
<cpn> ... I don't think it's hard to achieve. Custom controls are scripted, so there's a way for it. Cues and custom cues are just logic implemented in different places
<cpn> ... Each just needs to pay attention to the information provided by the other
<cpn> Gary: So the video element would need to provide information on the positioning of native controls?
<cpn> Eric: Sure
<cpn> Gary: Is there a privacy issue?
<cpn> Eric: Don't think it will be. You can use the inspector to check the position for a particular browser version
<cpn> Nigel: Are there no accessibility settings that affect size and position of native controls?
<cpn> Eric: I don't think it's an issue
<cpn> Gary: I think in Safari the width may vary if there's a chevron
<cpn> Eric: All of that is implemented by the JS that implements the controls, it doesn't have access to anything the DOM doesn't
<nigel> q?
<cpn> q?
<cpn> ack cy
<nigel> ack cyri
<nigel> ack at
<cpn> Andreas: Which group would develop solutions? Is it something for HTML or Media WG?
<cpn> q+
<nigel> Chris: This tends to be close to the video element, which is a WHATWG thing. We tend not to specify things that closely extend HTMLMediaElement,
<nigel> .. but MediaWG could begin the spec work and then hand it over.
<nigel> Eric: Agree, we have the right people, then where it ends up is secondary.
<nigel> Cyril: Is there no interest from Google or Mozilla or Edge, given that only Apple is present and nobody from anywhere else has commented?
<nigel> Eric: Could be a matter of physical presence here at TPAC.
<nigel> Alastor_Wu: I'm from Firefox, could you repeat?
<nigel> Cyril: Since no browser vendor has commented, are they not interested?
<nigel> Alastor_Wu: First time I've seen this, I will look into it and comment later.
<nigel> Gary: No comment is probably not knowing about it rather than not caring.
<nigel> Cyril: So first step is to socialise this.
<nigel> Eric: Yes, I think that's right.
<nigel> Chris: Reflects what's happened today, with Alastor and Eric here but not a strong presence from others.
<nigel> Gary: Should we tag specific individuals on the issue?
<nigel> Eric: Yes, good idea, tag me and Jer from Apple.
<nigel> .. I don't know the right person from Chrome.
<nigel> Chris: Chris Cunningham suggested Evan Yu to me.
<nigel> .. I can follow up with Dale Curtis, I hope he can point us in the right direction for Chrome.
<cpn> Nigel: Anything else on this topic?
<cpn> SUMMARY: Bring this topic to the attention of the relevant stakeholders to gauge interest
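On Nigel's question of positive versus negative areas: for a single safe-area-inset-style value, the two framings convert easily, as this sketch shows. `PixelRect`, `Insets`, `allowedArea` and `excludedStrips` are hypothetical names, and the sketch assumes the insets fit inside the video.

```typescript
interface PixelRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Shaped like CSS safe-area insets: distances in from each edge of the video.
interface Insets {
  top: number;
  right: number;
  bottom: number;
  left: number;
}

/** Positive framing: the one rectangle where captions may render. */
function allowedArea(video: PixelRect, insets: Insets): PixelRect {
  return {
    x: video.x + insets.left,
    y: video.y + insets.top,
    width: Math.max(0, video.width - insets.left - insets.right),
    height: Math.max(0, video.height - insets.top - insets.bottom),
  };
}

/** Negative framing: up to four strips where captions may not render. */
function excludedStrips(video: PixelRect, insets: Insets): PixelRect[] {
  const inner = allowedArea(video, insets);
  const innerRight = inner.x + inner.width;
  const innerBottom = inner.y + inner.height;
  const strips: PixelRect[] = [
    { x: video.x, y: video.y, width: video.width, height: inner.y - video.y }, // top
    { x: video.x, y: innerBottom, width: video.width, height: video.y + video.height - innerBottom }, // bottom
    { x: video.x, y: inner.y, width: inner.x - video.x, height: inner.height }, // left
    { x: innerRight, y: inner.y, width: video.x + video.width - innerRight, height: inner.height }, // right
  ];
  return strips.filter((strip) => strip.width > 0 && strip.height > 0); // drop empty strips
}

const frame: PixelRect = { x: 0, y: 0, width: 1280, height: 720 };
const bottomBar: Insets = { top: 0, right: 0, bottom: 72, left: 0 };
console.log(allowedArea(frame, bottomBar)); // { x: 0, y: 0, width: 1280, height: 648 }
console.log(excludedStrips(frame, bottomBar)); // one strip: { x: 0, y: 648, width: 1280, height: 72 }
```

The conversion stops being trivial once several arbitrary rectangles are excluded, the real-world case Nigel described: the remaining allowed area is then generally no longer a single rectangle.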

Could you explain in simpler words what exact problem you're trying to solve? I read it all, but it seems that the spec already addresses the mentioned problem, so I must not really understand the problem.

Thanks, Silvia.

On Sat, Sep 17, 2022, 5:40 AM CSS Meeting Bot @.***> wrote:

The Timed Text Working Group just discussed Behaviour with controls (with Media WG), and agreed to the following:

SUMMARY: Bring this topic to the attention of the relevant stakeholders to gauge interest

The full IRC log of that discussion Subtopic: Behaviour with controls (with Media WG)
github: https://github.com//issues/503 Gary: There's a Chrome issue around how collision detection should work ... What I want to talk about is, if you use native text tracks and making sure captions aren't obscured ... The spec says that when the native controls show, auto-positioned cues should move out of the way of the native controls ... Is it possible to extend that to keep captions out of the way of the area where controls are shown ... In a previous Media WG, Eric mentioned if doing something in CSS would be useful ... I had the idea to have a clip path property on the video element to define regions ... That could potentially allow you to specify a number of regions, and have the user agent try to render captions outside those positions Eric: Does it need to be that complex? The more complex it is, the harder to implement, and possible to not get quite right ... I was thinking of something like a safe area insert ... Haven't thought through the details though Gary: Something like that might be fine. It would mean you only have control of the top or bottom of the media element, not for centered controls Nigel: Does it matter if you specify a positive area where it's OK to render captions, or a negative area where it's not OK? Eric: I don't think so ... Presumably one is just the inverse of the other Nigel: I thought if might affect computational complexity Eric: I don't think so Gary: Converting from one to the other shouldn't be hard Nigel: For a real world use case, you'd need more complexity than a single rectangle. ... Thinking of a typical player with scrub bar and settings button at the top somewhere. Players have some visual complexity in the UI, which constrains where you'd put other content in a less than simple way ... Is it worth thinking about how the data flows? It's a two-way problem, there are two independent things, either could be rendered natively or not natively: captions and controls ... 
Come up with a design that handles all permutations? ... Will be controversial if we try to say where player controls go. Is there an elegant way to do this? Gary: Don't think it's a big issue for custom rendered captions and video control bar Nigel: Really? Eric: Can be done with the web inspector, but not at runtime Nigel: So you'd have to design for each browser based on inspection and how it behaves. That's not very friendly Gary: This can be done, though, whereas the opposite is a lot more complicated ... I've done it, my current method is changing the line property of auto-positioned cues, and heuristics q+ q+ ... It works across browsers, and there are bugs. Needing to do that is a pain ack n Cyril: Why is it a problem, why do we care about position of the captions while the controls are on. That's where the users' focus is Eric: Controls are shown for some amount of time, mouse moves. That interval is where there's a problem Cyril: Couldn't browsers reduce the viewport size? Nigel: That's discussed in the issue. Native controls can do that, but for custom controls it's common to have a non-rectangular area Gary: For live, the progress bar is gone, so theoretically captions don't need to move even with controls shown. There are complex permutations Eric: Two issues: Native controls with custom cues, and native cues with custom controls. Neither knows about the other ... We want one solution that works for both ... I don't think it's hard to achieve. Custom controls are scripted, so there's a way for it. Cues and custom cues are just logic implemented in different places ... Each just needs to pay attention to the information provided by the other Gary: So the video element would need to provide information on the positioning of native controls? Eric: Sure Gary: Is there a privacy issue? Eric: Don't think it will be. 
You can use the inspector to check the position for a particular browser version Nigel: Are there no accssibility settings that affect size and position of native controls? Eric: I don't think it's an issue Gary: I think in Safari the width may vary if there's a chevron Eric: All of that is implemented by the JS that implements the controls, it doesn't have access to anything the DOM doesn't q? q? ack cy ack cyri ack at Andreas: Which group would develop solutions? Is it something for HTML or Media WG? q+ Chris: This tends to be close to the video element, which is a WHATWG thing. We tend not to specify things, .. but MediaWG could begin the spec work and then hands it over. Eric: Agree, we have the right people, then where it ends up is secondary. Cyril: Is there no interest from Google or Mozilla or Edge, given that only Apple is present and nobody from anywhere else has commented. Eric: Could be a matter of physical presence here at TPAC. Alastor_Wu: I'm from Firefox, could you repeat? s/specify things/specify things that closely extend HTMLMediaElement/ Cyril: Since no browser vendor has commented, are they not interested? Alastor_Wu: First time I've seen this, I will look into it and comment later. Gary: No comment is probably not knowing about it rather than not caring. Cyril: So first step is to socialise this. Eric: Yes, I think that's right. Chris: Reflects what's happened today, with Alastor and Eric here but not a strong presence from others. Gary: Should we tag specific individuals on the issue? Eric: Yes, good idea, tag me and Jer from Apple. .. I don't know the right person from Chrome. Chris: Chris Cunningham suggested Evan Yu to me. .. I can follow up with Dale Curtis, I hope he can point us in the right direction for Chrome. Nigel: Anything else on this topic? SUMMARY: Bring this topic to the attention of the relevant stakeholders to gauge interest — Reply to this email directly, view it on GitHub , or unsubscribe . 

Thanks, Silvia, it's worth recapping here in simpler terms. I think the TPAC conversation helped crystallize the root issue as well as a potential direction.

The core issue I want addressed is being able to make sure that captions are unobscured by overlays. My initial request was for native captions and custom controls, but at TPAC, we realized we probably want the reverse as well: custom captions with native controls.

While the core of this is for player controls, it could be helpful for any sort of overlay. At TPAC, one idea we had is that the video element should expose a controls bounding box. This box would represent the native controls by default, or could be overridden by page authors with the bounding box of their custom controls.
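To make the bounding-box idea a little more concrete, here is a minimal sketch of how a cue renderer, native or scripted, might consume such a box. Nothing here is an existing API: Box, effectiveControlsBox and cueShiftForControls are hypothetical names. The sketch assumes a bottom-aligned controls bar, shifts the whole cue display by a single offset (which keeps cues in their original order), and clamps that offset so the highest cue is not pushed above the top of the video.

```typescript
// Hypothetical geometry type; in a browser this would be a DOMRect
// in the same coordinate space as the video.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical rule: a box supplied by the page (custom controls) overrides
// the box of the UA's native controls.
function effectiveControlsBox(native: Box | null, author: Box | null): Box | null {
  return author !== null ? author : native;
}

function overlaps(a: Box, b: Box): boolean {
  return a.x < b.x + b.width && b.x < a.x + a.width &&
         a.y < b.y + b.height && b.y < a.y + a.height;
}

// Upward shift (px) for the whole cue display so that no cue overlaps a
// bottom-aligned controls box, limited so the highest cue stays inside the video.
function cueShiftForControls(cues: Box[], controls: Box | null, videoTop: number): number {
  if (controls === null || cues.length === 0) return 0;
  let needed = 0;
  let highest = Infinity;
  for (const cue of cues) {
    highest = Math.min(highest, cue.y);
    if (overlaps(cue, controls)) {
      needed = Math.max(needed, cue.y + cue.height - controls.y);
    }
  }
  const room = Math.max(0, highest - videoTop);
  return Math.min(needed, room);
}
```

The clamp addresses the concern that a blanket transform on the text track display can push cues at the top of the video out of view; the trade-off is that a bottom cue may stay partially covered when there is no room left to move.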

Ultimately, this feature request will likely need to land in HTML itself, with potentially some changes in WebVTT to accommodate it.

@eric-carlson @jernoble @alastor0325 would love your feedback on this. Thanks!

@chrisn are you able to help figure out who we can ask to look at this at Chrome? Thanks!

The browsers have different behavior around this. Should the spec define a consistent behavior?

It definitely sounds good if we can define a consistent behavior across browsers, which is something we would be glad to follow.

A lot of video players nowadays do not use native controls, but try to rely on native caption support. How can we better support this use case?

If this is the case, allowing the video element to know the size of customized controls sounds like a good idea. Would it be something like using a DOMRect to define a bounding box? Does the JS player have to update that bounding box correctly whenever its own controls appear or disappear? If the video gets resized, does the bounding box get resized automatically as well?
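One possible answer to the resize question, offered purely as an assumption: if the page described its controls area as fractions of the video box rather than in pixels, it would only need to update the value when its controls appear or disappear, and the pixel box could be recomputed for whatever size the video currently has. RelativeControlsArea and its show, hide and resolve methods are hypothetical names for illustration only.

```typescript
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical: the controls area is stored as fractions (0..1) of the video
// box, so the page sets it when its controls show and clears it when they hide.
class RelativeControlsArea {
  private fraction: Box | null = null;

  show(fraction: Box): void {
    this.fraction = fraction;
  }

  hide(): void {
    this.fraction = null;
  }

  // Resolves to pixels for the video's current size; recomputed on every call,
  // so a resize needs no update from the page.
  resolve(videoWidth: number, videoHeight: number): Box | null {
    const f = this.fraction;
    if (f === null) return null;
    return {
      x: f.x * videoWidth,
      y: f.y * videoHeight,
      width: f.width * videoWidth,
      height: f.height * videoHeight,
    };
  }
}
```

A single rectangle cannot describe the non-rectangular control layouts mentioned in the TPAC minutes; a list of such boxes would be needed in that case.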

I also notice that some websites, e.g. YouTube, use their own controls and their own rendering for cues. If we can reach a consensus on a consistent behavior, does that mean we should also ask those kinds of JS players to follow that behavior? CCed @gijsk, our video controls expert, to see if he has any thoughts as well.

some websites, e.g. YouTube, use their own controls and their own rendering for cues.

If the player is rendering both cues and controls itself, then it's somewhat separate from this. The main issue here is mixed rendering, where one of the pieces (either the controls or the cues) is rendered by the browser and the other is rendered by the player.

CCed @gijsk, our video controls expert, to see if he has any thoughts as well.

I am duly flattered but I suspect @mikeconley and/or some of the folks working on our picture in picture work are a better bet here.

Hi all,

I'm a software developer at Mozilla working on Firefox Desktop. I've worked on Firefox's native video controls and also the built-in Picture-in-Picture feature.

I also notice that some websites, e.g. YouTube, use their own controls and their own rendering for cues. If we can reach a consensus on a consistent behavior, does that mean we should also ask those kinds of JS players to follow that behavior?

From our experience adding subtitle and caption support for Firefox's Picture-in-Picture feature, it seems that most major video-playing websites (at least those considered major in North America) use their own custom rendering for cues. Here is the list of sites that we have custom wrappers for to support captions and subtitles:

BBC iPlayer
DailyMotion
Disney+
HBOMax
Hotstar
Hulu
Netflix
Piped
Amazon Prime Video
Tubi
Voot
The Washington Post
YouTube

It would be truly lovely if we could incentivize them to use WebVTT instead so that we can stop maintaining these site-specific wrappers - although (based on my experience) I would predict non-uniform success in contacting these sites to suggest this change unless we can make a strong case that it reduces their technical burden.

I'm no web-spec designer, but I do worry about the resize / move case of the DOMRect idea: a site would presumably have to constantly update that DOMRect in the event that their controls change for some reason (even if it's just the size of the video changing). Perhaps instead, the site could indicate which elements constitute the controls, and the browser could take on the responsibility of tracking their size and position over time? If we're going to have success selling this idea to site authors in the wild like the ones I listed, I suspect we have to make a case that they can offload work to the browser, rather than take on more work to manage DOMRects.
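The browser-side bookkeeping for the "site indicates which elements constitute the controls" idea might look roughly like the sketch below. How a page would mark elements as controls is left open (register and unregister are hypothetical), and measuring on demand is a simplification; a real user agent would more likely hook into layout. The only real API relied on is getBoundingClientRect(), whose result has left, top, right and bottom fields, so actual HTMLElements fit the Measurable shape, and elements with display: none report an all-zero rectangle.

```typescript
interface EdgesLike {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Anything with getBoundingClientRect(); real HTMLElements satisfy this shape.
interface Measurable {
  getBoundingClientRect(): EdgesLike;
}

// Hypothetical registry the UA could keep for elements a page marks as controls.
class ControlsRegistry {
  private elements: Measurable[] = [];

  register(el: Measurable): void {
    if (this.elements.indexOf(el) === -1) this.elements.push(el);
  }

  unregister(el: Measurable): void {
    this.elements = this.elements.filter((e) => e !== el);
  }

  // Union of the visible registered elements, measured fresh on every call,
  // so the page never has to push updated rectangles itself.
  currentBounds(): EdgesLike | null {
    let union: EdgesLike | null = null;
    for (const el of this.elements) {
      const r = el.getBoundingClientRect();
      // Skip zero-size boxes, e.g. controls hidden with display: none.
      if (r.right <= r.left || r.bottom <= r.top) continue;
      union = union === null
        ? { left: r.left, top: r.top, right: r.right, bottom: r.bottom }
        : {
            left: Math.min(union.left, r.left),
            top: Math.min(union.top, r.top),
            right: Math.max(union.right, r.right),
            bottom: Math.max(union.bottom, r.bottom),
          };
    }
    return union;
  }
}
```

Because the bounds are measured each time they are needed, showing, hiding or resizing the controls requires no extra calls from the site, which is the kind of offloading to the browser described above. A plain union does lose the shape of non-rectangular control layouts.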

@mikeconley From a BBC point of view, there is no way we will be using WebVTT for captions and subtitles - it does not meet our requirements in multiple ways, some of which are fixable, others architectural and unfixable.

An alternative would be to provide support for client code to provide DOM fragments for subtitles and captions, to be displayed at specified times, to keep the UA's concerns separate from the content provider's. Safari already has an experimental version of this in Technology Preview (CC @eric-carlson who can describe it better) that can be used as a model. There's still a question about how to resolve positioning clashes with controls, but this can at least form part of the solution, I think.

For those folk for whom WebVTT does meet their requirements, why not render the WebVTT into those same DOM fragments for presentation? I suspect that's what happens in the shadow DOM in most implementations already, anyway.

I have wondered for quite a while whether we standardized the wrong interface with VTT and TTML. If all sites are going to implement captions using Javascript, then the important interface is not the document format those scripts interpret (who cares? the script and format merely need to match, and both are downloaded), but it is the API surface that the script can work on. Should we pay more attention to that?

We need to standardise both the document format interface and the right API surface.

who cares?

It's important for people implementing tooling and procuring products and services, so they have the right level of interoperability and expectation. One of the huge problems in the area of subtitles and captions is the proliferation of document formats, be they standards or not, as over time various newcomers have looked at the problem, underestimated the complexity and generated a new solution that both has its own problems and in some cases is a problem just by its mere existence (that's not a dig at any particular individual format, and I do recognise that in some cases there are genuine attempts to solve problems).

Looked at holistically, subtitle and caption document exchange standards are important for reducing the entry costs for folk wanting to make their media accessible.

But by tying those standards too tightly to the API, we've got ourselves in a pickle.

a site would presumably have to constantly update that DOMRect in the event that their controls change for some reason (even if it's just the size of the video changing).

Yes, a site would need to update this, but presumably it would only happen when the controls are shown and when they are hidden.

Perhaps instead, the site could indicate which elements constitute the controls

This is a reasonable alternative, but it only solves the custom-controls-with-native-captions problem and doesn't help with the less popular native-controls-with-custom-captions setup.

I think the question right now is more whether some kind of mechanism for this would be reasonable to add to browsers. The specifics can be ironed out as it progresses through the various WGs.

I have wondered for quite a while whether we standardized the wrong interface with VTT and TTML. If all sites are going to implement captions using Javascript, then the important interface is not the document format those scripts interpret (who cares? the script and format merely need to match, and both are downloaded), but it is the API surface that the script can work on. Should we pay more attention to that?

I do think that a built-in format is still important to have as a base layer. However, having the right primitives in place to build upon, so that not everything needs to be built from scratch, would be great. For example, one missing item is removeTextTrack (there is an addTextTrack(), but no way to remove a track created with it).

WebKit's proposal, which @nigelmegitt mentioned above, seems like the logical next step to expanding these primitives.

Ultimately, this specific issue isn't really a WebVTT issue. It only began here because I encountered it in the context of WebVTT, and I wanted some more thoughts/buy-in from vendors on it before trying to push it through the Media WG and eventually the HTML spec.

w3c / webvtt

Behavior with controls, particularly non-native controls, overlap #503