APA comment: presentation customisation

From https://lists.w3.org/Archives/Public/public-tt/2018Jan/0270.html

There is a general issue with the way that an author specifies layout characteristics of captions and subtitles, such as font size, font family, line height, background and positioning. The spec describes the approach of the author specifying a “fixed layout” for captions and subtitles that the user cannot change. However, it must be possible for the user to overwrite the author’s choice of font size, or background color, for example. This is necessary for accessibility reasons, in the same way that browsers allow the user to change font size and background color. How can we find a good solution for these conflicting interests between author and user? We would like to get into a discussion with you on this issue.

> The spec describes the approach of the author specifying a “fixed layout” for captions and subtitles that the user cannot change.

The specification does not prevent processors from modifying the layout and other aspects of the presentation. Where do you see such a restriction?

I believe the spec does define how a conformant processor should behave; however, in the real world there is no requirement against processors being deliberately non-conformant against the presentation semantics for the purpose of providing customisation options.

> however in the real world there is no requirement against processors being deliberately non-conformant against the presentation semantics for the purpose of providing customisation options.

There are in fact regulatory requirements in some countries that mandate that systems allow users to modify the presentation of captions in specific ways.

A system that receives IMSC1.1 timed text knows how the author intended the timed text to be presented, but can modify that presentation according to regulatory requirements and/or user preferences. This is no different than televisions offering zoom features to fill a 16:9 display with 4:3 content.
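A minimal sketch of that separation, assuming a hypothetical `CaptionStyle` record and a hypothetical `apply_user_overrides` helper (neither comes from IMSC or from any player API): the authored style is kept exactly as received, and user or regulatory preferences are layered on top only at presentation time.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CaptionStyle:
    """Hypothetical subset of presentation properties, for illustration only."""
    font_size_pct: float
    font_family: str
    background_color: str


def apply_user_overrides(authored: CaptionStyle, overrides: dict) -> CaptionStyle:
    """Return a new style in which every preference the user has set wins.

    Preferences left as None mean "no preference", so the authored value is kept.
    The authored style itself is never mutated.
    """
    chosen = {name: value for name, value in overrides.items() if value is not None}
    return replace(authored, **chosen)


authored = CaptionStyle(font_size_pct=100.0,
                        font_family="proportionalSansSerif",
                        background_color="#000000")
# The user enlarges the text but expresses no preference about the background.
presented = apply_user_overrides(authored, {"font_size_pct": 150.0,
                                            "background_color": None})
```

Here `presented` carries the enlarged font size while the authored family and background survive, and `authored` still records what the author intended.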

> A system that receives IMSC1.1 timed text knows how the author intended the timed text to be presented, but can modify that presentation according to regulatory requirements and/or user preferences.

Thanks @palemieux, although this fact is not wholly apparent in the Draft spec.

Is it possible to include some language to that effect (perhaps an advisory note), as well as a warning that authors should not attempt to restrict this functionality because of accessibility needs?

(An added bonus would include some indication of a technique to achieve this, as well as suggestions/recommendations on how to meet the following Draft Requirement in WCAG 2.1 (https://www.w3.org/TR/2018/CR-WCAG21-20180130/#text-spacing):

Success Criterion 1.4.12 Text Spacing (Level AA) In content implemented using markup languages that support the following text style properties, no loss of content or functionality occurs by setting all of the following and by changing no other style property:

Line height (line spacing) to at least 1.5 times the font size;
Spacing following paragraphs to at least 2 times the font size;
Letter spacing (tracking) to at least 0.12 times the font size;
Word spacing to at least 0.16 times the font size.

Exception: Human languages and scripts which do not make use of one or more of these text style properties in written text can conform using only the properties that are used.)
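The four thresholds quoted above are simple multiples of the font size. A minimal sketch of the arithmetic, assuming a font size expressed in pixels (the function name and the dictionary keys are illustrative, not taken from WCAG or IMSC):

```python
def min_text_spacing(font_size_px: float) -> dict:
    """Spacing values a user may set under SC 1.4.12, for a given font size."""
    return {
        "line_height": 1.5 * font_size_px,        # line spacing
        "paragraph_spacing": 2.0 * font_size_px,  # spacing following paragraphs
        "letter_spacing": 0.12 * font_size_px,    # tracking
        "word_spacing": 0.16 * font_size_px,
    }


spacing = min_text_spacing(20.0)
```

For a 20 px font this gives a line height of 30 px and paragraph spacing of 40 px, with letter and word spacing of roughly 2.4 px and 3.2 px.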

> Is it possible to include some language to that effect (perhaps an advisory note), as well as a warning that authors should not attempt to restrict this functionality because of accessibility needs?

@johnfoliot It is simply impossible for the author to prevent the consumer system from modifying the presentation, just as it is impossible for authors to prevent a TV from zooming a 4:3 image to make it fit a 16:9 display. I am not sure a note that would say "consumer system may modify this at will" will really help.

> as well as suggestions/recommendations on how to meet the following Draft Requirement in WCAG 2.1 (https://www.w3.org/TR/2018/CR-WCAG21-20180130/#text-spacing)

What is the timeline for publication of this specification?

> Success Criterion 1.4.12 Text Spacing

Was subtitling/captioning taken into account when drafting these recommendations?

> What is the timeline for publication of this specification?

It entered Candidate Recommendation today :), with a target Recommendation date of June 2018.

> Was subtitling/captioning taken into account when drafting these recommendations?

To the extent that the subject-matter expertise within our group allowed. The WG also referenced and was informed by the MAUR (Media Accessibility User Requirements), and specifically VP-2 (https://www.w3.org/TR/media-accessibility-reqs/#requirements-on-the-use-of-the-viewport).

It is our current understanding that Timed Text (whether TTML or WebVTT) had the ability to be re-styled by the end user (in user-agents that support that function - i.e. web browsers, etc.). WCAG recognizes the limitations of some systems however, which is why there is also a provision for Accessibility Supported technology: https://www.w3.org/TR/2018/CR-WCAG21-20180130/#dfn-accessibility-supported

If you (or other members of your WG) are seeing a technology collision here that is currently insurmountable, please do provide more details. While the Rec is now in CR, it's still not too late to provide feedback which the WG will need to address as part of the Exit Criteria. Our WG is also using GitHub: https://github.com/w3c/wcag21/issues/new

> how to meet the following Draft Requirement in WCAG 2.1

@johnfoliot I think you've already raised this, but wouldn't it make more sense to refer to the MAUR here?

> a technology collision here that is currently insurmountable

Certainly there is a collision; whether you think it's technology based or not is perhaps in the eye of the beholder. The issue for subtitles and captions is that an approach that might in some circumstances increase accessibility, i.e. increasing any of the text spacing properties you listed, is in this scenario likely to obscure a larger area of the video, and possibly to introduce wrapping or overflow, making that situation even worse. When the viewport is very constrained, and it is important for the user accessing the overall programme content to be able to see as much as possible of the underlying video media (and certainly some key parts such as mouths, on-screen text, etc.), those modifications have a high chance of making the programme less accessible rather than more.

I think there is never going to be a right answer that satisfies everyone, but saying "no loss of content or functionality occurs" is certainly impossible since some of the content and functionality lies outside (specifically, behind) the text. Treating the text in isolation in this use case is the fault I think.

> However, it must be possible for the user to overwrite the author’s choice of font size, or background color, for example. This is necessary for accessibility reasons, in the same way that browsers allow the user to change font size and background color. How can we find a good solution for these conflicting interests between author and user? We would like to get into a discussion with you on this issue.

This is a very tricky requirement to navigate from an implementation perspective. There are many approaches in use, and none of them seems completely satisfactory to me - we should continue to look for a better way. Amongst the conflicting requirements:

Need for privacy - avoid user settings being fingerprinting vectors
Need for ease of use - allow user to change settings directly in the player
Need for system defaults - allow user to change default settings system-wide
Prevalence of sites implementing their own subtitle players rather than using native players
Need for support in native players for all the formats that are needed
In the absence of native player support for all formats, the need for an extensible model for player implementations that can take into account user settings made locally or in the system defaults without those settings being exposed

(this list is probably incomplete!)

These are format agnostic requirements. It is clear that a model of "you get all the goodies if you use our preferred format X" has failed in the marketplace because there is not widespread or universal agreement (yet) on an X that works for everyone.

> Treating the text in isolation in this use case is the fault I think.

It depends on the end-user, and their individual needs. I fear this may not be the appropriate place, but here goes... :)

Might I suggest that there are some additional data points here worth considering:

As previously noted, WCAG has this notion of Accessibility Supported (https://www.w3.org/TR/UNDERSTANDING-WCAG20/conformance.html#uc-accessibility-support-head), which in essence says that if the technology supports an accommodation method, authors need to align to that (i.e. do nothing to frustrate the accommodation method). Part of that concept states:

> When new technologies are introduced, two things must happen in order for people using assistive technologies to be able to access them. First, the technologies must be designed in a way that user agents including assistive technologies could access all the information they need to present the content to the user. Secondly, the user agents and assistive technologies may need to be redesigned or modified to be able to actually work with these new technologies.

So, for example, we have a requirement that effectively states "Do not disable Pinch-to-zoom" on mobile devices - obviously out-of-scope for desktops (at least those that do not have a pinch-to-zoom display), but a critical requirement on those devices that do support pinch-to-zoom.

Most important today, however, is that we are focused on the first part of that quoted section: "... the technologies must be designed in a way that ..."

Additionally, while I can appreciate your concern over suitable display 'real-estate', may I also resurface an old axiom often heard around the W3C: "author proposes, user disposes"? One of the key things to remember about "accommodation" is that the end-user traditionally understands there is a trade-off, so what they really need/want is the ability to decide for themselves where their comfort level is in that trade-off. You may cringe at the thought of captions taking up 60% of the viewport, but for some users, that may be what they both want and need, and so who should "win" that choice?

Real-estate is also a concern directly linked to the actual physical size of the device: obviously this will be a greater concern on an iPhone than on a 55" big-screen TV, yet we need to remain somewhat agnostic to that fact, as we simply do not know how or what the end user will be using to access the video.

But I also urge you to think a bit out-of-the-box.

We have discussed, with both this group as well as WebVTT, the concept that for "TV" and other forms of large-screen/dumb-terminal displays, we recognize the technical limitations of those technologies around viewport size, etc. But additionally, we have been following activities related to other "TV"[sic] work at the W3C, such as the Second Screen activity. And so, to your concern Nigel, a few "alternative" ways of looking at this:

Scenario 1: Using a Second Screen. The family gathers in the great room to watch a movie. Gramps, who is extremely hard of hearing, has a tablet which he uses at the same time as the family is watching the 55" Big Screen TV. Through a setting on his set-top box, the captions (aka time-stamped file) are piped to his tablet, where he can either pinch to zoom, or otherwise increase the text to a readable level on the tablet, allowing him to follow along, without the whole family having to see the captions on the big screen. While we've yet to see this type of accommodation emerge to date (at least commercially, that I am aware of), it is our understanding that the technologies and means to do this are pretty much in hand today, so it's not so much a matter of if, but rather when. Would you agree?

Scenario 2: Picture in Picture. Earlier this week, the US President gave his State Of The Union address, which was broadcast live across many networks. However, if you think about it, although this was broadcast on TV, the key point was not watching the speaker speak, it was the content of the speech, the audio track, that was most important. Imagine a setting where, instead of overlaying the captions (z-index style [sic]) on top of the video feed, the two feeds could be piped to (essentially) two screens again: render the text (enlarged as much as possible/desired) in the main viewport, and then send the video feed of the speaker to the smaller, picture-in-picture region. Once again, while I am unaware of any system that is doing this today, it is our understanding that there is nothing with the current technology that would frustrate the ability to do that. Would you agree to that as well?

And so, two out-of-the-box scenarios that aren't total science-fiction, just perhaps "forward thinking". Our desire then is that there is nothing in your spec that would frustrate the ability to do this with a time-stamped mark-up file; that yes, the technology CAN support these ideas.

We fully recognize that this will have what many will perceive as an adverse visual outcome, but we wish to assure you that that alone isn't the deal-breaker. Rather, locking down captions so that they cannot be enlarged to a readable size for the user is far more significant and troubling, and (Nigel, with EN 301 549 looking to adopt WCAG 2.1 as its benchmark) this may be cause for regulatory concern.

So... putting aside 'display' concerns, will users be able to enlarge caption text going forward, and can the spec leave open the possibility that on some systems, users will be able to find an accommodation spot that suits their needs?

To Pierre-Anthony's comment: "I am not sure a note that would say 'consumer system may modify this at will' will really help."

I don't think that is what we are seeking. Rather, an authoring note that suggests that author-supplied "rendering instructions" may be overridden by the end user, and so authors should avoid making assumptions about how the final rendering will look in all instances.

Think of it more in keeping with the Responsive Web thinking; that you never will know what the actual viewport size is for the web-content, so again, don't lock things down that could have a negative display outcome. As an example, content authors should avoid inserting hard breaks in the rendering, as it would introduce weird rendering if and when the text/font was enlarged (instead, the text should 'reflow' in the dedicated caption region, and ideally that region itself can "grow" to accommodate enlarged text).

To your list Nigel:

Need for privacy - avoid user settings being fingerprinting vectors [JF] Agreed, but since this is a user-setting or configuration on the end device, it is unclear how the time-stamped text file could communicate sensitive data back to the source (not saying that some black-hat won't try, but...)
Need for ease of use - allow user to change settings directly in the player [JF] Agreed, although again, I do not see this as a problem with the current spec under discussion. Additionally, different players may have different solutions (see my scenarios above), and so as long as there is nothing that frustrates or forbids alternate renderings there should be no issue. The set-up to consume those alternatives, while indeed needing to be "simple", is out of scope here.
Need for system defaults - allow user to change default settings system-wide [JF] Agreed, see bullet 2
Prevalence of sites implementing their own subtitle players rather than using native players [JF] True enough, but that is a hardware concern as well. As long as the content being consumed by the player has the appropriate 'hooks' or ability to be rendered differently than how the author first proposed, then the goal is being met. What we want (need) is that the ability for the end-user to make those kinds of text-rendering adjustments is not frustrated by the spec.
Need for support in native players for all the formats that are needed [JF] Once again, a hardware concern. One of the important things to also consider is that as WCAG is taken up by legislators around the planet, the specifications there also drive development: it may not be the ideal way to address the chicken and egg problem ("we don't have any players today that can do this"), but it has proven effective. So again, as long as this can be supported, we can wait for the rest of the ecosystem to mature.
In the absence of native player support for all formats, the need for an extensible model for player implementations that can take into account user settings made locally or in the system defaults without those settings being exposed [JF] Correct, and initially, I am expecting to see scripted solutions come forth. Nigel, I do not share the same concern you seem to have about exposing user-settings (not that this isn't a concern, I am just failing to see how that would be a timed-text format concern).

At any rate, this is more of an email than an "issue" response, and if there is a need or desire to get your WG and APA together for a chat (teleconference) we can certainly look to arrange such.

Please advise.


-- John Foliot Principal Accessibility Strategist Deque Systems Inc. john.foliot@deque.com

Advancing the mission of digital accessibility and inclusion

Hi John, thanks for the long and thoughtful response.

There are some points that I would certainly debate further, but the main point I would make is that the specification defines how a conforming presentation processor will process the author's instructions to display the text at the right times. By definition, if a regime exists in which the user overrides the author's instructions for that user's own purposes, we are out of scope of the specification.

The information about the text and the times is certainly present, and such a deliberately non-conforming processor is free to do what it wishes with any of the other presentation-affecting markup in the subtitle document. If we attempted to be prescriptive about how that should work, we would never cover all the scenarios, including those that you have highlighted and those that have not yet been thought of.

Does this need saying in the specification? I'm not sure how useful it is.

Now to your questions and some points that I would like to respond to:

> ... You may cringe at the thought of captions taking up 60% of the viewport, but for some users, that may be what they both want and need, and so who should "win" that choice?

Nobody is winning here. It's not that I'm cringing about the size of the captions, it is that I know from experience that the viewer is watching the whole programme not just reading the captions. You simply cannot take the captions in isolation when considering accessibility of AV content, at least in the most overwhelmingly common current usage. Also, as above, I don't think the spec says anything about the user overriding authored settings.

> Real-estate is also a concern directly linked to the actual physical size of the device

From my research, I would say that this link is far from straightforward, especially in the common case of full screen viewing.

> Scenario 1: Using a Second Screen. ... it's not so much a matter of if, but rather when. Would you agree?

No, I would not. All the evidence I have seen is that moving the captions further from the image of the people saying the words, or generally from the video content, impairs understanding, enjoyment and overall accessibility. I do not doubt that there are special cases where this scenario might work well for some people but I have seen no evidence at all from anywhere to suggest that it is something that people want to do or are likely to begin doing soon. I'm always keen to learn new things though!

> Scenario 2: Picture in Picture. ... there is nothing with the current technology that would frustrate the ability to do that. Would you agree to that as well?

I would agree - there's nothing to prevent that. Is this comment targeted at IMSC? Is there a change that needs to be made to facilitate it in some way?

> Our desire then is that there is nothing in your spec that would frustrate the ability to do this with a time-stamped mark-up file

Okay. Without wishing to be too confrontational, I'm not sure if your desire is satisfied! It seems not, but maybe this is a question of scope of the specification.

> As an example, content authors should avoid inserting hard breaks in the rendering, as it would introduce weird rendering if and when the text/font was enlarged (instead, the text should 'reflow' in the dedicated caption region, and ideally that region itself can "grow" to accommodate enlarged text).

Perhaps the answer to this is that an enlarging presentation processor might treat any line breaks inserted as somewhat optional. This is certainly an implementation matter.
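One way to picture such a processor is sketched below. This is an illustration only, under the simplifying assumption that the enlarged font size can be reduced to a maximum number of characters per line; `reflow_caption` and `max_chars` are hypothetical names, and nothing here is defined by IMSC.

```python
import textwrap
from typing import List


def reflow_caption(lines: List[str], max_chars: int) -> List[str]:
    """Keep the authored line breaks while they still fit, otherwise re-wrap.

    `lines` holds the caption as authored, one string per authored line.
    `max_chars` approximates how many characters fit on one line at the
    user's chosen (possibly enlarged) font size.
    """
    if all(len(line) <= max_chars for line in lines):
        # The authored breaks still fit, so respect the author's choice.
        return list(lines)
    # Otherwise treat the authored breaks as optional and wrap on word boundaries.
    return textwrap.wrap(" ".join(lines), width=max_chars)


authored = ["I never said", "she stole my money."]
at_normal_size = reflow_caption(authored, max_chars=20)
at_enlarged_size = reflow_caption(authored, max_chars=12)
```

At the normal size the authored breaks are returned unchanged; at the enlarged size the second line no longer fits, so the text is re-wrapped into three lines. A real implementation would also need to grow the region or handle overflow, which this sketch ignores.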

> ... content authors should avoid inserting hard breaks in the rendering ...

For the general case of authored subtitles and captions there is certainly research (even some quite recent academic research I came across in November, which I cannot cite until it is published) showing that correctly positioned line and subtitle breaks (in relation to the grammar of the sentence) make a significant difference to how easily people can read and understand the text. I was unaware of this independent research until I saw it presented, and as it happened the author had taken as a hypothesis the BBC's subtitle guidelines regarding line breaks and validated those guidelines.

My conclusion is rather strongly to say that authors should insert breaks in the rendering, because it makes the text more accessible in the majority case. As I said above, implementations that work against a different constraint set may choose to process those line breaks in out-of-spec ways, and that might be entirely reasonable.

> will users be able to enlarge caption text going forward, and can the spec leave open the possibility that on some systems, users will be able to find an accommodation spot that suits their needs?

It might be ducking the question, but my view is you need to ask a wider set of implementers about this. If implementers wish to implement something that deliberately plays back the subtitle documents differently to the "locked down" interpretation of those documents defined by the specification, then users will have that option. The HBB4ALL project investigated exactly this, and provided users with the ability to present TTML based subtitles with customisation options beyond those defined in the specification.

I would go further though: in practice most real-world implementations vary from the specification to some degree. It's just that they don't all vary in the same way, so there is nothing to standardise at this time. I take this as evidence that the possibility is open for some systems to provide accommodations for particular needs.

> an authoring note that suggests that author-supplied "rendering instructions" may be overridden by the end user, and so authors should avoid making assumptions about how the final rendering will look in all instances.

Authors must make some assumptions, but I do not know of any who think they know how all the final renderings will look in all instances. I'm not even sure why they would expect to know this. They do need to know roughly how big the subtitles will be and where they will be positioned, and sometimes there is a surprisingly small window in which to place the text. If users generally expect to be able to change the text presentation arbitrarily without loss of meaning of the whole programme including the video, they are frankly wrong in many cases. Some transformations are much safer than others: for example a practice of authoring to a large size and allowing the user to shrink the text is much safer in this respect than the other way around.

> you never will know what the actual viewport size is for the web-content

The situation is certainly different here: you have a pretty good idea what it is for video content, notwithstanding the additional scenarios you proposed.

As for my list of general requirements, I intended it as food for thought for tackling this general thorny issue in the medium term, rather than with any intent to modify IMSC 1.1 in the short term.

The privacy issue is not one for the subtitle document or the document format - it is that, were the user agent to make available the user's caption presentation preferences, say by an API, it would provide fingerprinting data to any script running on that UA. This is why some implementations go to great lengths to hide the settings and the final rendering from scripts.

> if there is a need or desire to get your WG and APA together for a chat (teleconference) we can certainly look to arrange such

Judging from this thread, I suspect that would indeed be helpful, and would be happy to participate and assist with the arrangements.

Thanks again!

Proposed disposition: defer to IMSCvNext and WR-resolved-partial

> Judging from this thread, I suspect that would indeed be helpful, and would be happy to participate and assist with the arrangements.

While ambitious, worldwide alignment on guidelines for user-driven customization would be good.

The Working Group just discussed APA comment: presentation customisation imsc#316, and agreed to the following resolutions:

SUMMARY: @palemieux to prepare a pull request adding reference to MAUR.

The full IRC log of that discussion

<nigel> Topic: APA comment: presentation customisation imsc#316
<nigel> github: https://github.com/w3c/imsc/issues/316
<nigel> Nigel: This is what we largely discussed yesterday.
<nigel> -> https://www.w3.org/2018/02/28-apa-minutes.html Minutes from yesterday's call
<nigel> Nigel: Specifically, call out: "JF agrees with the proposition that an informative note citing this MAUR document would improve the caption-related specifications."
<nigel> Nigel: We could do that by modifying appendix D to include MAUR considerations additionally.
<nigel> Pierre: Sounds good to me.
<nigel> .. I'll prepare a pull request.
<nigel> SUMMARY: @palemieux to prepare a pull request adding reference to MAUR.
