w3c / imsc

TTML Profiles for Internet Media Subtitles and Captions (IMSC)
https://w3c.github.io/imsc/

Superscript/subscript support #583

Open palemieux opened 2 months ago

palemieux commented 2 months ago

IMSC currently does not support superscript/subscript text, which is desirable (if not essential) in some languages.

French, for example, uses superscript to abbreviate ordinal numerals. The same is true in English, albeit perhaps less necessary, e.g., 1st might be as acceptable as 1<sup>st</sup>.

Couple of observations:

CEA 708 has the concept of superscript/subscript (offset pen attribute).

In fact it looks like tts:fontVariant was introduced, at least in part, to match CEA 708 capabilities, although, as mentioned above, it is not clear that it is a perfect match (or at least the mapping to CSS is not ideal).

css-meeting-bot commented 2 months ago

The Timed Text Working Group just discussed Superscript/subscript support w3c/imsc#583, and agreed to the following:

The full IRC log of that discussion
<nigel> Subtopic: Superscript/subscript support w3c/imsc#583
<nigel> github: https://github.com/w3c/imsc/issues/583
<cpn> Pierre: This was brought to my attention by a platform that has a presence in France
<cpn> ... There's no way to signal superscript or subscript text. It's an issue in French more than in English for ordinal numbers, where it's better to use superscript
<cpn> ... It's in their style guide as something that should be supported
<cpn> ... I looked into it, and there is a TTML2 tts:fontVariant attribute that allows super/subscript glyphs to be selected for a particular font
<cpn> ... The spec says it's derived from the equivalent CSS feature
<cpn> ... It's not a layout feature, it's a glyph-selection feature
<cpn> ... I tried it in CSS, but couldn't find a font that supports it
<cpn> ... So I tried to find where the TTML2 feature came from
<cpn> ... An issue raised 10 years ago, based on super/subscript support in CEA 708
<cpn> ... I'm not convinced tts:fontVariant is the answer, but I'd like input, so we do it right
<cpn> ... Unicode does have super/subscript characters, but not enough coverage for all ordinals, or not meant to be used that way
<cpn> Nigel: I researched how you'd do this in HTML and CSS. There seem to be two ways:
<cpn> ... The <sup> and <sub> elements, but there's also a CSS vertical-align feature where you can set the baseline of an element
<cpn> ... Parents have a subscript baseline and a superscript baseline and on inline elements you can set to one or other of those
<cpn> ... So there are two ways, I don't know if one is better than the other
<cpn> Pierre: The HTML elements are widely used
<cpn> Nigel: Every browser supports vertical-align too
<cpn> ... You can understand tts:vertical-align being a TTML style attribute, whereas introducing new elements isn't very TTML-ish
<cpn> Pierre: One option, if we decide tts:fontVariant isn't great because of its mapping to CSS font-variant, we could redefine the mapping to something else
<cpn> ... tts:fontVariant was introduced to support superscript and subscript
<cpn> Nigel: The CSS font-variant selects glyphs but doesn't change their position, but if you want to change the alignment then you should use vertical-align
<cpn> ... Sounds not ideal to have a TTML style property that does something different to the CSS style of the same name
<cpn> Pierre: I agree, but not sure why we went with that at the time
<cpn> Nigel: A possibility could be to use a font variant
<nigel> s/font variant/font explicitly defined to include glyphs with super/sub font variant forms
<cpn> Pierre: Potential next steps: confirm it's a real issue, think about how to fix
<cpn> Nigel: Yes, and by "real issue" do you mean that there's no workaround
<cpn> Pierre: Yes, but also if there are subtitle guidelines to discourage use of super/subscript
<cpn> ... The fact it's in CEA708 gives us a good reason to support it
<cpn> Nigel: Do you have any input on the accessibility of super/subscript text?
<cpn> Pierre: Yes, the people in France were wondering why they couldn't do it, probably following a guideline for PNG based subtitles
<cpn> ... My sense is they're incentivised to help. Maybe give some time, to after IBC, then think about how to fix?
<cpn> Nigel: Could be a topic for TPAC as well, need to think about things that affect TTML and IMSC together
<cpn> Nigel: Any other thoughts on this?
<cpn> Cyril: I'm enquiring internally on the importance, so will update you
<cpn> Nigel: I don't expect BBC to have any data points
<cpn> ... I could ask the EBU media access technology group. If you're a member, you could ask on their reflector
<cpn> ... It's a good forum for input on non-English European languages
<cpn> Pierre: I can ask there, possibly also on social media
<cpn> Nigel: Thanks
<nigel> SUMMARY: Investigation into requirements to continue, agenda+ for TPAC

skynavga commented 2 months ago

@nigelmegitt @palemieux Note that #derivation-fontVariant says that font-variant-position applies for normal, super, and sub; CSS3 Fonts, in turn, says that:

Because of the semantic nature of subscripts and superscripts, when the value is either ‘sub’ or ‘super’ for a given contiguous run of text, if a variant glyph is not available for all the characters in the run, simulated glyphs should be synthesized for all characters using reduced forms of the glyphs that would be used without this feature applied. This is done per run to avoid a mixture of variant glyphs and synthesized ones that would not align correctly. In the case of OpenType fonts that lack subscript or superscript glyphs for a given character, user agents must synthesize appropriate subscript and superscript glyphs.
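
A minimal sketch of the per-run rule quoted above, assuming a renderer already knows which characters have variant glyphs in the selected font. The function name and the `variant_glyphs` set are hypothetical; a real engine would presumably query the font's OpenType `sups`/`subs` features rather than a Python set.

```python
from typing import Set


def choose_run_strategy(run: str, variant_glyphs: Set[str]) -> str:
    """Decide how to render one contiguous sub/super run.

    Returns "variant" only if every character in the run has a variant
    glyph; otherwise returns "synthesize" for the whole run, so that
    variant and synthesized glyphs are never mixed within a run.
    """
    if all(ch in variant_glyphs for ch in run):
        return "variant"
    return "synthesize"


# Hypothetical font that only carries superscript digits.
DIGITS_ONLY = set("0123456789")
```

With this hypothetical font, `choose_run_strategy("er", DIGITS_ONLY)` falls back to synthesis for the whole run, which is the French ordinal case raised in this issue.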

palemieux commented 2 months ago

@skynavga in your experience/mind, subscript/superscript is no different than bold and italic, i.e., a different variation of a font for a given character?

palemieux commented 2 months ago

I was also surprised that Chrome did not support font-variant-position: super if it is equivalent to <sup>.

https://codepen.io/palemieux/pen/gONjEVx

skynavga commented 2 months ago

@palemieux

in your experience/mind, subscript/superscript is no different than bold and italic, i.e., a different variation of a font for a given character?

Yes, for higher end fonts (where the designer pays attention to these matters). However, many fonts contain a small set of sub/super glyphs without having the corresponding OpenType substitution table, and, indeed, Unicode itself codes a few sub/sup variants as characters in their own right. I don't actually recall seeing an independent font resource for sup or sub or both (in the sense that one encounters bespoke bold, italic, and bold-italic font resources).
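
One way to check which characters Unicode encodes in superscript form is to scan for compatibility decompositions tagged `<super>`. The sketch below does that with Python's standard `unicodedata` module. The helper name is hypothetical, the scan is limited to the Basic Multilingual Plane for brevity, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata
from typing import Dict, List


def superscript_forms(text: str) -> Dict[str, List[str]]:
    """Map each distinct character of `text` to the BMP characters whose
    compatibility decomposition is '<super>' followed by that character."""
    # Decompositions are reported as hex code points, e.g. '<super> 0032'.
    wanted = {f"{ord(ch):04X}": ch for ch in set(text)}
    forms: Dict[str, List[str]] = {ch: [] for ch in wanted.values()}
    prefix = "<super> "
    for cp in range(0x10000):
        decomp = unicodedata.decomposition(chr(cp))
        if decomp.startswith(prefix):
            target = decomp[len(prefix):]
            if target in wanted:
                forms[wanted[target]].append(chr(cp))
    return forms
```

Calling `superscript_forms("er")` lists the candidates for the French "1er" case; any character that maps to an empty list has no encoded superscript form in the scanned range.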

A good text layout/rendering engine will take heed of the language I cited from CSS3 Fonts regarding the need to synthesize glyphs on demand. Indeed, I used this approach to synthesize ruby glyphs in ttt/ttt-ttpe.

palemieux commented 2 months ago

@skynavga Any reason why font-variant-position: super is not supported in Chrome, but <sup> is? Could we say, in TTML2, that tts:fontVariant="super" is the same as <sup>?

skynavga commented 2 months ago

@skynavga Any reason why font-variant-position: super is not supported in Chrome, but <sup> is? Could we say, in TTML2, that tts:fontVariant="super" is the same as <sup>?

@palemieux Couldn't say. Haven't been connected to the Chromium project in ages (is it still a thing?). I wouldn't want TTML2 to make a reference to HTML5 to obtain the semantics of <sup>. I think the semantics described in CSS3 Fonts are just fine, as it gives plenty of latitude for implementation behavior. Note, however, that TTML2 doesn't normatively use the CSS3 Fonts semantics in this context, but, rather, it is merely an indirect reference as such. Think of it as a hint.

palemieux commented 2 months ago

See discussion at https://github.com/w3c/csswg-drafts/issues/7441

font-variant-position is a bit useless - as it may or may not generate superscript, everyone just uses font-size and vertical-align

skynavga commented 2 months ago

Ok. I view this as an implementation matter. TTML2 doesn't actually reference font-variant-position in a normative manner. So it is up to a presentation processor's implementation to decide how to render tts:fontVariant, whether it be custom rendering code, mapping to font-variant-position, font-size with vertical-align, or <sup>, etc.
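
To make the implementation options listed above concrete, here is a rough sketch of a processor that reads tts:fontVariant from a TTML paragraph and emits HTML using one of three interchangeable strategies. The function names, strategy names, and sample paragraph are hypothetical; only "super" and "sub" are handled, nested spans are ignored, and nothing here is mandated by TTML2 or IMSC.

```python
import xml.etree.ElementTree as ET
from html import escape

TT_NS = "http://www.w3.org/ns/ttml"
TTS_NS = "http://www.w3.org/ns/ttml#styling"


def wrap(text: str, variant: str, strategy: str = "element") -> str:
    """Wrap `text` in HTML according to the chosen (hypothetical) strategy."""
    safe = escape(text)
    if variant not in ("super", "sub"):
        return safe  # "normal" and unhandled values pass through unchanged
    if strategy == "element":
        tag = "sup" if variant == "super" else "sub"
        return f"<{tag}>{safe}</{tag}>"
    if strategy == "vertical-align":
        style = f"vertical-align: {variant}; font-size: smaller"
        return f'<span style="{style}">{safe}</span>'
    if strategy == "font-variant-position":
        return f'<span style="font-variant-position: {variant}">{safe}</span>'
    raise ValueError(f"unknown strategy: {strategy}")


def paragraph_to_html(p_xml: str, strategy: str = "element") -> str:
    """Convert a single TTML <p> with direct child <span>s to an HTML string."""
    p = ET.fromstring(p_xml)
    parts = [escape(p.text or "")]
    for child in p:  # direct children only; nested spans are not handled
        variant = child.get(f"{{{TTS_NS}}}fontVariant", "normal")
        parts.append(wrap(child.text or "", variant, strategy))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


# Hypothetical sample: the French ordinal "1er".
SAMPLE = (
    f'<p xmlns="{TT_NS}" xmlns:tts="{TTS_NS}">'
    'Le 1<span tts:fontVariant="super">er</span> mai</p>'
)
```

Switching the `strategy` argument changes only the emitted markup, which reflects the point that the choice can be left to the presentation processor.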

Notwithstanding this, I agree that the current normative language defining tts:fontVariant focuses only on glyph selection without mentioning fallback strategies in the absence of selectable glyphs. I wouldn't object to a Note being added (to §10.2.23) that elaborates potential fallbacks, which I would probably write in terms similar to that found in CSS3 Fonts as cited above.

nigelmegitt commented 2 months ago

For the use case where both a superscript and subscript are needed (chemical symbols etc), vertically above each other, I think font-variant-position does not work, but Firefox at least does seem to do what CSS says and synthesise superscript and subscript for ordinary letter-like symbols.

I created a codepen to demonstrate different HTML and CSS techniques for superscript and subscript. In many ways the easiest to implement is the one using table layout, though I haven't worked out a more elegant way to get only superscripts to work than putting a non-breaking space character into a <span> so that there's something occupying the subscript row, which pushes the superscript row up to the top.

Firefox rendering: image

css-meeting-bot commented 2 months ago

The Timed Text Working Group just discussed IMSC superscript/subscript support w3c/imsc#583, and agreed to the following:

The full IRC log of that discussion
<nigel> Topic: IMSC superscript/subscript support w3c/imsc#583
<nigel> github: https://github.com/w3c/imsc/issues/583
<nigel> Nigel: This was discussed in the EBU Media Access Technology group call last Tuesday,
<nigel> .. and the main summary is that for French especially, if the feature were available it would be used,
<nigel> .. but French people are used to non-superscript ordinals at the moment.
<nigel> .. There are other similar use cases for numbers in chemical symbols or in "square metres" etc,
<nigel> .. in Norwegian and German.
<nigel> SUMMARY: EBU positive about requirement, unclear if anyone "cannot live without" the feature.