w3c / imsc

TTML Profiles for Internet Media Subtitles and Captions (IMSC)
https://w3c.github.io/imsc/

Why exclude hebrew and arabic proportional reference fonts? #237

Closed r12a closed 7 years ago

r12a commented 7 years ago

A. Reference Fonts https://www.w3.org/TR/ttml-imsc1.0.1/#reference-fonts

proportionalSansSerif
All code points specified in B. Recommended Character Sets, excluding the code points defined for Hebrew and Arabic scripts.
Arial or Helvetica or Liberation Sans

Why are codepoints for hebr and arab excluded from the proportional font list? Actually, monospaced fonts are particularly problematic for arabic script text, since they create an appearance of baseline stretching between narrow glyphs, and can cause difficulty in rendering wide characters elegantly (such as س). So if there were a preference one way or the other, i'd expect it to be biased towards proportional fonts.

nigelmegitt commented 7 years ago

My understanding here is that the text is not meant to exclude Hebrew and Arabic glyphs per se but to exclude them from the definition of reference metrics. I suspect this is because in those languages (and others, I guess) the same code point results in multiple variant glyphs depending, for example, on the position within a word, so it is not straightforward to tabulate the metrics per code point.

I would suggest that we add a Note explaining that the exclusion applies only to reference font metrics and is not meant to suggest that the code points themselves should not be supported.
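
To illustrate the point about one code point mapping to several glyphs, here is a minimal Python sketch using only the standard `unicodedata` module. It lists the compatibility presentation forms that decompose to ARABIC LETTER SEEN (U+0633). The helper name `contextual_forms` is hypothetical, and the presentation-form characters are used purely as an illustration: real renderers select contextual glyphs through font shaping, not through these code points.

```python
import unicodedata

SEEN = "\u0633"  # ARABIC LETTER SEEN, a single code point


def contextual_forms(base):
    """Map form name -> presentation-form character for one base letter.

    Scans the Arabic Presentation Forms-B block (U+FE70..U+FEFF) for
    characters whose compatibility decomposition is exactly `base`.
    Illustrative only: shaping engines do not work this way.
    """
    target = f"{ord(base):04X}"
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        # e.g. "<isolated> 0633"; empty string for unassigned code points
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[1] == target:
            forms[parts[0].strip("<>")] = chr(cp)
    return forms


if __name__ == "__main__":
    for form, ch in contextual_forms(SEEN).items():
        print(f"{form:9} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The single code point U+0633 has isolated, final, initial and medial shapes, each potentially with a different advance width, which is why a per-code-point metrics table does not capture the rendered width of Arabic text.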

r12a commented 7 years ago

Hmm, then this appears to be worse than i thought. Tying that back to

https://w3c.github.io/imsc/imsc1/spec/ttml-ww-profiles.html#reference-fonts-1

The flow of text within a region depends the dimensions and spacing (kerning) between individual glyphs. The following allows, for instance, region extents to be set such that text flows without clipping.

suggests to me that line-breaking is only properly supported for proportionally-spaced fonts for a few Latin/Greek/Cyrillic languages. That's a much bigger issue, and suggests a very old-fashioned, almost archaic, and highly western-biased mindset about how to handle text.

Also, the CJK scripts, which are normally mono-spaced by nature, are not included in the reference fonts either, so presumably line-wrapping isn't supported for them either?

(Note btw the extract above is missing the word 'on' after depends.)

palemieux commented 7 years ago

suggests to me that line-breaking is only properly supported for proportionally-spaced fonts for a few Latin/Greek/Cyrillic languages.

No. Any combination of character and font family can be used by authors, and line breaking is specified for all such combinations -- using the UAX 14 algorithm.

Processors have to support mandatory metrics only for a smaller set of font family and characters.

nigelmegitt commented 7 years ago

@r12a I think that line is not supposed to be an exhaustive list of the things that the flow of text depends on, just an example. It certainly isn't supposed to exclude anything that is needed for line wrapping. Possibly I have not understood your concern about that text though?

The main thing is the normative text:

a processor shall use a font that generates a glyph sequence whose dimension is substantially identical to the glyph sequence that would have been generated by one of the specified reference fonts.

This is independent of line breaking.

Perhaps the wording you quoted could be written in a more precise way?

r12a commented 7 years ago

Sorry to be so slow in understanding all this. I think some additional explanations in the spec to address these topics would be useful. There were similar questions coming from other members of the i18n WG.

I guess this is just a profile, but my reading of 7.3 seemed to indicate that text flow (which may not involve line wrapping) was only done by counting characters, and i see no mention of UAX 14. Perhaps there needs to be something that explains the relationship between sections 7.2, 7.3, App A and App B, and the main TTML spec(?)

nigelmegitt commented 7 years ago

my reading of 7.3 seemed to indicate that text flow (which may not involve line wrapping) was only done by counting characters, and i see no mention of UAX 14.

@r12a I'm trying to understand this reading - from the words present, what led you to that?

By the way, §7.4 includes a mandatory requirement to support UAX 14 via the #lineBreak-uax14 feature designator:

#lineBreak-uax14 The processor shall implement the #lineBreak-uax14 feature defined in the TT Feature namespace.

So it is there...

r12a commented 7 years ago

wrt linebreak-uax14, ah! - not sure why my search of the document didn't find that.

@r12a I'm trying to understand this reading - from the words present, what led you to that?

Perhaps because of the following text in the section 7.3 Reference Fonts

The following allows, for instance, region extents to be set such that text flows without clipping.

I may be leaping to unwarranted conclusions, but it seemed to indicate that if you want to avoid clipping, you need to use reference fonts.

nigelmegitt commented 7 years ago

I may be leaping to unwarranted conclusions, but it seemed to indicate that if you want to avoid clipping, you need to use reference fonts.

Not necessarily - you could use named fonts that correspond to specific font resources you know will be available at presentation time.

The intent of this, in my understanding, is to solve the general problem that when authoring a document the actual font that will be used at presentation time is not known, particularly when a generic font family name is used. This requirement sets a reasonable expectation of the size of the rendered text (sequence of glyphs) so that other fixed size elements such as regions can be given an appropriate dimension to include all that text. This is a particular issue when content elements cannot grow to fit, or when scrollbars cannot usefully be made available, which is the case when presenting text overlaid on video.

It also allows the author to meet the accessibility requirement to position text to avoid important areas of underlying video.
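
As a rough sketch of the authoring problem described above, the following Python estimates how wide a region must be to hold a line of monospaced text without clipping. Everything here is an assumption made for illustration: the function names are hypothetical, the advance width of 600 units per 1000-unit em is a placeholder rather than a value taken from the IMSC reference font metrics, and counting code points only approximates glyph count for simple scripts where one code point maps to one glyph.

```python
def min_region_width_px(line, font_size_px, advance_units=600, units_per_em=1000):
    """Estimate the pixel width needed for one line of monospaced text.

    advance_units / units_per_em is an assumed per-glyph advance, not a
    value from the specification. Integer arithmetic avoids float rounding.
    """
    total_units = len(line) * advance_units * font_size_px
    return -(-total_units // units_per_em)  # ceiling division


def fits(lines, region_width_px, font_size_px):
    """True if every line is estimated to fit within the region width."""
    return all(
        min_region_width_px(line, font_size_px) <= region_width_px
        for line in lines
    )


if __name__ == "__main__":
    caption = ["Hello, world"]
    print(min_region_width_px(caption[0], 40))
    print(fits(caption, region_width_px=300, font_size_px=40))
```

An estimate like this is only useful if the presentation processor picks a font whose glyph sequence has substantially the same dimensions as the one the author assumed, which is what the reference font requirement is intended to provide.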

palemieux commented 7 years ago

The following allows, for instance, region extents to be set such that text flows without clipping.

I may be leaping to unwarranted conclusions, but it seemed to indicate that if you want to avoid clipping, you need to use reference fonts.

Ok. This is not the intent of the sentence and there is no conformance terminology in the sentence that would compel implementation behavior.

I am happy to remove the sentence if it causes more confusion than it helps.

nigelmegitt commented 7 years ago

I agree that sentence could usefully be changed but I would not remove it altogether. I would simply qualify it by appending "when using the generic font family names monospaceSerif or proportionalSansSerif".

palemieux commented 7 years ago

I plan to generate a PR based on https://github.com/w3c/imsc/issues/237#issuecomment-306458020

r12a commented 7 years ago

I suspect this is because in those languages (and others I guess) the same code point results in multiple variant glyphs dependent for example on the position within a word, so it is not straightforward to tabulate the metrics per code point.

I don't believe that this applies for Hebrew. Hebrew isn't a cursive script like Arabic.

Also, in most scripts other than the simple alphabetic ones like Latin/Cyrillic/Greek, or CJK, glyphs vary based on context. So the list of exclusions needs to be much bigger than just Arabic - the spec should, i'd have thought, at least mention that just using monospaced fonts for Arabic doesn't solve the real problem here. To be honest, I'm worried that the current approach will just reinforce the tendency that already exists to tailor processors just for 'easy' scripts and languages, rather than really making developers aware of the need to consider a worldwide audience when they create their technology.

palemieux commented 7 years ago

the spec should, i'd have thought, at least mention that just using monospaced fonts for Arabic doesn't solve the real problem here.

Yes, it would be good to note best practices.

nigelmegitt commented 7 years ago

@r12a I had imagined that HEBREW LETTER FINAL KAF and HEBREW LETTER KAF, for example, were the same code point and the selected glyph would be based on position within the word, but it turns out that they are distinct code points. I stand corrected on the Hebrew point.

r12a commented 7 years ago

Yep, that's a result of legacy technologies and keyboards (and typewriters). Same applies to Greek wrt sigma. It doesn't really apply in the same way to any other scripts i can think of.

(background reading: http://r12a.github.io/scripts/tutorial/part3#word-final and https://r12a.github.io/uniview/?charlist=%CF%83%CF%82)
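
The distinction discussed above can be checked with a short Python sketch using the standard `unicodedata` module. The helper name `code_point_names` is hypothetical. It shows that the Hebrew and Greek word-final letters are separate code points with their own names, unlike the Arabic case, where a single code point takes several contextual shapes.

```python
import unicodedata


def code_point_names(text):
    """Return (U+XXXX, character name) pairs for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    # Hebrew kaf / final kaf, then Greek sigma / final sigma
    for label, name in code_point_names("\u05DB\u05DA\u03C3\u03C2"):
        print(label, name)
```

Because each final form is encoded separately, a per-code-point metrics table can cover these letters without any knowledge of word position.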

nigelmegitt commented 7 years ago

Thank you @r12a - those links are extremely useful!