w3c / imsc

TTML Profiles for Internet Media Subtitles and Captions (IMSC)
https://w3c.github.io/imsc/

Support `#font` TTML2 feature #472

Closed nigelmegitt closed 5 years ago

nigelmegitt commented 5 years ago

Currently the #font TTML2 feature is not listed in IMSC 1.1, which means that its use is prohibited: fonts cannot be embedded or referenced by explicit URL.

One of the proposed approaches to w3c/tt-reqs#15 is to use private use areas of the Unicode range and a font that defines glyphs for the code points used. To do that we would almost certainly need to support #font, since the only reliable way to make use of private use area code points is to reference the font that defines the glyphs explicitly, or to embed it.

palemieux commented 5 years ago

A key question in my mind is whether support for font files embedded within the IMSC document, i.e. embedded content resources, is required. This can lead to unwieldy XML documents very rapidly, and most, if not all, container formats, e.g. MXF and ISO BMFF, support embedding of resources.

nigelmegitt commented 5 years ago

If the requirement is to support glyphs defined in private use areas then embedded font support would also be needed, in my view, because it is the only generally reliable way to ensure that the private use area codes map to the right glyphs. There may be private schemes that can be constructed, but they would likely be brittle across time and between organisations.

I can see that this could be used in an unfortunate way, as you describe @palemieux, for non-private use area fonts, but there would be no document constraint forcing the document author to embed the font, so this is a matter of the practices we need to support rather than impose.

skynavga commented 5 years ago

You should be aware that when font embedding is used, it is most often the case that the font is a subset font; that is, it only contains the glyphs necessary to render the containing document. This significantly reduces the size of the embedding, particularly w.r.t. CJK fonts. So it may not be as unwieldy as you imagine.

palemieux commented 5 years ago

> it is most often the case that the font is a subset font;

Yes, I can see this use case being useful, and of reasonable impact on the size/complexity of the document.

Perhaps the HRM can simply limit the size of embedded resources.

Is there a readily available example of what a subset font resource looks like?

skynavga commented 5 years ago

It (a "subset font resource") is identical in format to a fully populated font resource, but has only those glyphs (and related table entries) that apply to a subset of the characters supported by the full (non-subset) font. There is no change in the format or syntax of the subset font.

cconcolato commented 5 years ago

@nigelmegitt I'm not convinced the use of private use areas is the right way to go; I would prefer using readable characters for accessibility/fallback reasons. I do, however, support the idea of adding support for the font element in IMSC 1.2.

I also agree with @palemieux that we need to see if we want to allow referencing and/or embedding.

That said, we also need to think about the restrictions that we want to impose on syntax. For example, there are many ways to reference a font: `<font src="file.otf"/>` vs. `<font><source src="file.otf"/></font>` vs. `<font><source><data src="file.otf"/></source></font>`, not to mention fragment identifiers, which can point to source or data elements that themselves point to external resources ...
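
To make the three referencing forms above concrete, here is a minimal Python sketch that collects the `src` values from each of them. It is only an illustration: the function name `font_source_urls` and the sample document are hypothetical, it assumes the elements are in the TTML namespace `http://www.w3.org/ns/ttml`, and it deliberately ignores fragment identifiers and embedded data.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"  # TTML namespace

def font_source_urls(ttml_text):
    """Return the src values found under each font element, in document order.

    Hypothetical helper: handles only the three forms quoted in the comment
    above, not fragment identifiers or embedded (inline) data.
    """
    root = ET.fromstring(ttml_text)
    urls = []
    for font in root.iter(f"{{{TT_NS}}}font"):
        # Form 1: <font src="..."/>
        if "src" in font.attrib:
            urls.append(font.attrib["src"])
        for source in font.findall(f"{{{TT_NS}}}source"):
            # Form 2: <font><source src="..."/></font>
            if "src" in source.attrib:
                urls.append(source.attrib["src"])
            # Form 3: <font><source><data src="..."/></source></font>
            for data in source.findall(f"{{{TT_NS}}}data"):
                if "src" in data.attrib:
                    urls.append(data.attrib["src"])
    return urls

# Illustrative document using all three forms (file names are made up).
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml"><head><resources>
<font src="a.otf"/>
<font><source src="b.otf"/></font>
<font><source><data src="c.otf"/></source></font>
</resources></head></tt>"""

print(font_source_urls(SAMPLE))  # ['a.otf', 'b.otf', 'c.otf']
```

That a processor needs three code paths for what is logically one reference is the argument for narrowing the permitted syntax.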

nigelmegitt commented 5 years ago

> I'm not convinced the use of private use areas is the right way to go

@cconcolato I agree - I am not convinced either, but think it should be an available option.

I also agree we should consider narrowing the permitted syntax for expressing this.

palemieux commented 5 years ago

There are a number of unresolved issues:

nigelmegitt commented 5 years ago

This is probably a good topic to discuss at TPAC - I've added the agenda label.

palemieux commented 5 years ago

As a strawman, I propose the following limits to font resources in IMSC 1.2: no more than 2 font resources, each no larger than 10 MB.

This doubles the limits imposed on D-Cinema subtitles.
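
As a rough illustration of how the strawman limits could be checked, the Python sketch below takes the byte sizes of a document's font resources and reports violations. The names are hypothetical, and reading "10 MB" as 10,000,000 bytes is an assumption; the proposal does not say whether decimal or binary megabytes are meant.

```python
MAX_FONT_RESOURCES = 2          # strawman: no more than 2 font resources
MAX_FONT_BYTES = 10_000_000     # assumption: "10 MB" read as 10^7 bytes

def check_font_limits(sizes_in_bytes):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: sizes_in_bytes holds one entry per font resource.
    """
    problems = []
    if len(sizes_in_bytes) > MAX_FONT_RESOURCES:
        problems.append(
            f"{len(sizes_in_bytes)} font resources exceed the limit of {MAX_FONT_RESOURCES}"
        )
    for index, size in enumerate(sizes_in_bytes):
        if size > MAX_FONT_BYTES:
            problems.append(
                f"font resource {index} is {size} bytes, over {MAX_FONT_BYTES}"
            )
    return problems
```

For example, `check_font_limits([8_600_000])` returns an empty list, while a document with three font resources, or one resource of 10,000,001 bytes, yields one problem each.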

css-meeting-bot commented 5 years ago

The Timed Text Working Group just discussed Support `#font` TTML2 feature #imsc472, and agreed to the following:

The full IRC log of that discussion:
<nigel> Topic: Support `#font` TTML2 feature #imsc472
<glenn> s/merging the properties/merging voice-rate semantics into tta:speak/
<nigel> github: https://github.com/w3c/imsc/issues/472
<atai> Nigel: It is about font feature in IMSC 1.1
<glenn> s/I thought auto is display none/.../
<atai> ...Pierre's question is about resource limits
<atai> pal: questions:
<atai> ...should processors support a minimum set of font formats
<atai> ....should @type be limited to a certain set of values?
<atai> ...should the number of resources be limited in a document?
<atai> ...should the size (in bytes) of each resource be limited?
<atai> ...regarding should @type be limited to a certain set of values?
<atai> ...my recommendation is not to constrain @type, but browsers need to support a minimum list of font resources
<atai> ...for the start of the spec, my proposal is to limit the number of resources to 2
<atai> Nigel: Why should we limit?
<atai> pal: Download time
<atai> cyril: we could distinguish between the number of resources in the document
<atai> ...and the number that should be loaded at begin time
<atai> pal: do we have a strategy for effective font loading in TTML?
<atai> glenn: we have a strategy and named it
<atai> ...lazy loading in our discussion
<atai> ...it is an implementation-dependent feature
<atai> Nigel: The number of font elements can be restricted
<atai> ...you expect just one font resource to be loaded for each font element
<atai> ...if you want to restrict it you should restrict the number of font elements
<atai> glenn: I would not like to constrain either font elements or resources
<atai> ...the application can decide on the basis of the referenced font information what to fetch
<atai> pal: if you use a fetch mechanism you can express the limit in bytes
<atai> ...the downside of fetch is that it requires full processing of the document
<atai> ...full processing like style resolution etc.
<atai> glenn: This is a constraint you can only test during presentation processing
<atai> glenn: could it be a constraint in the HRM?
<atai> pal: yes
<atai> ...as info: digital cinema sets fetch limit to 10 MB
<atai> ...spoke with adobe colleague
<atai> ...this is no coincidence
<atai> ...it just works there
<atai> atsushi: In Japan we provide only a subset of a font
<atai> ...this limits the size
<atai> cyril: I don't think that we at Netflix would do font subsetting
<atai> ...especially for the first episode you have to provide all
<atai> glenn: the Noto Sans font is 8.6 MB for Simplified Chinese
<atai> gkatsev: We can start with 10 MB and then see if anybody is complaining
<atai> ...in that case we increase the limit
<atai> Pal: everybody agrees to have 10 MB as limit
<nigel> PROPOSAL: For FPWD limit fetched font resource to 10 megabytes
<nigel> Nigel: Any objections?
<nigel> RESOLUTION: For FPWD limit fetched font resource to 10 megabytes
<atai> pal: Constraint on @type
<atai> ...my suggestion no constraint
<atai> ...but require IMSC to support minimum set of font formats
<atai> PROPOSAL: no constraint on @type but IMSC processors need to support minimum set of font formats
<atai> RESOLUTION: No constraint on @type but IMSC processors need to support minimum set of font formats
<atai> pal: But...which font format?
<nigel> i/RESOLUTION: Nigel: Any objections?
<atai> ...we need to be careful
<nigel> i/RESOLUTION: /Nigel: Any objections?
<nigel> s|i/RESOLUTION: Nigel: Any objections?||
<atai> ...there are a couple of formats: WOFF, WOFF2...
<nigel> i/RESOLUTION/group: [no objections]
<atai> ...I do not have the expertise to decide what is hard and what is not
<atai> cyril: We need one compressed and one uncompressed format
<atai> Nigel: There is an example in the DVB spec
<cyril> https://caniuse.com/#search=woff
<glenn> OTF's SVG table defined at https://docs.microsoft.com/en-us/typography/opentype/spec/svg
<atai> ...font download it supports OFF (Open font format) and WOFF
<atai> ...that is where I would start
<atai> cyril: we can discuss about woff2
<atai> ...it is broadly supported
<atai> ...and compresses better
<nigel> andreas: It would be good to get a view from Vladimir Levantovsky from Monotype who is a member of this group too.
<atai> pal: we wanted to specify requirement of processorts
<atai> s/processorts/processors/
<atai> cyril: we should constrain it to not support svg-outline
<nigel> PROPOSAL: Require minimally processor support for font/otf with cff and ttf (i.e. no svg outline) plus woff
<atai> Nigel: Any objections?
<atai> RESOULTION: Require minimally processor support for font/otf with cff and ttf (i.e. no svg outline) plus woff
<nigel> s/RESOULTION/RESOLUTION
<atai> gkatsev: woff2 has 30% better compression (as a data point)
<atai> cyril: about unicode range
<atai> ...I am satisfied with the current solution in Pierres PR
<atai> nigel: I have an example with two fonts and different sources
<atai> ...but overlapping sources
<atai> ...should we constrain this?
<atai> ...is there a use case for this?
<atai> glenn: Usually we leave it to the implementation to decide what makes sense
<atai> pal: In this case we constrain the size of the fetched resource
<atai> ...are we asking the implementation to find out the minimum set needed?
<atai> ...seems complicated
<atai> glenn: we have a font selection strategy in TTML2
<atai> nigel: this is orthogonal to this discussion
<atai> pal: what we can do
<atai> ...we can forbid different fonts with the same font family
<atai> nigel: we may have different fonts and ressources for the same font family because they are for different faces
<atai> ...e.g. bold, italics ...
<atai> Nigel: we can try to constrain font family together with different properties like weight
<atai> pal: how does unicode range go together with fetch?
<atai> ...it needs to be validated by the processor when the size is exceeded
<atai> ...the typical use case is one font and one font family, right?
<atai> Nigel:
<atai> atai: NPO in Netherlands have the requirement to have two fonts in one document (one for Arabic and one for Dutch version of the subtitle)
<atai> pal: even if you have two fonts it make sense
<atai> ...to constraint the combination for ranges, family, stylee and weight
<atai> s/stylee/style/
<atai> Nigel: I see two proposals
<atai> ...you forbid unicode range overlab between different font elements with the same values for font family, style and weight
<atai> ...or have no constraint
<atai> glenn: There may be identical font elements with different kerning tables
<atai> pal: you can forbid that
<glenn> https://www.w3.org/TR/2018/REC-css-fonts-3-20180920/#font-style-matching
<atai> glenn: I would recommend that implementation follow the algorithm defined by css
<atai> the HRM should use this algorithm
<atai> pal: this would solve that issue
<atai> glenn: ttv does some (not all) HRM checking
<atai> nigel: You cannot specify the HRM constraint
<atai> ...you cannot statically validate this as font resources may change dynamically
<atai> pal: the idea of the HRM is to provide basic guidelines so things are going to work
<glenn> see https://www.w3.org/TR/2018/REC-css-fonts-3-20180920/#composite-fonts for text on handling overlapping ranges
<atsushi> s/overlab/overlap/
<atai> PROPOSAL: Unicode range overlap between different font elements is permitted even if they have identical values for style, family and weight
<atai> Nigel: Any objection?
<atai> No objections
<atai> Resolution: Unicode range overlap between different font elements is permitted even if they have identical values for style, family and weight
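
Since the resolution permits overlapping ranges, a processor still has to pick one font per character. The Python sketch below illustrates the ordering described in the CSS Fonts Level 3 composite-fonts text linked in the log, where overlapping faces with the same family and style descriptors are checked in reverse declaration order. Whether IMSC processors must follow that rule was not resolved in this discussion; the function name, the face names and the ranges are all hypothetical.

```python
def pick_font(char, faces):
    """Pick the face to use for one character.

    faces: list of (name, [(first_code_point, last_code_point), ...]) in
    declaration order, all assumed to share family, style and weight.
    The last declared face whose range covers the character wins,
    mirroring the CSS Fonts Level 3 rule for overlapping unicode-range values.
    Returns None when no declared face covers the character.
    """
    code_point = ord(char)
    for name, ranges in reversed(faces):
        if any(first <= code_point <= last for first, last in ranges):
            return name
    return None

# Two hypothetical faces whose ranges overlap on the Arabic block.
FACES = [
    ("latin-and-arabic", [(0x0000, 0x06FF)]),
    ("arabic-only", [(0x0600, 0x06FF)]),
]
```

With these faces, `pick_font("a", FACES)` selects `latin-and-arabic`, while an Arabic letter such as U+0628 selects `arabic-only` because it was declared later.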

lcollinsnflx commented 5 years ago

I have a comment about this:

8.4 Font Resources

A Presentation Processor SHALL support font resources of the following content types:

font/otf with TrueType or CFF glyph data; or
font/woff with TrueType or CFF glyph data.

Even though the comment for "otf" says TrueType or CFF glyph data, "otf" looks like a file extension and will certainly lead many implementors to believe that .ttf fonts are not supported. The bulk of our fonts at Netflix are .ttf, and I suspect this is true of the industry. There should be a less confusing way to express the intent (assuming the intent is not actually to exclude .ttf), or simply add this line: font/ttf with TrueType glyph data.

palemieux commented 5 years ago

@lcollinsnflx Which of the advanced text layout features (OTL, AAT, and SIL) are/should be supported? See https://www.iana.org/assignments/media-types/font/ttf

lcollinsnflx commented 5 years ago

OTL is the most widely implemented set of layout features in fonts. Same goes for rendering support in the various layout engines. So an implementation of this spec that wants to support layout, especially for scripts that require it for intelligible rendering, should support OTL at a minimum.

palemieux commented 5 years ago

@lcollinsnflx Thanks. Do you understand the difference between font/otf;outlines=CFF and font/ttf;layout=OTL?

lcollinsnflx commented 5 years ago

Yes, what's your point?

palemieux commented 5 years ago

@lcollinsnflx Oh. I meant: what is the difference? :)

lcollinsnflx commented 5 years ago

The former has OpenType CFF glyph data and OpenType layout, and the latter has TrueType glyphs with OpenType layout tables.

lcollinsnflx commented 5 years ago

It's still not clear where you are leading with this, especially since the spec mentions nothing about layout features.

palemieux commented 5 years ago

@lcollinsnflx Oh. I meant the difference between font/otf;outlines=TTF and font/ttf;layout=OTL... I mis-typed last night.

In other words, is font/otf;outlines=TTF equivalent to font/ttf;layout=OTL, or does font/otf;outlines=TTF include features beyond font/ttf;layout=OTL?

> It's still not clear where you are leading with this, especially since the spec mentions nothing about layout features.

Simply trying to understand the differences between these various font formats.

lcollinsnflx commented 5 years ago

Well, for one, font/ttf;layout=OTL might also contain AAT, unless there is a different way to specify both. This has been true of many of Apple's fonts. Implementations that can handle AAT and OTL might prefer AAT due to performance characteristics. But, for the most part, font/otf;outlines=TTF and font/ttf;layout=OTL are identical. My original point was not that one could not understand that the spec as written allows fonts with TrueType glyphs, but that, as written, it suggests that you can only have them if the file extension is .otf. That would certainly exclude thousands of fonts that people might use for subtitles. If you don't add "font/ttf" then there should be more clarification of the actual intent regarding font file names.

palemieux commented 5 years ago

@lcollinsnflx Thanks for the detailed information, and your patience.

> Well, for one, font/ttf;layout=OTL might also contain AAT, unless there is a different way to specify both.

Ok, so it sounds like font/ttf;layout=OTL must contain OTL layout but may also contain AAT layouts, which is ok if OTL support is required.

> If you don't add "font/ttf" then there should be more clarification of the actual intent regarding font file names.

Yes, perhaps we could simply note that font/otf includes Font file extensions used for OFF / OpenType fonts: .ttf and .otf, as specified at https://www.iana.org/assignments/media-types/font/otf

lcollinsnflx commented 5 years ago

> note that font/otf includes Font file extensions used for OFF / OpenType fonts: .ttf and .otf, as specified at https://www.iana.org/assignments/media-types/font/otf

That would address my concern.

skynavga commented 5 years ago

Just to note, TTML2, which defines the underlying feature, i.e., font element, does not make any assumption whatsoever about names of resources or their extensions. That is, extension usage doesn't signify as far as TTML2 is concerned.

palemieux commented 5 years ago

@skynavga Yes... the concern, as I understand it, was that font/otf would imply that only .otf files were supported.
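
The distinction discussed above, between the declared content type and the file extension, can be sketched in Python as follows. This is an illustration under stated assumptions: the required set is taken from the draft 8.4 text quoted earlier in the thread, and the helper names `parse_media_type` and `is_required_type` are hypothetical. The file name never enters the decision, which matches the point that TTML2 attaches no meaning to extensions.

```python
# Content types that the draft 8.4 text quoted in this thread requires
# a Presentation Processor to support (assumption: taken from that quote).
REQUIRED_TYPES = {"font/otf", "font/woff"}

def parse_media_type(value):
    """Split e.g. 'font/otf;outlines=TTF' into ('font/otf', {'outlines': 'TTF'})."""
    parts = [part.strip() for part in value.split(";")]
    essence = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, param_value = part.split("=", 1)
            params[key.strip().lower()] = param_value.strip()
    return essence, params

def is_required_type(value):
    """True if the declared type, ignoring parameters, is in the required set."""
    essence, _params = parse_media_type(value)
    return essence in REQUIRED_TYPES
```

Under this sketch a font declared as `font/otf;outlines=TTF` is covered whatever its file is called, whereas one declared as `font/ttf;layout=OTL` is not, which is why a note clarifying that font/otf includes .ttf files was proposed.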