w3c / imsc-hrm

IMSC Hypothetical Render Model
https://w3c.github.io/imsc-hrm/spec/imsc-hrm.html

'script' property may not be properly specified, may be incomplete #39

Closed aphillips closed 2 years ago

aphillips commented 2 years ago

Paint Text https://w3c.github.io/imsc-hrm/spec/imsc-hrm.html#paint-text

Script property (see Standard Annex #24 at [UNICODE]) for the character of Gi

latin, greek, cyrillic, hebrew or base

The name 'base' in the table that assigns the Normalized glyph copy performance factor (GCpy) does not correspond to any Script property value in Unicode. Common seems to be the value intended here. FWIW, the other script names should probably be written the way Unicode writes them, in title case (e.g. Latin, Greek), or perhaps as ISO 15924 codes (Latn, Grek, Cyrl, Hebr).
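To make the naming suggestion concrete, here is a minimal sketch of a lookup from the spec's current spellings to Unicode-style names and ISO 15924 codes. The table name `SPEC_TO_UNICODE` and the function `normalize_script_name` are hypothetical, and mapping 'base' to Common (ISO 15924 code Zyyy) is only the guess made in this comment, not something the spec confirms.

```python
from typing import Tuple

# Hypothetical lookup: spec spelling -> (Unicode Script value, ISO 15924 code).
# Treating 'base' as Common is the assumption raised in this issue.
SPEC_TO_UNICODE = {
    "latin": ("Latin", "Latn"),
    "greek": ("Greek", "Grek"),
    "cyrillic": ("Cyrillic", "Cyrl"),
    "hebrew": ("Hebrew", "Hebr"),
    "base": ("Common", "Zyyy"),
}


def normalize_script_name(spec_name: str) -> Tuple[str, str]:
    """Return (Unicode script name, ISO 15924 code) for a spec spelling."""
    # Lower-casing makes the lookup tolerant of 'Latin' vs 'latin'.
    return SPEC_TO_UNICODE[spec_name.strip().lower()]
```

Any spelling outside the table raises `KeyError`, which seems preferable to silently guessing a script.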

It is unclear why these specific scripts are assigned a "performance factor" 4x that of other characters. Many scripts have a structure similar to these and might also warrant a higher performance factor. Can you explain why this partial list is used?

nigelmegitt commented 2 years ago

This issue is very closely related, both in subject matter and in section of the specification, to #38 - I would suggest that we consider both issues together when proposing any pull requests.

I don't know the answer to your question @aphillips but hope we can trawl our collective memories and archives and provide either an explanation or a fix!

palemieux commented 2 years ago

@aphillips The GCpy parameter is intended to capture the relative speed of re-rendering a previously rendered glyph -- perhaps by copying it from an image buffer. The larger the GCpy value the faster it is to re-render a previously rendered glyph.

GCpy is larger than Ren(Gi) because the HRM assumes a renderer will cache recently rendered glyphs, making them faster to render.

When the HRM was defined in 2013, device manufacturers, including makers of home theatre devices such as TVs, indicated that glyphs associated with the latin, greek, cyrillic, hebrew or base scripts were faster to re-render after they had been rendered once.
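As a rough illustration of the caching idea described above, the sketch below charges a first-time glyph at a render rate and a repeated glyph at the faster copy rate. It is a simplification under stated assumptions, not the HRM's actual algorithm: the names `paint_cost` and `gcpy` are hypothetical, the numeric values are placeholders chosen only to respect the 4x ratio mentioned in this thread, and the cache is unbounded and keyed on (character, script) alone.

```python
# Scripts that the spec text lists as getting the faster glyph-copy factor.
FAST_COPY_SCRIPTS = {"latin", "greek", "cyrillic", "hebrew", "base"}

# ASSUMED illustrative values: only the 4x ratio comes from this thread.
GCPY_FAST = 12.0
GCPY_OTHER = 3.0
REN = 1.2  # ASSUMED render factor for a glyph that is not yet cached


def gcpy(script):
    """Copy factor for a previously rendered glyph of the given script."""
    return GCPY_FAST if script in FAST_COPY_SCRIPTS else GCPY_OTHER


def paint_cost(glyphs, area=1.0):
    """Total relative cost of painting (character, script) pairs in order."""
    cache = set()
    total = 0.0
    for glyph in glyphs:
        script = glyph[1]
        if glyph in cache:
            # Seen before: copying is cheaper than rendering.
            total += area / gcpy(script)
        else:
            # First occurrence: pay the full render cost, then cache it.
            total += area / REN
            cache.add(glyph)
    return total
```

Under these placeholder numbers, a repeated glyph from one of the listed scripts costs a quarter of what a repeated glyph from any other script costs, while first-time glyphs cost the same in both cases.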

aphillips commented 2 years ago

@palemieux Thanks! That's very helpful.

This is unsurprising: those scripts serve languages with (relatively) small character sets and, generally speaking, little contextual glyph variation. Languages with large character sets, or those that use combining marks or contextual shaping, benefit much less (or not at all) from the glyph cache. Note that the performance boost is also influenced by the fact that most text rendered in captions is in a single language. A script like Latin is actually quite extensive, but that is partly because it includes characters that support specific languages; text in one language will keep using the same subset.
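The intuition about small character sets can be checked with a crude measurement: the share of characters in a caption that have already appeared earlier in the same caption. This is only a sketch. The helper `cache_hit_rate` is hypothetical, and it treats each code point as one glyph, so it ignores the combining marks and contextual shaping that would lower the real hit rate for many scripts.

```python
def cache_hit_rate(text):
    """Fraction of non-space characters already seen earlier in the text."""
    seen = set()
    hits = 0
    total = 0
    for ch in text:
        if ch.isspace():
            continue  # whitespace paints nothing, so it is not counted
        total += 1
        if ch in seen:
            hits += 1
        else:
            seen.add(ch)
    return hits / total if total else 0.0
```

Running it over captions in an alphabetic language and over captions of similar length in a language with a large character set gives a quick, if rough, sense of how differently a glyph cache would pay off.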

My comment can be broken up into a couple of parts:

palemieux commented 2 years ago

> I'd suggest that you match Unicode's terminology for scripts so that implementers don't have to guess what "base" means (I'm fairly certain it means the Common script).

Yes. It looks like this is a bug that was introduced during a search-replace operation:

https://github.com/w3c/imsc/pull/354/commits/b7c99f35390886c5d46816c190e7a29329710ce9

> I would be fine with your sticking to observed performance (i.e. the current list), although in that case you might want to have a note somewhere that calls out that other languages/scripts may perform "better than expected".

+1

The note could also state that additional scripts can be added when concrete use cases are brought up.