w3c / css-houdini-drafts

Mirror of https://hg.css-houdini.org/drafts
https://drafts.css-houdini.org/

[font-metrics-api] Compatibility notes with Canvas TextMetrics API #832

Open fserb opened 5 years ago

fserb commented 5 years ago

Following up with some questions from: https://github.com/w3c/css-houdini-drafts/issues/828

It seems that a Houdini TextMetrics API still has to figure out the proper use cases to target. Meanwhile, we'd like to move forward at WHATWG with the Canvas update of the TextMetrics API, and I'd really appreciate some input from folks here about some logical issues with a Canvas TextMetrics API.

The use cases for canvas are mostly caret position and character highlight.

  1. But I've been told that it's not possible to define a 1:1 map between characters and caret positions (due to bidi, where two different caret positions can sometimes point to the same character). If that's so, would it still be possible to define a proper 1:1 map with a rule such as: if LTR, use the left-most caret position associated with this character? Would that stop being useful?

  2. If we want to assign, for a grapheme cluster, a left-right bounding box (to be able to draw a background on a single character), it seems to me that it's always possible to map, for each character, a left-side (not start) position and an advance. This seems to hold even in bidi.

For Canvas, there are very few layout requirements (no vertical text and no multi-line text, for example). So we could think of it as a very simplified API, as long as it's not egregiously wrong from a layout perspective. Would something like this be possible?
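
To make the two questions above concrete, here is a minimal sketch of the "left-side position plus advance" model. Everything in it is an assumption for illustration: the box shape (`left`, `advance`, `direction`), the helper names, and the pixel values are hypothetical and are not part of any existing or proposed TextMetrics API.

```javascript
// Hypothetical per-cluster boxes for one line, listed in logical (string) order.
// `left` is the visual left edge and `advance` the width, both in pixels.
// The first two clusters are LTR; the last two form an RTL run, so the
// logically earlier RTL cluster sits visually to the right of the later one.
const boxes = [
  { left: 0,  advance: 10, direction: "ltr" },
  { left: 10, advance: 10, direction: "ltr" },
  { left: 30, advance: 10, direction: "rtl" },
  { left: 20, advance: 10, direction: "rtl" },
];

// Caret position in front of a cluster: left edge for LTR, right edge for RTL.
function leadingCaretX(box) {
  return box.direction === "rtl" ? box.left + box.advance : box.left;
}

// Caret position behind a cluster: right edge for LTR, left edge for RTL.
function trailingCaretX(box) {
  return box.direction === "rtl" ? box.left : box.left + box.advance;
}

// The logical boundary between cluster 1 and cluster 2 has two visual positions.
const afterSecond = trailingCaretX(boxes[1]); // 20
const beforeThird = leadingCaretX(boxes[2]);  // 40
console.log(afterSecond, beforeThird);
```

In this sketch every cluster has a well-defined left edge and advance, which is the claim in question 2, but the caret at a direction boundary still resolves to two different x positions, which is the difficulty raised in question 1.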

jfkthame commented 5 years ago

> If we want to assign, for a grapheme cluster a left-right bounding box (to be able to draw a background on a single character)

Careful with the terminology here... is "character" synonymous with "grapheme cluster" in that sentence?

From a JS point of view, as a user of a Canvas TextMetrics API, I would assume "character" really means "UTF-16 code unit", because that's what JS strings use (and expose). But if so, it's not necessarily possible to "draw a background on a single character", which might be only half of a surrogate pair.
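
A short sketch of the code-unit problem, using only standard JS string operations. The sample string and the helper name `hasLoneSurrogate` are chosen here for illustration and do not come from the thread.

```javascript
// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 stores it
// as a surrogate pair (two code units).
const s = "a\u{1F600}";

const codeUnitCount = s.length;        // counts UTF-16 code units
const codePointCount = [...s].length;  // string iteration walks code points

// Slicing by code unit index can cut the pair in half.
const half = s.slice(1, 2);

// Returns true if the string contains a surrogate without its partner.
function hasLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = str.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed pair, skip the low surrogate
        continue;
      }
      return true; // high surrogate with no low surrogate after it
    }
    if (c >= 0xdc00 && c <= 0xdfff) return true; // stray low surrogate
  }
  return false;
}

console.log(codeUnitCount, codePointCount, hasLoneSurrogate(half));
```

A metrics API indexed by code unit would have to say something about index 1 and index 2 of this string separately, even though neither half can be drawn on its own.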

So maybe you meant "Unicode codepoint"? That's more logical in terms of acting on "a single character"... but still problematic. A single Unicode character may be rendered as multiple disjoint glyphs, so that drawing a background on that single character is actually much harder than just painting a rectangle. Consider the two-character sequence <U+0D15, U+0D4A> (spelling a simple Malayalam consonant-vowel syllable). This renders as three glyphs, കൊ, where the vowel U+0D4A (ൊ) is split into two parts, one to the left of the consonant U+0D15 (ക) and one to the right of it. What "left-side position and advance" does the character U+0D4A have?

OK, so maybe you really meant "grapheme cluster" everywhere, and never some other lower-level version of "character"? Then <U+0D15, U+0D4A> is a single grapheme cluster, it has a single left-side and advance, and you could reasonably use these to paint a background for that single cluster (not character). But is that really what people want? The Malayalam syllable കൊ is made up of two characters, by any common understanding of "character", and to be unable to address those characters as separate units seems wrong.
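
For comparison, here is a sketch of how the same Malayalam syllable looks at each level from JS, assuming a runtime that supports `Intl.Segmenter` (recent browsers and Node.js builds with full ICU). The choice of the `"ml"` locale and the variable names are illustrative assumptions.

```javascript
// The two-code-point sequence <U+0D15, U+0D4A> discussed above.
const syllable = "\u0D15\u0D4A";

// Both code points are in the BMP, so code units and code points agree.
const units = syllable.length;
const points = [...syllable].length;

// Segment into extended grapheme clusters.
const segmenter = new Intl.Segmenter("ml", { granularity: "grapheme" });
const clusters = [...segmenter.segment(syllable)].map((part) => part.segment);

console.log(units, points, clusters.length);
```

The segmenter reports one cluster for two characters, so an API that exposes only per-cluster boxes has no way to address the consonant and the vowel sign separately.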

Text is hard!