Open kojiishi opened 5 years ago
If this is intended to provide caret positions, we should explicitly say so. This is because iOS and macOS explicitly have “caret position” segmenters as distinct from “character” segmenters. https://trac.webkit.org/browser/webkit/trunk/Source/WTF/wtf/spi/cf/CFStringSPI.h#L41
Since the clusters will be in visual order, we should determine whether that order follows the base direction or is always LTR. (Internally, WebKit always uses LTR, and if the base direction is RTL we do some processing to flip it around so our internal visual-order data structures are always LTR.) I’m not arguing for one or the other; just that we need to specify which way it is.
UA may produce one TextMetricUnit for a ligature
If it’s used for caret positions, this is probably wrong.
How is this API associated with a font and string?
One piece of feedback I got offline: for RTL, position and advance can be either of:
- position is the line-left side, and advance is positive. Their sum gives the line-right side.
- position is the inline-start side, and advance is negative. Their sum gives the inline-end side.
Other combinations are also possible, but probably one of these two is reasonable.
At first I thought codeUnitIndexes should be a plural and an array (or a codeUnitLength property) and that UAs should not synthesize the individual code points because it's not possible for the API consumer to know that it is not safe to reuse. Though now I think about it, the same holds true if kerning is factored in, so you probably can't reuse anyway. Thoughts?
Also, I think this API needs to return first a runs array consisting of arrays of text runs of same font and directionality, otherwise how do you represent the position/advance of a glyph that is preceded by glyphs of another directionality?
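To make the two RTL conventions for position and advance mentioned above concrete, here is a minimal TypeScript sketch. It assumes only that position + advance gives the opposite edge of the cluster, which holds for both conventions; the ClusterMetric type and the clusterExtent helper are hypothetical names for illustration, not part of any proposal.

```typescript
// Hypothetical shape for illustration; only `position` and `advance`
// are names taken from the discussion.
interface ClusterMetric {
  position: number;
  advance: number;
}

// Returns [lineLeft, lineRight] for a cluster. Because position + advance
// is the opposite edge in both conventions, taking min/max works for either.
function clusterExtent(m: ClusterMetric): [number, number] {
  const edge = m.position + m.advance;
  return [Math.min(m.position, edge), Math.max(m.position, edge)];
}

// The same 10px-wide RTL cluster occupying x = 20..30, in both conventions.
// Convention 1: position is line-left, advance is positive.
const lineLeftStyle: ClusterMetric = { position: 20, advance: 10 };
// Convention 2: position is inline-start (the right edge in RTL), advance is negative.
const inlineStartStyle: ClusterMetric = { position: 30, advance: -10 };
```

Either representation carries the same information; the choice mostly affects whether a consumer can read the direction from the sign of advance alone.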
Okay, given we switched to visual order, my previous comment about runs doesn't matter anymore. We also agreed to represent lengths for the codeUnitIndex values, so I think that was addressed today as well.
There is one more thing though, it's not clear when font fallback happens inside the run, and ideally it would be nice to know that.
The Houdini Task Force just discussed FontMetrics.
Last but not least, it's not clear when font fallback happens inside the run, and ideally it would be nice to know that.
Presumably this should match the UA.
Perhaps one way to spec this is “pretend you make an iframe with this string inside it, return data as-if you did that”
Perhaps one way to spec this is “pretend you make an iframe with this string inside it, return data as-if you did that”
I like that suggestion and agree that it's important that it matches the UA fallback handling.
I think this API needs to return first a runs array consisting of arrays of text runs of same font and directionality
This sounds like a potential fingerprinting vector, making it easier for a page to probe details of the machine's font configuration.
@FremyCompany:
...safe to reuse. Though now I think about it, the same holds true if kerning is factored in, so you probably can't reuse anyway. Thoughts?
Yeah, reuse is not possible because of kerning, joining, and all such shaping effects. Internally, Blink cached metrics for each space-delimited word, but had problems with kerning between the space character and letters. Determining correct reusability is not an easy task, so for this API we assume authors call the API for their whole string without considering reuse.
I think this API needs to return first a runs array consisting of arrays of text runs of same font and directionality
The idea of returning runs was raised by other people too, and I think it's nice and clean. But figuring out how to segment runs isn't easy. Directionality and fonts are good criteria, but one may also want to split at script boundaries, and more. The current proposal tries to avoid that discussion by returning a flat array with all such properties exposed (fonts are not exposed yet, but we probably want to add them in the future), so that authors can build runs if needed.
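As a rough illustration of authors building runs from the flat array, the TypeScript sketch below groups consecutive entries that share the same resolved direction. The TextMetricUnit fields follow the attribute names in the proposal text, but the exact interface shape and the buildDirectionRuns helper are assumptions made for this example.

```typescript
// Assumed shape, based on the attribute names mentioned in the proposal.
interface TextMetricUnit {
  codeUnitIndex: number;
  position: number;
  advance: number;
  isRightToLeft: boolean;
}

// Groups consecutive units with the same isRightToLeft value into runs.
// A real consumer might also split on font or script once those are exposed.
function buildDirectionRuns(units: TextMetricUnit[]): TextMetricUnit[][] {
  const runs: TextMetricUnit[][] = [];
  let current: TextMetricUnit[] | null = null;
  let currentDir: boolean | null = null;
  for (const unit of units) {
    if (current === null || currentDir !== unit.isRightToLeft) {
      // Direction changed (or first unit): start a new run.
      current = [unit];
      currentDir = unit.isRightToLeft;
      runs.push(current);
    } else {
      current.push(unit);
    }
  }
  return runs;
}
```

Splitting on additional criteria would only require extending the comparison that decides when a new run starts.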
I think this API needs to return first a runs array consisting of arrays of text runs of same font and directionality
This sounds like a potential fingerprinting vector, making it easier for a page to probe details of the machine's font configuration.
It’s already discoverable in JavaScript by creating <span>s with different contents and styling.
We solved this in WebKit by ignoring all user-installed fonts, making everyone* appear to have the same set of fonts installed.
It’s also worth linking to https://github.com/tc39/proposal-intl-segmenter. This proposal allows web developers to do their own line breaking.
* for some definition of “everyone”
During the F2F, we stated how the intended use case for this API is drawing a background behind a particular word in a line of text. However, if a developer wants to draw a background behind a word, he/she can just make 2 calls to measureText() to get the width of the entire string before the word, and the width of the word itself. There usually (always?) isn't complex shaping across word boundaries that would make this approach incorrect. You could also use this API to draw a background around an individual character inside a word, but it's hard to imagine that developers are clamoring to do that.
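The two-call approach described above can be sketched as follows in TypeScript. The sketch takes a width-measuring callback so it stays self-contained; in a page, that callback would presumably wrap the canvas context's measureText(). It assumes LTR text and no shaping across the word boundary, as the comment itself notes, and the wordBackgroundExtent helper is a hypothetical name.

```typescript
// Any function that returns the rendered width of a string.
// With a canvas context `ctx`, this could be:
//   const measure = (s: string) => ctx.measureText(s).width;
type MeasureWidth = (text: string) => number;

// Computes where to draw a background behind text.slice(start, end):
// one measurement for the prefix before the word, one for the word itself.
function wordBackgroundExtent(
  measure: MeasureWidth,
  text: string,
  start: number,
  end: number,
): { x: number; width: number } {
  const x = measure(text.slice(0, start));
  const width = measure(text.slice(start, end));
  return { x, width };
}
```

If shaping did change across the boundary (for example kerning against the preceding space), the prefix width would be slightly off, which is the limitation the comment acknowledges.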
I can't think of any other use cases that would be satisfied by this proposal. You can't draw a blinking caret, since this API measures grapheme clusters, not caret positions. You can't paint individual glyphs at specific positions. The call doesn't return information about font fallback or about how the UBA rearranged your string.
I also don't think that it would be a good idea for the Web Platform to move in the direction of exposing tons of typographic information. The best way to perform text layout is to use HTML elements and CSS. An author trying to do it themselves with JavaScript would almost certainly produce something slower, less correct, and less accessible than doing it with the browser's engine.
So, I'm sympathetic to solving a specific use case that developers are asking for, but I'm less sympathetic about this particular use case (because it can already be solved with existing APIs), and I'm even less sympathetic about the general direction of helping developers implement their own paragraph layout in script.
I've added some questions related to a Canvas compatibility API here: https://github.com/w3c/css-houdini-drafts/issues/832
One piece of feedback on the previous proposal was about use cases. This proposal tries to address use cases such as knowing the caret positions of characters, or drawing backgrounds, text decoration effects, or selection highlights on the text.
This proposal adds characters as an array of metrics for each grapheme cluster in the logical order. This change helps avoid exposing the details of shaping, which is sometimes complicated and may vary by platform, as indicated by feedback.
The codeUnitIndex attribute provides the index of the first code unit of the base character.
When there are ligatures of multiple grapheme clusters, the UA may produce one TextMetricUnit for a ligature, or compute metrics for each grapheme cluster in the ligature by using the information in the font, or by synthesizing.
The UA may tailor UAX#29 if needed for the caret positioning purpose.
This interface supports RTL by adding the isRightToLeft attribute, which represents the resolved direction of the grapheme cluster, and by adding the position attribute, which represents the start-side position of the grapheme cluster from the origin. With position and advance, this interface can represent the ambiguous caret positioning at BiDi boundaries. The position attribute may also help with rearrangement of glyphs while shaping some scripts if it occurs across grapheme cluster boundaries.
I think this new proposal covers the feedback at whatwg/html#4026 and whatwg/html#3994, and applies to both the Font Metrics API and the Canvas Text Metrics API.
Appreciate feedback in advance: @litherum @FremyCompany @dbaron @jfkthame @r12a @annevk @fserb @domenic @eaenet @chrishtr @drott (Can't seem to mention @whatwg/canvas, anyone know how?)
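For the selection and caret use cases named in the proposal, a consumer might hit-test an x coordinate against the characters array. The TypeScript sketch below is only an illustration: the TextMetricUnit shape is assumed from the attribute names in the proposal, the clusterAtX helper is hypothetical, and it assumes position + advance yields the opposite edge of the cluster whichever sign convention is eventually specified.

```typescript
// Assumed shape, based on the attribute names mentioned in the proposal.
interface TextMetricUnit {
  codeUnitIndex: number;
  position: number;
  advance: number;
  isRightToLeft: boolean;
}

// Returns the codeUnitIndex of the grapheme cluster whose horizontal
// extent contains x, or null if x falls outside every cluster.
function clusterAtX(characters: TextMetricUnit[], x: number): number | null {
  for (const c of characters) {
    const edge = c.position + c.advance;
    const left = Math.min(c.position, edge);
    const right = Math.max(c.position, edge);
    if (x >= left && x < right) {
      return c.codeUnitIndex;
    }
  }
  return null;
}
```

The same left and right values could be used to fill a selection rectangle for each cluster in a code unit range.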