tc39 / proposal-intl-segmenter-v2

Version 2 of Intl Segmenter. Adding line break support.
https://tc39.github.io/proposal-intl-segmenter-v2/
MIT License

Integration with shaping is required #10

Open litherum opened 2 years ago

litherum commented 2 years ago

This is an elaboration of the kind of issue I describe in https://github.com/tc39/proposal-intl-segmenter-v2/issues/8:

Browsers’ text engines have a close interrelationship between font fallback, bidi processing, line breaking, and shaping, which cannot be matched using this API proposal.

Here's an example. Consider this Burmese string:

ဂျီးဒေါ်ကြီးကောင်ငင်

(I don't know Burmese, so I don't know whether this makes grammatical sense, but I'm using it as an example to describe a class of problems.)

It should be rendered like this:

[Screenshot: the string rendered as it should appear in Noto Sans Myanmar]

This string is the concatenation of two Burmese words: "ဂျီးဒေါ်" and "ကြီးကောင်ငင်". Burmese doesn't require spaces between words. Line breaks in Burmese are allowed between words, so there is a line breaking opportunity between "ဂျီးဒေါ်" and "ကြီးကောင်ငင်".

Imagine this text is being laid out using Noto Sans Myanmar within some available width. Paragraph layout code attempts to pick the line breaking opportunity which fills the available space as much as possible - i.e. the one which places as much text on the line without exceeding the available space.

With the current formulation of Intl.Segmenter v2, a paragraph layout implementation would have two approaches to accomplishing this, both of which are wrong:

  1. They could partition the string into atomic non-breakable pieces, and measure each piece independently: "ဂျီးဒေါ်" and "ကြီးကောင်ငင်" would be measured independently. However, this is wrong because Noto Sans Myanmar has a kerning pair between the last letter of the first word and the first letter of the last word - between "ဒေါ်" and "ကြီး".

Here's an image displaying the width of "ဂျီးဒေါ်" alone:

[Screenshot: "ဂျီးဒေါ်" measured alone; the final character overhangs the measured width]

You can see how the character overhangs its available width. However, if the next character is "ကြီး", the kerning pair in Noto Sans Myanmar moves them apart, so "ဒေါ်" no longer overhangs.

So, if this method were employed, the two words would be measured independently, and the resulting line would be rendered like this, which is wrong:

[Screenshot: the incorrect rendering produced by measuring the two words independently]
  2. Alternatively, instead of partitioning the string into atomic pieces, the routine could measure the entire string from the beginning of the line up until the line breaking opportunity in question. It would do this for every line breaking opportunity, and stop once the available width is exhausted.

However, this is an O(n^2) algorithm. Imagine a line with n line breaking opportunities - this algorithm would require that you re-measure that first word n times. O(n^2) is not really acceptable for something as common as text rendering.
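In code, this second approach would look roughly like the following sketch, which assumes the proposed "line" granularity and a canvas 2D context (`ctx`) whose font is already set; every probe re-measures the whole prefix from the start of the line:

```js
// Sketch of approach 2: try each break opportunity in turn and re-measure
// the entire prefix every time. Assumes the proposed "line" granularity.
function naiveFillLine(text, ctx, availableWidth) {
  const segmenter = new Intl.Segmenter("my", { granularity: "line" });
  let bestBreak = 0;
  for (const { index, segment } of segmenter.segment(text)) {
    const candidate = index + segment.length;
    // Every candidate re-measures the whole prefix, so a line with n
    // break opportunities costs O(n^2) measurement work overall.
    const width = ctx.measureText(text.slice(0, candidate)).width;
    if (width > availableWidth) break;
    bestBreak = candidate;
  }
  return bestBreak; // string offset of the chosen line break
}
```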

Browsers solve this problem by integrating line layout code with shaping code. One of the outputs of shaping code is information about where ligatures and kerning pairs exist - see HB_GLYPH_FLAG_UNSAFE_TO_BREAK. Line layout code uses this information to determine which parts of the layout can be re-used and which parts need to be re-measured when a line breaking candidate is changed.
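By way of contrast, here is a rough sketch of the kind of reuse that unsafe-to-break information makes possible. The glyph record shape used here is hypothetical, loosely modeled on HarfBuzz glyph infos and HB_GLYPH_FLAG_UNSAFE_TO_BREAK, rather than on any existing web API:

```js
// Hypothetical shaper output: one record per glyph, e.g.
//   [{ cluster, advance, unsafeToBreak }, ...]
// Returns the width of text.slice(0, offset) if the cached shaping result
// can be reused at that break candidate, or null if a kerning pair or
// ligature crosses the boundary and the run must be re-shaped.
function cachedWidthUpTo(glyphs, offset) {
  let width = 0;
  for (const glyph of glyphs) {
    if (glyph.cluster >= offset) {
      return glyph.unsafeToBreak ? null : width;
    }
    width += glyph.advance;
  }
  return width; // offset is at or past the end of the shaped run
}
```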

This kind of shaping integration is not included in Intl.Segmenter v2.

yjbanov commented 2 years ago

The above is correct. But I'm not sure it informs us about the suitability of the proposal for TC39. There are roughly four components in rendering a piece of text:

  1. segmentation
  2. shaping
  3. line breaking
  4. rendering

What is described in the original post is the work done by the shaping and line breaking components. There can be quite a bit of back and forth to fit a piece of text. However, there is at most a loose feedback loop with the segmentation component once the line breaking process begins (the only thing I can think of is "lazy segmentation", which can be used as a performance optimization when rendering long text).

In some applications the separation between segmentation and the other components can be temporal, too. For example, as an optimization, the segmentation of a large piece of text could take place in a web worker; the text can then be shaped and rendered in the rendering isolate. In some cases, shaping and rendering are also disconnected temporally, for example, to prevent rendering more than is visible on the screen, even if the layout is computed eagerly to figure out what to render.
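As a minimal sketch of that temporal separation (assuming a hypothetical segment-worker.js file and the proposed "line" granularity), the worker would only vend break offsets, while shaping and rendering happen later on the rendering side:

```js
// segment-worker.js (hypothetical file): compute break offsets off the
// main thread and post them back as plain numbers.
self.onmessage = ({ data: { text, locale } }) => {
  const segmenter = new Intl.Segmenter(locale, { granularity: "line" });
  const offsets = Array.from(segmenter.segment(text), (s) => s.index);
  self.postMessage(offsets);
};

// Main thread / rendering isolate: shaping and rendering can happen much later.
const worker = new Worker("segment-worker.js");
worker.onmessage = ({ data: offsets }) => {
  // offsets are the line break opportunities; hand them to the layout code.
  console.log(offsets);
};
worker.postMessage({ text: "ဂျီးဒေါ်ကြီးကောင်ငင်", locale: "my" });
```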

I think it would be more productive to think about these components as independent APIs that are composed into a solution, rather than as all of them together being one solution. This is because the higher level you go, the greater the diversity of problems; the diversity shoots up particularly at the line breaking and rendering levels. The vast majority of apps could share the same segmentation and shaping. However, fitting text into a space can be wildly different (1, 2, 3). We (the Flutter Web team) found that, while it is a fairly difficult problem, it is quite solvable in user space (in JavaScript or WebAssembly) and it does not significantly contribute to payload size (tens of KiB).

Rendering is at least as diverse, due to the several rendering technologies available - SVG, HTML, Canvas, WebGL, WebGPU - with WebGL and WebGPU offering unlimited customization opportunities. A special case of "rendering" worth noting is semantics, used by assistive technologies (e.g. VoiceOver, TalkBack, NVDA), but it can be viewed as a separate concern, just like rasterization.

One use case that could use segmentation completely stand-alone is education apps that teach students how to line-break text when writing by hand. Some of that could probably be satisfied by server-side segmentation, but when a server is not available, client-side capabilities would be needed.

Conclusion

Given that segmentation only needs ICU data (which already ships with JavaScript engines as part of the other Intl functionality), and that it is a separate concern from the other aspects of text rendering, I do not yet see what makes it unsuitable for TC39.

However, I strongly agree that the proposal should actually help solve the larger problem; Flutter will not be able to adopt it otherwise. To that end, we have scheduled a PoC based on Intl.v8BreakIterator to see what results we get. Early data suggests that we can reduce the initial payload size of Flutter apps by ~0.5 MB. Currently, there is no known way to solve this in user space.

tabatkins commented 2 years ago

In some applications the separation between segmentation and the other components can be temporal, too. For example, as an optimization, the segmentation of a large piece of text could take place in a web worker; the text can then be shaped and rendered in the rendering isolate.

Myles' point is that shaping and segmentation should not be disconnected in such a fashion, at least if you're doing segmentation for the purpose of figuring out how to lay out text on a line and want to do so performantly. Why wouldn't we want the shaping to also be done in the worker in this instance?

In some cases, shaping and rendering are also disconnected temporally, for example, to prevent rendering more than is visible on the screen, even if the layout is computed eagerly to figure out what to render.

Rendering is a completely separate stage that isn't relevant to the discussion here. You need to know dimensions and extents long before pixels ever come into play. It can indeed be something that happens with a significant temporal separation from shaping/linebreaking/etc.

yjbanov commented 2 years ago

Myles' point is that shaping and segmentation should not be disconnected in such a fashion, at least if you're doing segmentation for the purpose of figuring out how to lay out text on a line and want to do so performantly.

It's possible that I'm misunderstanding what @litherum wrote above; if so, please point me in the right direction. My understanding is that in the string "ဂျီးဒေါ်ကြီးကောင်ငင်", the line breaking opportunity between "ဂျီးဒေါ်" and "ကြီးကောင်ငင်" is known purely by combining the string with a table in the ICU data. This information would be vended by Intl.Segmenter v2. Nothing computed at the shaping and line breaking level needs to be fed back to the segmenter to refine that data. The ligatures, kerning, and the final fitting into the shape of the paragraph are figured out without the segmenter's participation. @litherum also said "Browsers solve this problem by integrating line layout code with shaping code", which is an approach that makes sense to me; I'm not arguing against it. What I'm saying is that in all of this the segmenter remains a cleanly separated component with a very specific role: turning text into segments.
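To make that role concrete, a stand-alone use of the segmenter would look roughly like this (assuming the proposed "line" granularity); the offsets come purely from the string plus the ICU data, with no font or shaping involved:

```js
// Purely data-driven: no font, no shaping, just the string and the ICU rules.
const segmenter = new Intl.Segmenter("my", { granularity: "line" });
for (const { index, segment } of segmenter.segment("ဂျီးဒေါ်ကြီးကောင်ငင်")) {
  // Expected: the two words, i.e. a break opportunity between them.
  console.log(index, segment);
}
```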

Why wouldn't we want the shaping to also be done in the worker in this instance?

For example, to achieve parallelism: a book-reading app may need to process a lot of text to lay it out on pages.

Rendering is a completely separate stage that isn't relevant to the discussion here. You need to know dimensions and extents long before pixels ever come into play. It can indeed be something that happens with a significant temporal separation from shaping/linebreaking/etc.

I agree that it's a separate stage, at least in implementations that I've seen. I only brought it up because the question of completeness of the proposal has been brought up in #8. We want to make sure that a complete text rendering system can be built with the proposed API as a building block.

litherum commented 2 years ago

The question I was trying to pose was "How would you use Intl.Segmenter v2 to fill a line with arbitrary text, like this Burmese text, in a correct and performant way?"

My claim is that it's impossible, without doing something Herculean like embedding harfbuzz.js and running shaping in Javascript.

(An additional unstated claim is that any API that performs line breaking should be able to handle text like this Burmese text in a correct and performant way. And it should be able to do it without requiring a significant Javascript dependency.)

yjbanov commented 2 years ago

@litherum

My claim is that it's impossible, without doing something Herculean like embedding harfbuzz.js and running shaping in Javascript.

This is true, but only if we assume we're starting from a vacuum. The reality is that HarfBuzz, FreeType, and other relevant libraries have been ported to the web and are in production use. Libraries such as CanvasKit are trivial to use, and if you use Flutter you get all of that automatically. Figma, Adobe, and AutoCAD chose to implement their own custom solutions. In the fullness of time, I expect there will be many specialized solutions: some for 2D graphics (a la CanvasKit), some for 3D graphics (a la PixiJS), some for rich document editing, etc. So the Herculean part is solved, or at least on an incremental-improvement trajectory (faster multi-threaded WASM with SIMD, WebGPU, and OffscreenCanvas will give it a boost).

I think a more useful question to ask instead is: "Does adding the Segmenter v2 API alone have real-world benefits?" From a preliminary analysis, the answer is "yes". For example, we have good reasons to believe that Flutter can remove ~0.5 MB of payload from all apps that target the web. In Flutter, the text layout problem is, in terms of code size, 48% string segmentation (largely due to the size of ICU), 48% text shaping (HarfBuzz + FreeType), and 4% line breaking + rendering. Even though that 4% is a bit of rocket science, it results in small downloads after minification and compression, so the end-user impact is relatively small.

An additional unstated claim is that any API that performs line breaking should be able to handle text like this Burmese text in a correct and performant way. And it should be able to do it without requiring a significant Javascript dependency.

This is the argument I'm paying close attention to, though not because of a significant JavaScript dependency (as explained above, those dependencies already ship with production apps today). What would worry me is if the Segmenter v2 API could not be combined with other parts into a correct and performant solution. To that end, it would be interesting to see examples where the proposed API leads to incorrect results, or examples of algorithms with superior performance characteristics that the proposed API is incompatible with. We should also look for evidence that the proposal is actually useful. AFAICT, the example of Burmese text above is satisfied by the proposed API, but I'd be happy to see an actual PoC that demonstrates it.

Options

TC39 Stage 2

I'm new to TC39, but if I'm reading the process correctly, stage 1 is satisfied at this point. We have a champion with a proposal. Polyfills for this stuff exist already, both in user space (CanvasKit) and as a proprietary feature (Intl.v8BreakIterator). We've discussed cross-cutting concerns and now know what to look for (e.g. integration with shaping).

I think it would be reasonable to let the proposal go to stage 2. That would allow it to be specced out and to have an experimental implementation to test against. At stage 2 we'll want to show that adoption of the proposed API would lead to useful combinations of:

Move to W3C

While I do not yet see clear technical reasons why the proposal should not be part of TC39, moving it to W3C is not an unreasonable choice. This is because the second half of the problem - text shaping - is more appropriate for W3C than for TC39, as it needs fonts, styles, glyphs, and a coordinate system, all of which are well beyond strings of text.

So the choice is between TC39 (segmenter) + W3C (shaping, and maybe rendering) on the one hand, and W3C (segmenter) + W3C (shaping) on the other. Even though TC39 + W3C might be cleaner, developing both halves under a single umbrella might be beneficial from a coordination and future-maintenance standpoint.

litherum commented 2 years ago

Libraries such as CanvasKit are trivial to use

CanvasKit is 6.8MB + 133K = 6.97MB

Harfbuzz.js is 213K + 4.7K = 218KB

"Does adding Segmenter v2 API alone have real-world benefits?"

This is not the relevant question. The relevant question is "do the benefits outweigh the costs?"

If the costs are "claims to work for all languages, but cannot be used correctly with a bunch of languages (and it's not obvious which languages those are), and cannot be used correctly with bidirectional text, unless it is combined with multiple JavaScript dependencies (and it's not obvious which dependencies to use or how to integrate them), and it would have to be maintained by browsers forever, in addition to whatever Houdini / canvas-formatted-text create, which could satisfy the same use cases", then the answer seems pretty clearly to be that the benefits do not outweigh the costs for most websites.

I agree with you that the proposal, as it stands, has nonzero value for some web authors (for other web authors it would probably have negative value, because of the confusion described above). I'm disagreeing that the proposal is good enough to be shipped as a standard.

litherum commented 2 years ago

I guess I also wanted to make the point that I'm not actively disagreeing with your use case. If/when an acceptable solution comes along, it ideally would be usable by Flutter.

yjbanov commented 2 years ago

@litherum

For payload sizes, consider using compressed numbers (gzip or brotli), as that's how modern websites are served. You will find that the size of that particular build of CanvasKit is ~2.5 MB. The numbers I used above are all gzip numbers.

By "trivial to use" I mean that it's easy for a developer to drop a library in their project and use it. It takes minutes to get going. There's no Herculean effort involved. The payload size doesn't change that.

The payload size is important for the end-user experience, though, and it's the whole reason the proposal exists. There are many things we can do to reduce that 2.5 MB, but one thing there's no user-space solution for is the extra 0.5 MB chunk of ICU. The browser's help is needed.

Harfbuzz.js is interesting, and if it's capable of supporting all languages in a 91 KB gzipped wasm blob, then it could potentially solve the shaping problem without the browser's assistance. We'll investigate. Thanks for the link :+1:

The relevant question is "do the benefits outweigh the costs?"

Agreed, but we should be mindful about who gets the benefits and who pays the costs. Today, users pay the cost (big downloads).

If the costs are [long list of costs]

That's quite a gloomy scenario! However, the TC39 process was designed to prevent such outcomes. The request is only to go to Stage 2; Stage 3 will then require "feedback from implementations and users". The relevant cost to a browser vendor is the maintenance cost of the API; let's not conflate that with other costs and benefits extemporaneously.

If/when an acceptable solution comes along, it ideally would be usable by Flutter.

I'm not sure what's being suggested here. Design an API in a vacuum and hope that it solves real problems? If so, I recommend doing the reverse: identify the problems apps face on the web today, then design an API that solves them.

Back to the original issue

The issue attempted to demonstrate the interrelationship of the various components of text layout. It demonstrated the interactions between shaping and line breaking. No hard dependencies or feedback loops with text segmentation were identified.

I propose the following:

FrankYFTang commented 2 years ago
  1. They could partition the string into atomic non-breakable pieces, and measure each piece independently: "ဂျီးဒေါ်" and "ကြီးကောင်ငင်" would be measured independently. However, this is wrong because Noto Sans Myanmar has a kerning pair between the last letter of the first word and the first letter of the last word - between "ဒေါ်" and "ကြီး".

This will never happen, because it won't even work for the simplest English example, "Hello World". Given "Hello World", Intl.Segmenter will break it into "Hello " and "World"; even in a monospace font, the text width of "Hello" is 5 and "World" is 5, but the text width of "Hello World" is 11. So this is an algorithm no one will even attempt to implement, because it won't even work for the simplest English example.
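For illustration, the same point with canvas measureText (assuming a 2D context `ctx` with some font set, and measuring the words without the trailing space, as above):

```js
// Summing independently measured segments loses the inter-word space (and,
// in proportional fonts, any kerning across the boundary).
const whole = ctx.measureText("Hello World").width;
const parts = ctx.measureText("Hello").width + ctx.measureText("World").width;
console.log(whole > parts); // true: the space is unaccounted for
```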

FrankYFTang commented 2 years ago

So, if this method were employed, the two words would be measured independently, and the resulting line would be rendered like this, which is wrong:

As I mentioned, if you follow the same algorithm to render "Hello World" in English, it will render as "HelloWorld", so nobody will implement that anyway.

FrankYFTang commented 2 years ago

2. Alternatively, instead of partitioning the string into atomic pieces, the routine could measure the entire string from the beginning of the line up until the line breaking opportunity in question. It would do this for every line breaking opportunity, and stop once the available width is exhausted.

However, this is an O(n^2) algorithm. Imagine a line with n line breaking opportunities - this algorithm would require that you re-measure that first word n times. O(n^2) is not really acceptable for something as common as text rendering.

Binary search will lead to O(n log n), right?

FrankYFTang commented 2 years ago

And if the width of the region is short, the n in n(n+1)/2 (which leads to O(n^2)) is limited by the region and will be small, so even a linear search of all line break opportunities in it won't be very slow, since n is small. If the region is wider, the caller could first use the width of the region divided by the font size, plus a linear approximation, to "guess" a point in the text, then use Intl.Segmenter to find the nearby opportunities and binary-search to narrow down to the exact line break point of that line; that will take fewer than n(n+1)/2 measurements.
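A rough sketch of that idea, assuming the proposed "line" granularity and a canvas 2D context for measurement; a width / font size guess could be used to seed the lo/hi window, and the kerning caveat from the original post still applies to each measured prefix:

```js
// Binary search over the break opportunities vended by the segmenter:
// each probe measures one prefix, so a line with n opportunities needs
// O(log n) measurements instead of O(n).
function fitLine(text, ctx, availableWidth) {
  const segmenter = new Intl.Segmenter("my", { granularity: "line" });
  const breaks = Array.from(
    segmenter.segment(text),
    (s) => s.index + s.segment.length
  );

  let lo = 0;
  let hi = breaks.length - 1;
  let best = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const width = ctx.measureText(text.slice(0, breaks[mid])).width;
    if (width <= availableWidth) {
      best = breaks[mid]; // fits; try a longer prefix
      lo = mid + 1;
    } else {
      hi = mid - 1; // too wide; try a shorter prefix
    }
  }
  return best; // string offset of the last break opportunity that fits
}
```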