linebender / skribo

A Rust library for low-level text layout.
Apache License 2.0

Add draft of requirements doc #2

Closed raphlinus closed 5 years ago

raphlinus commented 5 years ago

This is a first draft of a requirements doc, with a sketch of the major types, and references to similar systems. The main reason I'm posting it is to gather feedback.

raphlinus commented 5 years ago

Rendered

SimonSapin commented 5 years ago

What would be typical or expected sources for locale information?

What is relevant for rendering a piece of text is the language that piece of text is written in. But for applications like a web browser or an email client that show “remote” content, that could well be unrelated to the locale used by the application itself.

Although byte encoding of Unicode code points is not font selection, there is precedent for trying to guess the locale of content based on the locale of the user. However it is a last resort only implemented reluctantly because too much untagged legacy content breaks otherwise. Ideally a given web page would render the same everywhere. I don’t know to what extent similar legacy constraints apply to font selection.

raphlinus commented 5 years ago

So, getting the locale information is, I believe, beyond the scope of the skribo crate, and I should probably clarify that in the requirements doc.

A web browser is going to apply a pile of heuristics to get at this. A seemingly authoritative source is the "lang" attribute in HTML. However, I don't know if we have good empirical data on how often it's used, and how accurate it is when it is specified.

Other sources of language information include textual analysis, inference from encoding (Shift-JIS is an excellent indicator for ja_JP, even though it's legacy by now), and inference from font specification ("Meiryo" is similarly a ja_JP indicator). Then, as likely the last fallback, there is the user's locale. Note that this is similar to guessing encoding but different, as failure is generally more subtle than the mojibake that results from incorrect encoding.
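To make the fallback chain above concrete, here is a minimal sketch of how a caller (not skribo itself) might combine these sources. Everything in it is hypothetical: the `LangHints` struct, the field names, the two lookup tables, and the exact priority order among the middle sources are assumptions for illustration, not an existing API.

```rust
/// Hypothetical bundle of language hints a client might collect before
/// calling into a layout library. None of these names come from skribo.
#[derive(Debug, Default, Clone)]
pub struct LangHints<'a> {
    /// Explicit markup, e.g. the HTML "lang" attribute.
    pub markup_lang: Option<&'a str>,
    /// Result of some textual analysis step (assumed to exist elsewhere).
    pub text_analysis: Option<&'a str>,
    /// Legacy byte encoding the content arrived in.
    pub encoding: Option<&'a str>,
    /// Font family requested by the content.
    pub font_family: Option<&'a str>,
    /// The user's own locale, used as the last fallback.
    pub user_locale: Option<&'a str>,
}

/// Illustrative table: only the Shift-JIS example from the discussion.
fn lang_from_encoding(encoding: &str) -> Option<&'static str> {
    match encoding.to_ascii_lowercase().as_str() {
        "shift_jis" | "shift-jis" => Some("ja_JP"),
        _ => None,
    }
}

/// Illustrative table: only the Meiryo example from the discussion.
fn lang_from_font(family: &str) -> Option<&'static str> {
    match family {
        "Meiryo" => Some("ja_JP"),
        _ => None,
    }
}

/// Returns the first hint that yields a language, in the assumed priority
/// order: markup, text analysis, encoding, font family, user locale.
pub fn resolve_language(hints: &LangHints) -> Option<String> {
    hints
        .markup_lang
        .map(|s| s.to_owned())
        .or_else(|| hints.text_analysis.map(|s| s.to_owned()))
        .or_else(|| hints.encoding.and_then(|e| lang_from_encoding(e)).map(|s| s.to_owned()))
        .or_else(|| hints.font_family.and_then(|f| lang_from_font(f)).map(|s| s.to_owned()))
        .or_else(|| hints.user_locale.map(|s| s.to_owned()))
}
```

With only an encoding of "Shift_JIS" and a user locale of "en_US" set, this sketch picks ja_JP, because the encoding hint is consulted before the user's locale.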

mikeday commented 5 years ago

I'm curious about font selection and font fallback, as these are tricky and have varying requirements. For example, Prince users care a lot about characters with missing glyphs, so we need to log this.

I'm also wondering if there are any differences in requirements between web user agents and other callers such as GUI libraries or games; perhaps only whether the pixel sizes are significant or not?

Other random thoughts: text-decoration? Arabic kashidas? other justification related issues such as glyph scaling and optical margin alignment?

mikeday commented 5 years ago

Oh and also there is the question of who has responsibility for synthesising artificial bold, italic, and small-caps fonts, the last being interesting in the way it requires case-conversion and a flag indicating the size change.

mikeday commented 5 years ago

Will platform font stacks be used for anything other than accessing system fonts?

SimonSapin commented 5 years ago

Indeed, WeasyPrint also semi-regularly gets questions or bug reports from someone running it on a server or container without any fonts installed. A case I haven’t heard of occurring, but which is likely less obvious (and so logging it might be more important), is only some glyphs being missing in the middle of a document that otherwise looks fine.
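As a rough illustration of the "some glyphs missing in the middle of a document" case, the sketch below reports every character that no font in a fallback stack covers, so the caller can log it. The `FontCoverage` and `MissingGlyph` types are hypothetical, and a plain set of characters stands in for a real font's character map; this is not how skribo, Prince, or WeasyPrint represent coverage.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one font in a fallback stack: a name plus the
/// set of characters it can render. A real stack would consult the font's
/// character-to-glyph mapping instead.
pub struct FontCoverage {
    pub name: String,
    pub chars: HashSet<char>,
}

/// One character that no font in the stack could render.
#[derive(Debug, PartialEq, Eq)]
pub struct MissingGlyph {
    /// Byte offset of the character in the source text.
    pub byte_offset: usize,
    pub ch: char,
}

/// Walks the text and collects every character not covered by any font in
/// the stack, so the caller can log it or surface a warning.
pub fn find_missing(text: &str, stack: &[FontCoverage]) -> Vec<MissingGlyph> {
    text.char_indices()
        .filter(|&(_, ch)| !stack.iter().any(|font| font.chars.contains(&ch)))
        .map(|(byte_offset, ch)| MissingGlyph { byte_offset, ch })
        .collect()
}
```

An empty stack (the "no fonts installed" situation) reports every character, while a stack with partial coverage reports only the gaps, together with their byte offsets.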

I suspect we want to make it possible (but of course not mandatory) to use the platform’s glyph rasterization together with Skribo. But that wouldn’t be part of Skribo itself.

raphlinus commented 5 years ago

Great comments here, let me try to address some of them, then I'll probably update the doc this afternoon.

Again I'll check in with Servo, but I strongly suspect that the story with kashida (and other advanced layout features) is that the design should accommodate them, but it's very unlikely that either Servo or other clients will be able to make use of them any time soon. I'm also very interested in the potential for variable fonts to improve justification, both in Arabic and other scripts.

Optical margin alignment is potentially in scope, as hanging-punctuation is in a CSS draft (I'm not sure where this is in the standardization process, maybe someone can fill me in). I'll add that as a requirement, though again it's very unlikely it would be included in a first round of implementation.

Text-decoration is an interesting question, because it could be considered in scope for the higher level: all the lower level needs to do is provide accurate metrics data for the higher level to draw the decorations. But if it turns out there's value in including it, sure.

Generally GUI and Web use cases have a lot of overlap. I think a lot of the differences are respecting specific CSS semantics, especially around font matching. One thing that seems very specific to Web is unicode-range, which I don't see natively supported in the "font collection builder" of any other text stack. It's possible unicode-range will be considered legacy soon, as a binary diff mechanism being developed in the W3C may supplant it. But even so, it will be essential to maintain compatibility for a while.
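For readers unfamiliar with unicode-range, the sketch below shows the kind of check a font collection would need to support it: each face rule carries a set of code point ranges, and only rules whose ranges contain a character are candidates for that character. The `FaceRule` type, its fields, and the file names are hypothetical and are not part of skribo or any existing font collection builder.

```rust
use std::ops::RangeInclusive;

/// Hypothetical model of one @font-face rule: where the font comes from and
/// which code points it is declared to cover via unicode-range.
pub struct FaceRule {
    pub source: &'static str,
    pub unicode_range: Vec<RangeInclusive<u32>>,
}

impl FaceRule {
    /// True if the declared ranges include this character. This says nothing
    /// about whether the font file actually contains a glyph for it.
    pub fn may_cover(&self, ch: char) -> bool {
        let code_point = ch as u32;
        self.unicode_range.iter().any(|range| range.contains(&code_point))
    }
}

/// Returns the sources of all rules whose unicode-range admits `ch`, in
/// declaration order.
pub fn candidate_faces(rules: &[FaceRule], ch: char) -> Vec<&'static str> {
    rules
        .iter()
        .filter(|rule| rule.may_cover(ch))
        .map(|rule| rule.source)
        .collect()
}
```

The practical consequence is that a face whose range excludes a character never needs to be downloaded or consulted for it, which is why the check has to happen in the collection rather than after loading the font.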

> Will platform font stacks be used for anything other than accessing system fonts?

I didn't emphasize it in this writeup (more so in the blog post), but my current thinking is that yes, as an optional feature the platform will do shaping as well, instead of HarfBuzz. That won't be used by Servo, but is interesting, I think, for GUI. On the other hand, there's potentially more thought that's needed on exactly where the abstraction boundaries are. You can make a case that skribo always does shaping, but a higher level text API can feature-select between a platform font stack and skribo.

Re writing modes. I get the feeling that supporting them at the skribo level won't be terribly difficult, but they create significant complexity at higher levels. I think at the least we should design the interfaces to accommodate them, even if the implementation is stubbed out at first, because it's hard to retrofit something that bakes in assumptions about horizontal direction.

Fake italic and fake bold are definitely in scope, and I'll add them. This is a very common problem: different fonts in the stack have different weights available, and the same goes for italic. One good question is whether fake bold can do arbitrary weight adjustment, or just 400->700. To properly support this may require more work on the renderer side, but the role of skribo is to provide weight adjustment data per run.
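One way to picture "weight adjustment data per run" is the sketch below, which compares the requested style with what the matched font actually provides and records the difference for the renderer. The `FaceStyle` and `Synthesis` types are hypothetical, and the choice to allow an arbitrary weight delta (rather than only 400->700) and to never synthesize a lighter weight are assumptions made for illustration.

```rust
/// Hypothetical style description: CSS-like numeric weight plus italic flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FaceStyle {
    pub weight: u16,
    pub italic: bool,
}

/// Hypothetical per-run synthesis data handed to the renderer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Synthesis {
    /// How many weight units the renderer should add; 0 means no fake bold.
    pub embolden_by: u16,
    /// Whether the renderer should skew the glyphs to fake italic.
    pub fake_italic: bool,
}

/// Computes what has to be faked when the font matched for a run does not
/// offer the requested weight or italic style. Assumption: a lighter weight
/// is never synthesized, so the delta saturates at zero.
pub fn synthesis_for_run(requested: FaceStyle, actual: FaceStyle) -> Synthesis {
    Synthesis {
        embolden_by: requested.weight.saturating_sub(actual.weight),
        fake_italic: requested.italic && !actual.italic,
    }
}
```

Under these assumptions, a request for weight 700 italic that falls back to a font offering only 400 upright yields an adjustment of 300 weight units plus fake italic for that run, while runs matched to a font that already has the requested style get no adjustment.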

raphlinus commented 5 years ago

I've added a bunch more text, hopefully taking the comments into account. If I missed anything, let me know; otherwise I'll merge this. It doesn't set the requirements in stone, and we can always add issues for more, but it'd be nice to have something solid to work from.