linebender / skribo

A Rust library for low-level text layout.
Apache License 2.0

Add script runs and layout session #11

Closed raphlinus closed 5 years ago

raphlinus commented 5 years ago

This commit analyzes script runs based on Unicode data. It also starts using a "LayoutSession" type, which will eventually support queries of substrings.

Access to the layout is done through iterators, which will improve flexibility. These iterators don't allocate. In addition, the iterator now breaks out a run of glyphs of the same font, which will very likely improve performance downstream.
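As a rough illustration of the iterator shape described above, here is a minimal sketch of a non-allocating iterator that breaks a glyph list into runs sharing one font. The `Glyph` struct, its fields, and the `font_runs` function are hypothetical names made up for this example; they are not taken from skribo's API.

```rust
/// Hypothetical glyph record; the field names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Glyph {
    pub font_id: u32,
    pub glyph_id: u32,
    pub x_offset: f32,
}

/// Iterator over maximal runs of consecutive glyphs that share a font.
/// It only borrows the glyph slice, so iterating does not allocate.
pub struct FontRuns<'a> {
    rest: &'a [Glyph],
}

/// Starts iterating over the font runs of `glyphs`.
pub fn font_runs(glyphs: &[Glyph]) -> FontRuns<'_> {
    FontRuns { rest: glyphs }
}

impl<'a> Iterator for FontRuns<'a> {
    type Item = (u32, &'a [Glyph]);

    fn next(&mut self) -> Option<Self::Item> {
        // Stop when no glyphs remain.
        let font_id = self.rest.first()?.font_id;
        // Count how many leading glyphs use the same font.
        let len = self
            .rest
            .iter()
            .take_while(|g| g.font_id == font_id)
            .count();
        // Hand out the run and keep the remainder for the next call.
        let (run, rest) = self.rest.split_at(len);
        self.rest = rest;
        Some((font_id, run))
    }
}
```

A renderer could then select the font once per run (`for (font_id, run) in font_runs(&glyphs) { ... }`) instead of once per glyph, which is the kind of downstream saving the comment refers to.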

It's still work in progress, but I wanted to checkpoint.

Progress towards #4

raphlinus commented 5 years ago

@SimonSapin Would you mind taking a look at this API and giving me a sense of whether it will work for you?

The current implementation doesn't actually check the unsafe_to_break flags, but could be adapted to do so without much trouble. It also doesn't send the context (for substrings that break Arabic words, for example), but again could be changed to do that without much trouble.

Obviously we need more queries, mostly width and horizontal <-> offset conversions, but I want to get a handle on rendering first.

SimonSapin commented 5 years ago

Do I understand correctly that to lay out multiple lines, you’d call LayoutSession::iter_substr for each line with a range of UTF-8 bytes in the Unicode string?

Yes, the width query would be necessary to decide which break opportunities to take. For greedy line breaking I imagine one could successively query increasingly larger ranges until one doesn’t fit the available width. To avoid this taking O(n²) time, can the LayoutSession object “cache” some of the results from previous queries?
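The greedy strategy described here can be sketched as follows. This is only an illustration: `greedy_lines` is a hypothetical helper, and the `width` closure stands in for whatever width query the session would eventually offer. The break opportunities are assumed to be byte offsets greater than zero, in ascending order, ending with the text length.

```rust
/// Greedy line breaking over a list of break opportunities.
///
/// `breaks` holds candidate byte offsets (each > 0, ascending, last one is
/// the text length). `width(start, end)` is a stand-in for a layout width
/// query over the byte range `start..end`.
pub fn greedy_lines<F>(breaks: &[usize], max_width: f32, width: F) -> Vec<(usize, usize)>
where
    F: Fn(usize, usize) -> f32,
{
    let mut lines = Vec::new();
    let mut start = 0;
    let mut i = 0;
    while i < breaks.len() {
        // Always take at least one segment, so an overlong word
        // cannot stall the loop.
        let mut end = breaks[i];
        i += 1;
        // Query increasingly larger ranges until one no longer fits.
        while i < breaks.len() && width(start, breaks[i]) <= max_width {
            end = breaks[i];
            i += 1;
        }
        lines.push((start, end));
        start = end;
    }
    lines
}
```

The loop issues a number of width queries that is linear in the number of break opportunities, so the overall cost depends on how expensive each query is. If every query re-measures its whole range, the total is quadratic, which is why caching inside the session matters.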

I see that LayoutSession borrows a &'a str. Is it intended to always be short-lived? I imagine this borrow could be a problem if we want to preserve the intermediate computation results inside LayoutSession longer than the stack frame that created it. Should it instead take S: AsRef<str> as input?

raphlinus commented 5 years ago

I haven't implemented it yet, but in the common case I expect the iter_substr queries to be very fast, because most breaks should be safe-to-break by the shaper. Absolutely, the LayoutSession can do as much caching as it likes; this is part of the point. I can see how to make the width queries for successively longer lines take O(n log n): you do a binary search at each endpoint and subtract the cumulative advances.
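A minimal sketch of that idea, assuming every break is safe (so the advances do not change when a substring is cut out): store the cumulative advance at each cluster start, then answer a width query with one binary search per endpoint and a subtraction. `AdvanceCache` and its methods are hypothetical names for this example, not skribo's implementation.

```rust
/// Hypothetical cache of cumulative advances.
/// `offsets[i]` is the UTF-8 byte offset where cluster `i` starts and
/// `cumulative[i]` is the total advance of all clusters before it.
/// Both vectors end with a sentinel entry for the end of the text.
pub struct AdvanceCache {
    offsets: Vec<usize>,
    cumulative: Vec<f32>,
}

impl AdvanceCache {
    /// Builds the cache from `(byte_offset, advance)` pairs sorted by offset.
    pub fn new(clusters: &[(usize, f32)], text_len: usize) -> Self {
        let mut offsets = Vec::with_capacity(clusters.len() + 1);
        let mut cumulative = Vec::with_capacity(clusters.len() + 1);
        let mut total = 0.0;
        for &(offset, advance) in clusters {
            offsets.push(offset);
            cumulative.push(total);
            total += advance;
        }
        offsets.push(text_len);
        cumulative.push(total);
        AdvanceCache { offsets, cumulative }
    }

    /// Total advance of everything before byte offset `pos`.
    /// An offset inside a cluster is rounded down to the cluster start.
    fn advance_before(&self, pos: usize) -> f32 {
        let ix = match self.offsets.binary_search(&pos) {
            Ok(ix) => ix,
            Err(ix) => ix.saturating_sub(1),
        };
        self.cumulative[ix]
    }

    /// Width of the byte range `start..end`: one binary search per
    /// endpoint, then a subtraction.
    pub fn width(&self, start: usize, end: usize) -> f32 {
        self.advance_before(end) - self.advance_before(start)
    }
}
```

Each query costs O(log n) after an O(n) build, so n successive queries cost O(n log n) in total. Ranges whose endpoints are unsafe to break would still need reshaping, which this sketch does not model.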

If the binary search ends up being nontrivial, it might be possible to avoid it by doing an abstraction like the cursor in xi-rope.

Also, yes, &'a str is probably wrong here. I was trying to be clever and not allocate a string, but I see this really messes up longer-lived sessions, for the same reasons that motivate the rental crate. Maybe the best thing is to have a Cow, so it would support stack-lived queries and you'd just use the owned variant for longer-lived ones. In any case, the cost of the string copy is probably insignificant compared to the other work, so it might be a good idea to simplify it and just go with String.
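To make the Cow option concrete, here is a small sketch in which only the text ownership is modeled. `Session`, `into_owned`, and `make_long_lived` are hypothetical names for illustration; the real LayoutSession holds much more state.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a layout session; only the text
/// ownership is modeled here.
pub struct Session<'a> {
    text: Cow<'a, str>,
}

impl<'a> Session<'a> {
    /// Accepts `&str` (borrowed, no copy) or `String` (owned).
    pub fn new(text: impl Into<Cow<'a, str>>) -> Self {
        Session { text: text.into() }
    }

    /// The text this session lays out.
    pub fn text(&self) -> &str {
        &self.text
    }

    /// Copies the text if it was borrowed, so the session no longer
    /// depends on the stack frame that created it.
    pub fn into_owned(self) -> Session<'static> {
        Session {
            text: Cow::Owned(self.text.into_owned()),
        }
    }
}

/// A session built from an owned `String` can outlive its creator.
pub fn make_long_lived() -> Session<'static> {
    let local = String::from("hello world");
    Session::new(local)
}
```

Short-lived callers pass a `&str` and pay nothing for it, while long-lived callers pass a `String` or call `into_owned`. The price is a lifetime parameter on the type even in the owned case.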

raphlinus commented 5 years ago

We settled on Cow in discussion, but what about this StrSource trait? It can also be backed by xi-rope. I'm not sure it's worth the extra complexity though, i.e., if it's not implemented by anything other than the usual string types, Cow is just as good.

SimonSapin commented 5 years ago

Cow<str> or impl AsRef<str> or impl StrSource all fix the mandatory borrow. Since you’re not sure the complexity is worth it I’d go with AsRef for now. It can always be replaced by StrSource with impl<S> StrSource for S where S: AsRef<str> later if that turns out useful.
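A sketch of what that could look like, under stated assumptions: `Session` is a hypothetical stand-in for the session type, and the thread does not show the definition of StrSource, so the single `substring` method below is invented purely to illustrate the blanket impl.

```rust
use std::borrow::Cow;
use std::ops::Range;

/// Hypothetical session that is generic over any contiguous string type.
pub struct Session<S: AsRef<str>> {
    text: S,
}

impl<S: AsRef<str>> Session<S> {
    /// Takes ownership of `text`; pass a `&str` to borrow instead.
    pub fn new(text: S) -> Self {
        Session { text }
    }

    /// Returns the substring for a UTF-8 byte range.
    /// Panics if the range is out of bounds or splits a character.
    pub fn substr(&self, range: Range<usize>) -> &str {
        &self.text.as_ref()[range]
    }
}

/// Hypothetical shape for a `StrSource` trait. Returning a `Cow` would
/// let a non-contiguous source (such as a rope) hand back an owned copy.
pub trait StrSource {
    fn substring(&self, range: Range<usize>) -> Cow<'_, str>;
}

/// Blanket impl: every `AsRef<str>` type is a `StrSource`, so moving
/// from `AsRef<str>` to `StrSource` later would not break callers.
impl<S: AsRef<str> + ?Sized> StrSource for S {
    fn substring(&self, range: Range<usize>) -> Cow<'_, str> {
        Cow::Borrowed(&self.as_ref()[range])
    }
}
```

With this shape, `Session::new(String::from("..."))` gives a session that owns its text and `Session::new("...")` gives one that borrows it, without a separate lifetime parameter on the struct.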

raphlinus commented 5 years ago

Ah, I'll go with AsRef. For some reason, it didn't penetrate my thick skull that this solves all the problems as long as you don't mind a contiguous string, which I think is fine.