servo / unicode-bidi

Implementation of the Unicode Bidirection Algorithm in Rust
Other
74 stars 34 forks source link

Apply visual run resets to line range. #55

Closed laurmaedje closed 3 years ago

laurmaedje commented 3 years ago

The reset logic in visual runs doesn't take the line offset into account. The for loop iterates over the line's char indices, but the line-local index i is then used to index into the original_classes for the whole paragraph.

Consider the string "aa טֶ", which consists of five chars with the following subranges:

What happens in detail: When resolving the line 3..7, the code previously iterated over both chars, and for the last char i=2 resolved to the original class of the space, setting reset_from to Some(2). Then, all bytes from 2 to line_str.len() (which was 4), were reset to paragraph level. As a result, the first half of ט was reset to 0.

The result of all this is that the reordered visual runs now slice the ט in half because its level changes halfway through. In addition to a fix for the problem, I added the above example as a test case.

mbrubeck commented 3 years ago

Thanks!

It looks like this increases the minimum Rust requirement to 1.36 because it uses non-lexical lifetimes. I think that's fine, because our main downstream crate (url) already requires 1.36 or later. I'll merge this now, and open a separate PR to fix up the CI configuration.

mbrubeck commented 3 years ago

Published unicode-bidi 0.3.5 with this fix.