open-i18n / rust-unic

UNIC: Unicode and Internationalization Crates for Rust
https://crates.io/crates/unic
Other
234 stars 24 forks source link

[unic-bidi] Bugs in visual_runs #272

Open raphlinus opened 3 years ago

raphlinus commented 3 years ago

I was looking at the code of unic-bidi, considering using it for skribo-layout and trying to understand it better, and came across what I am pretty sure are some logic errors.

fn main() {
    let text = "a אב ג";
    println!("{:x?}", text.as_bytes());
    let bidi = unic_bidi::BidiInfo::new(text, None);
    println!("{:?}", bidi);
    let para = &bidi.paragraphs[0];
    println!("{:?}", bidi.visual_runs(para, 0..9));
    println!("{:?}", bidi.visual_runs(para, 0..7));
    println!("{:?}", bidi.visual_runs(para, 2..7));
}

The first behaves as expected, the runs are [0..2, 2..9].

In the second, I expect the whitespace at the end of line to be ordered after the RTL run, i.e. I expect the runs to be [0..2, 2..6, 6..7], but I get [0..2, 6..7, 2..6]. I see the L1 logic (resetting the EOL whitespace chars to paragraph level) in the returned "levels" array, but the reversing logic (line 366 and 374) is referring to self.levels, not the locally computed levels. Patching this locally seems to resolve the issue.

In the third, I expect [2..6, 6..7] for similar reasons, but this time I get [2..7]. What seems to be going on here is that the i variable in the char_indices() enumeration (line 297) is an offset relative to the line start, but both the original_classes[] query and the levels[] mutation are applied relative to text start.

I have some other concerns, but will start with these. Happy to do a PR, as the fixes seem simple.

mbrubeck commented 3 years ago

This is fixed by #278. The fix is already applied to the original unicode-bidi crate in https://github.com/servo/unicode-bidi/pull/55.