servo / unicode-bidi

Implementation of the Unicode Bidirectional Algorithm in Rust

Rules W1–W6 incorrectly applied in a single pass #8

Closed: mbrubeck closed this issue 1 year ago

mbrubeck commented 9 years ago

As reported in #7: implicit::resolve_weak applies steps W1-W7 in a single pass. This can produce incorrect results in cases where a "later" rule changes the value of prev_class seen by an "earlier" rule. We should either split this into separate passes, or preserve extra state so each rule sees the correct previous class.
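
To illustrate the kind of interference described above, here is a minimal sketch. It is not the crate's actual `resolve_weak` code: the `Class` enum and the functions `single_pass` and `separate_passes` are hypothetical names, and only rules W2 and W3 are modeled. In the single-pass version, W3 rewrites AL to R before the "last strong class" is recorded, so W2 never sees the AL when it reaches a later EN.

```rust
// Hypothetical, reduced set of bidi classes; just enough for W2 and W3.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Class {
    L,
    R,
    AL,
    EN,
    AN,
}

fn is_strong(c: Class) -> bool {
    matches!(c, Class::L | Class::R | Class::AL)
}

/// Naive single pass: W2 and W3 run per element, and the last strong
/// class is recorded *after* W3 has already rewritten AL to R.
fn single_pass(classes: &mut [Class], sos: Class) {
    let mut last_strong = sos;
    for c in classes.iter_mut() {
        // W2: EN becomes AN if the last strong class is AL.
        if *c == Class::EN && last_strong == Class::AL {
            *c = Class::AN;
        }
        // W3: AL becomes R.
        if *c == Class::AL {
            *c = Class::R;
        }
        // Bug: this records R where W2 needed to see AL.
        if is_strong(*c) {
            last_strong = *c;
        }
    }
}

/// Separate passes: W2 finishes over the whole run before W3 starts.
fn separate_passes(classes: &mut [Class], sos: Class) {
    // W2
    let mut last_strong = sos;
    for c in classes.iter_mut() {
        if *c == Class::EN && last_strong == Class::AL {
            *c = Class::AN;
        } else if is_strong(*c) {
            last_strong = *c;
        }
    }
    // W3
    for c in classes.iter_mut() {
        if *c == Class::AL {
            *c = Class::R;
        }
    }
}
```

For the run `[AL, EN]` with `sos = L`, `single_pass` leaves the digit as EN, while `separate_passes` turns it into AN as W2 requires. Keeping the pre-W3 class in extra state would be the other fix suggested above.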

behnam commented 7 years ago

Is this still an issue, or was it fixed in #15? If we don't have a clear test case for the problem, we can just drop it in favor of the conformance tests.

mbrubeck commented 7 years ago

Rule W7 was fixed, but rules W1–W6 are still incorrect.

behnam commented 7 years ago

I did a quick check, and it turns out this accounts for about 200 of the failures in the conformance tests. I'll work on a clean patch soon.

behnam commented 7 years ago

After https://github.com/servo/unicode-bidi/pull/36, there are 314 failures remaining for the conformance tests.

It looks like 124 of them result from not performing steps W1-W6 in separate passes (or in a way that behaves as if they were separate), and most of the other 190 are caused by these rules being applied to byte indices rather than character indices.

So, this part needs to be rewritten to address both of these cases, and that should get us to 100% conformance.
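
As a rough illustration of the byte-versus-character point, here is a small sketch. The helper `neighbors` is a hypothetical name, not part of the crate. Rules such as W4 look at the characters on either side of a position, and in UTF-8 the offsets `i - 1` and `i + 1` are not necessarily character boundaries, so the lookup has to step by whole characters.

```rust
/// Returns the characters immediately before and after the character
/// that starts at byte offset `i`.
/// Panics if `i` is not a character boundary in `text`.
fn neighbors(text: &str, i: usize) -> (Option<char>, Option<char>) {
    // Last character that ends before byte offset `i`.
    let prev = text[..i].chars().next_back();
    let mut rest = text[i..].chars();
    rest.next(); // skip the character at `i` itself
    (prev, rest.next())
}

/// Byte offsets at which each character of `text` starts.
fn char_starts(text: &str) -> Vec<usize> {
    text.char_indices().map(|(i, _)| i).collect()
}
```

For `"1\u{00A0}2"`, the no-break space occupies bytes 1 and 2, so the characters start at offsets 0, 1, and 3. Byte offset 2 falls inside that character, and `neighbors(text, 1)` returns the digits on either side of it.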