servo / unicode-bidi

Implementation of the Unicode Bidirectional Algorithm in Rust

Analysis of failing character tests (after #85) #90

Closed by Manishearth 1 year ago

Manishearth commented 1 year ago

This is where I'm tracking all of the failures left in the character tests after #85 and #91. It does not cover the 314 failures (250 after #87) in the basic tests.

I'm categorizing them by their section in BidiCharacterTest.txt, and filling in issue numbers as necessary. Investigations on the ??s would be appreciated!

Explicit directional overrides applied to paired brackets (https://github.com/servo/unicode-bidi/issues/89)

8 tests

```
202A 05D0 0028 05D1 202C 202D 0029;2;1;x 3 3 3 x x 2;3 2 1 6
202A 05D0 0028 05D1 202C 202D 0029 202C;2;1;x 3 3 3 x x 2 x;3 2 1 6
202B 0061 0028 0062 202C 202E 0029;2;0;x 2 2 2 x x 1;6 1 2 3
202B 0061 0028 0062 202C 202E 0029 202C;2;0;x 2 2 2 x x 1 x;6 1 2 3
202D 0028 202C 202A 05D0 0029 05D1;2;1;x 2 x x 3 3 3;1 6 5 4
202D 0028 202C 202A 05D0 0029 05D1 202C;2;1;x 2 x x 3 3 3 x;1 6 5 4
202E 0028 202C 202B 0061 0029 0062;2;0;x 1 x x 2 2 2;4 5 6 1
202E 0028 202C 202B 0061 0029 0062 202C;2;0;x 1 x x 2 2 2 x;4 5 6 1
```
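
For anyone reading these lines: each one has five `;`-separated fields. They are the code points, the paragraph direction (0 = LTR, 1 = RTL, 2 = auto-detected from the text), the resolved paragraph level, the resolved level of each code point (`x` marks a character removed by rule X9), and the expected visual order as indices into the code point list. Below is a minimal sketch of a parser for that layout, using only the standard library. `CharTest` and `parse_line` are hypothetical names made up for illustration and are not part of this crate's API.

```rust
/// One parsed test line (hypothetical struct, for illustration only).
#[derive(Debug, PartialEq)]
struct CharTest {
    /// The input text, built from the hex code points in field 0.
    text: String,
    /// 0 = LTR, 1 = RTL, 2 = auto-detect from the text.
    para_direction: u8,
    /// Expected resolved paragraph embedding level.
    para_level: u8,
    /// Expected level per code point; `None` stands for `x` (removed by X9).
    levels: Vec<Option<u8>>,
    /// Expected visual order, as indices into the code point list.
    order: Vec<usize>,
}

/// Parses one data line; returns `None` if the line is malformed.
fn parse_line(line: &str) -> Option<CharTest> {
    let fields: Vec<&str> = line.trim().split(';').collect();
    if fields.len() != 5 {
        return None;
    }
    // Field 0: space-separated hex code points.
    let text = fields[0]
        .split_whitespace()
        .map(|hex| u32::from_str_radix(hex, 16).ok().and_then(char::from_u32))
        .collect::<Option<String>>()?;
    // Field 3: a number or `x` per code point.
    let levels = fields[3]
        .split_whitespace()
        .map(|tok| {
            if tok == "x" {
                Some(None)
            } else {
                tok.parse::<u8>().ok().map(Some)
            }
        })
        .collect::<Option<Vec<Option<u8>>>>()?;
    // Field 4: indices of the surviving code points in display order.
    let order = fields[4]
        .split_whitespace()
        .map(|tok| tok.parse::<usize>().ok())
        .collect::<Option<Vec<usize>>>()?;
    Some(CharTest {
        text,
        para_direction: fields[1].trim().parse().ok()?,
        para_level: fields[2].trim().parse().ok()?,
        levels,
        order,
    })
}
```

Keeping `x` as `None` rather than a sentinel number makes it harder to compare a removed position against a real level by accident.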

Nonspacing marks applied to paired brackets. These cases exercise the ignoring of bc=BN characters (#89, probably)

4 tests

```
0041 200F 005B 05D0 005D 200D 20D6;0;0;0 1 1 1 1 x 1;0 6 4 3 2 1
0041 200F 005B 200D 20D6 05D0 005D;0;0;0 1 1 x 1 1 1;0 6 5 4 2 1
0041 200F 005B 200D 20D6 05D0 005D 200D 20D6;0;0;0 1 1 x 1 1 1 x 1;0 8 6 5 4 2 1
0041 200F 005B 200D 200B 20D6 05D0 005D 200B 200D 20D6;0;0;0 1 1 x x 1 1 1 x x 1;0 10 7 6 5 2 1
```
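
As a rough illustration of what these cases exercise: rule N0 in UAX #9 says that nonspacing marks immediately following a paired bracket that was resolved to L or R take the bracket's direction. If BN characters are kept in the text rather than removed, the scan for those marks has to step over them. The sketch below assumes that reading of the rule. The `Class` enum and `propagate_bracket_direction` are hypothetical and do not reflect how this crate implements it.

```rust
/// A tiny subset of bidi classes, enough for the sketch (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Class {
    L,
    R,
    ON,
    NSM,
    BN,
}

/// After the bracket at index `bracket` has been resolved to L or R, give the
/// same direction to the characters that follow it and were originally NSM.
/// `original` holds the classes before rule W1; `resolved` is the working copy.
fn propagate_bracket_direction(original: &[Class], resolved: &mut [Class], bracket: usize) {
    let direction = resolved[bracket];
    for i in (bracket + 1)..original.len() {
        match original[i] {
            // BN characters stay in the text but are treated as if absent.
            Class::BN => continue,
            // Marks attached to the bracket follow the bracket.
            Class::NSM => resolved[i] = direction,
            // Anything else ends the run of marks.
            _ => break,
        }
    }
}
```

For the first test above (`0041 200F 005B 05D0 005D 200D 20D6`), the closing bracket at index 4 is followed by a BN (U+200D) and then a mark (U+20D6); the expected levels `0 1 1 1 1 x 1` show the mark ending up at the bracket's level.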

Sequences containing directional formatting characters (https://github.com/servo/unicode-bidi/issues/89)

6 tests

```
0061 202D 202C 0020 0031 0020 0032 002D 0033;1;1;2 x x 2 2 2 2 2 2;0 3 4 5 6 7 8
0061 202E 202C 0020 0031 0020 0032 002D 0033;1;1;2 x x 2 2 2 2 2 2;0 3 4 5 6 7 8
0627 202A 202C 0020 0031 002D 0032;0;0;1 x x 1 2 1 2;6 5 4 3 0
0627 202B 202C 0020 0031 002D 0032;0;0;1 x x 1 2 1 2;6 5 4 3 0
05D0 202A 202A 202C 202C 0020 0031 0020 0032;0;0;1 x x x x 1 2 1 2;8 7 6 5 0
0061 202B 202B 202C 202C 0020 0031 0020 0032;1;1;2 x x x x 2 2 2 2;0 5 6 7 8
```
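
When checking these by hand, the last field can be derived from the levels field with rule L2: from the highest level down to the lowest odd level, reverse every contiguous run of characters at that level or higher. Below is a minimal sketch, assuming the positions marked `x` are dropped before reordering and the whole input is a single line. `visual_order` is a hypothetical helper, not this crate's API.

```rust
/// Computes the visual order (as logical indices) from per-character levels.
/// `None` marks a character removed by X9; it does not appear in the output.
fn visual_order(levels: &[Option<u8>]) -> Vec<usize> {
    // Keep only the surviving positions, paired with their level.
    let mut kept: Vec<(usize, u8)> = levels
        .iter()
        .enumerate()
        .filter_map(|(i, &level)| level.map(|l| (i, l)))
        .collect();
    let max = match kept.iter().map(|&(_, l)| l).max() {
        Some(m) => m,
        None => return Vec::new(),
    };
    let min = kept.iter().map(|&(_, l)| l).min().unwrap_or(0);
    // Smallest odd level that is not below the minimum level present.
    let lowest_odd = if min % 2 == 1 { min } else { min + 1 };

    let mut level = max;
    while level >= lowest_odd {
        let mut start = 0;
        while start < kept.len() {
            if kept[start].1 >= level {
                // Extend the run while the level stays at `level` or higher.
                let mut end = start + 1;
                while end < kept.len() && kept[end].1 >= level {
                    end += 1;
                }
                kept[start..end].reverse();
                start = end;
            } else {
                start += 1;
            }
        }
        level -= 1;
    }
    kept.into_iter().map(|(i, _)| i).collect()
}
```

For the third test above, the levels `1 x x 1 2 1 2` leave indices 0, 3, 4, 5, 6. The level 2 runs are single characters, and the level 1 pass reverses the whole line, giving `6 5 4 3 0` as in the expected ordering.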

Combinations of paired brackets, numbers, and directional formatting characters (probably involves some of https://github.com/servo/unicode-bidi/issues/89)

11 tests

```
2066 0029 0029 0661 0028 0627 0029;1;1;1 2 2 4 3 3 3;1 2 6 5 4 3 0
2066 0029 0029 0661 0028 0662 0029;1;1;1 2 2 4 3 4 3;1 2 6 5 4 3 0
2066 0029 2066 0661 0028 05D0 0029;1;1;1 2 2 6 5 5 5;1 2 6 5 4 3 0
0061 0028 0062 005B 0063 2068 05D0 2069 0064 005D 0065 0029 0066;1;1;2 2 2 2 2 2 3 2 2 2 2 2 2;0 1 2 3 4 5 6 7 8 9 10 11 12
05D0 0028 05D1 005B 05D2 2068 0061 2069 05D3 005D 05D4 0029 05D5;0;0;1 1 1 1 1 1 2 1 1 1 1 1 1;12 11 10 9 8 7 6 5 4 3 2 1 0
0061 0028 0062 2067 05D0 005B 05D1 2066 0063 05D3 2069 0065 005D 0066 2069 05D4 0029 05D5;0;0;0 0 0 0 1 1 1 1 2 3 1 2 1 2 0 1 0 1;0 1 2 3 13 12 11 10 8 9 7 6 5 4 14 15 16 17
0061 0028 0062 2067 05D0 005B 05D1 2066 0063 007B 0064 202B 007D 0020 007B 202C 05D2 007D 05D3 2069 0065 005D 0066 2069 05D4 0029 05D5;0;0;0 0 0 0 1 1 1 1 2 2 2 x 3 3 3 x 3 3 3 1 2 1 2 0 1 0 1;0 1 2 3 22 21 20 19 8 9 10 18 17 16 14 13 12 7 6 5 4 23 24 25 26
0061 0028 0062 2067 05D0 005B 05D1 2066 0063 007B 0064 202B 007D 0020 007B 202C 05D2 007D 05D3 2069 0065 005D 0066 2069 05D4 0029 05D5;1;1;2 1 2 1 3 3 3 3 4 4 4 x 5 5 5 x 5 5 5 3 4 3 4 1 1 1 1;26 25 24 23 22 21 20 19 8 9 10 18 17 16 14 13 12 7 6 5 4 3 2 1 0
05D0 0028 05D1 2066 0061 005B 0062 2067 05D2 0064 2069 05D4 005D 05D5 2069 0065 0029 0066;1;1;1 1 1 1 2 2 2 2 3 4 2 3 2 3 1 2 1 2;17 16 15 14 4 5 6 7 9 8 10 11 12 13 3 2 1 0
05D0 0028 05D1 2066 0061 005B 0062 2067 05D2 007B 05D3 202A 007D 0020 007B 202C 0063 007D 0064 2069 05D4 005D 05D5 2069 0065 0029 0066;0;0;1 0 1 0 2 2 2 2 3 3 3 x 4 4 4 x 4 4 4 2 3 2 3 0 0 0 0;0 1 2 3 4 5 6 7 12 13 14 16 17 18 10 9 8 19 20 21 22 23 24 25 26
05D0 0028 05D1 2066 0061 005B 0062 2067 05D2 007B 05D3 202A 007D 0020 007B 202C 0063 007D 0064 2069 05D4 005D 05D5 2069 0065 0029 0066;1;1;1 1 1 1 2 2 2 2 3 3 3 x 4 4 4 x 4 4 4 2 3 2 3 1 2 1 2;26 25 24 23 4 5 6 7 12 13 14 16 17 18 10 9 8 19 20 21 22 3 2 1 0
```

Manishearth commented 1 year ago

Also for people debugging these, https://util.unicode.org/UnicodeJsps/bidic.jsp?s=%D7%90%281%29&b=0&u=140&d=2 is amazing

Manishearth commented 1 year ago

Between #85 and #91, I think I've knocked out all of the failures that are not due to #89 (or will be hard to debug without #89). I might want to wait for #85 to merge before doing #89 since it's got involvement with everything.

Manishearth commented 1 year ago

After https://github.com/servo/unicode-bidi/pull/92, so far we have these failures:

Explicit directional overrides applied to paired brackets

8 tests

```
202A 05D0 0028 05D1 202C 202D 0029;2;1;x 3 3 3 x x 2;3 2 1 6
202A 05D0 0028 05D1 202C 202D 0029 202C;2;1;x 3 3 3 x x 2 x;3 2 1 6
202B 0061 0028 0062 202C 202E 0029;2;0;x 2 2 2 x x 1;6 1 2 3
202B 0061 0028 0062 202C 202E 0029 202C;2;0;x 2 2 2 x x 1 x;6 1 2 3
202D 0028 202C 202A 05D0 0029 05D1;2;1;x 2 x x 3 3 3;1 6 5 4
202D 0028 202C 202A 05D0 0029 05D1 202C;2;1;x 2 x x 3 3 3 x;1 6 5 4
202E 0028 202C 202B 0061 0029 0062;2;0;x 1 x x 2 2 2;4 5 6 1
202E 0028 202C 202B 0061 0029 0062 202C;2;0;x 1 x x 2 2 2 x;4 5 6 1
```

Combinations of paired brackets, numbers, and directional formatting characters

11 tests

```
2066 0029 0029 0661 0028 0627 0029;1;1;1 2 2 4 3 3 3;1 2 6 5 4 3 0
2066 0029 0029 0661 0028 0662 0029;1;1;1 2 2 4 3 4 3;1 2 6 5 4 3 0
2066 0029 2066 0661 0028 05D0 0029;1;1;1 2 2 6 5 5 5;1 2 6 5 4 3 0
0061 0028 0062 005B 0063 2068 05D0 2069 0064 005D 0065 0029 0066;1;1;2 2 2 2 2 2 3 2 2 2 2 2 2;0 1 2 3 4 5 6 7 8 9 10 11 12
05D0 0028 05D1 005B 05D2 2068 0061 2069 05D3 005D 05D4 0029 05D5;0;0;1 1 1 1 1 1 2 1 1 1 1 1 1;12 11 10 9 8 7 6 5 4 3 2 1 0
0061 0028 0062 2067 05D0 005B 05D1 2066 0063 05D3 2069 0065 005D 0066 2069 05D4 0029 05D5;0;0;0 0 0 0 1 1 1 1 2 3 1 2 1 2 0 1 0 1;0 1 2 3 13 12 11 10 8 9 7 6 5 4 14 15 16 17
0061 0028 0062 2067 05D0 005B 05D1 2066 0063 007B 0064 202B 007D 0020 007B 202C 05D2 007D 05D3 2069 0065 005D 0066 2069 05D4 0029 05D5;0;0;0 0 0 0 1 1 1 1 2 2 2 x 3 3 3 x 3 3 3 1 2 1 2 0 1 0 1;0 1 2 3 22 21 20 19 8 9 10 18 17 16 14 13 12 7 6 5 4 23 24 25 26
0061 0028 0062 2067 05D0 005B 05D1 2066 0063 007B 0064 202B 007D 0020 007B 202C 05D2 007D 05D3 2069 0065 005D 0066 2069 05D4 0029 05D5;1;1;2 1 2 1 3 3 3 3 4 4 4 x 5 5 5 x 5 5 5 3 4 3 4 1 1 1 1;26 25 24 23 22 21 20 19 8 9 10 18 17 16 14 13 12 7 6 5 4 3 2 1 0
05D0 0028 05D1 2066 0061 005B 0062 2067 05D2 0064 2069 05D4 005D 05D5 2069 0065 0029 0066;1;1;1 1 1 1 2 2 2 2 3 4 2 3 2 3 1 2 1 2;17 16 15 14 4 5 6 7 9 8 10 11 12 13 3 2 1 0
05D0 0028 05D1 2066 0061 005B 0062 2067 05D2 007B 05D3 202A 007D 0020 007B 202C 0063 007D 0064 2069 05D4 005D 05D5 2069 0065 0029 0066;0;0;1 0 1 0 2 2 2 2 3 3 3 x 4 4 4 x 4 4 4 2 3 2 3 0 0 0 0;0 1 2 3 4 5 6 7 12 13 14 16 17 18 10 9 8 19 20 21 22 23 24 25 26
05D0 0028 05D1 2066 0061 005B 0062 2067 05D2 007B 05D3 202A 007D 0020 007B 202C 0063 007D 0064 2069 05D4 005D 05D5 2069 0065 0029 0066;1;1;1 1 1 1 2 2 2 2 3 3 3 x 4 4 4 x 4 4 4 2 3 2 3 1 2 1 2;26 25 24 23 4 5 6 7 12 13 14 16 17 18 10 9 8 19 20 21 22 3 2 1 0
```

Manishearth commented 1 year ago

Unfortunately #92 causes a massive pile of failures in the basic tests.

Manishearth commented 1 year ago

Down to two failures in #92! And fixed the basic test failures it was causing. There are still ~100 failing basic tests though.

Manishearth commented 1 year ago

Ah, the problem is that isolating run sequences can have gaps in them. I'm going to need to rearchitect some of the N0 work ....

Manishearth commented 1 year ago

The last two failures are https://github.com/unicode-org/properties/issues/70, and it's a whopper.

Manishearth commented 1 year ago

Ah, it's actually not that much of a whopper, since it only affects things that I've done on this repo recently :grin:. The existing code actually handled this pretty well.

The basic issue is that the weak and neutral rules must apply only within an isolating run sequence, even if it has gaps. This is mostly fine for all of our iterations, except for a couple of cases of lookahead that I did incorrectly, and every case of lookbehind. This is fixable.
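
To make the gap problem concrete: in UAX #9's own example shape, `text1 RLI text2 PDI text3`, the outer isolating run sequence consists of the run ending in RLI plus the run starting at PDI, with `text2` forming a separate sequence in between. Lookbehind from the PDI must therefore land on the RLI, not on the last character of `text2`. Below is a minimal sketch of neighbour lookup that respects this, modelling a sequence as a list of index ranges. `IsolatingRunSequence`, `prev_in_sequence`, and `next_in_sequence` here are hypothetical, written for illustration, and do not mirror this crate's internal types.

```rust
use std::ops::Range;

/// An isolating run sequence modelled as level runs in logical order.
/// Consecutive runs need not be adjacent in the text (hypothetical type).
struct IsolatingRunSequence {
    runs: Vec<Range<usize>>,
}

impl IsolatingRunSequence {
    /// All text indices that belong to the sequence, skipping the gaps.
    fn indices(&self) -> impl Iterator<Item = usize> + '_ {
        self.runs.iter().cloned().flatten()
    }

    /// The index before `i` within the sequence, if any (lookbehind).
    fn prev_in_sequence(&self, i: usize) -> Option<usize> {
        self.indices().take_while(|&j| j < i).last()
    }

    /// The index after `i` within the sequence, if any (lookahead).
    fn next_in_sequence(&self, i: usize) -> Option<usize> {
        self.indices().find(|&j| j > i)
    }
}
```

With the five characters `a RLI b PDI c` at indices 0 to 4, the outer sequence is `vec![0..2, 3..5]`. Looking behind from index 3 yields index 1, whereas a plain `i - 1` on the text would wrongly yield index 2, which belongs to the inner sequence.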

Manishearth commented 1 year ago

.... and the character tests pass! Still got a ways to go on the basic tests.