Closed samruds1 closed 7 years ago
Here are some of the questions we have so far:
1) Rust code for step N0 has been added to bracket_pairs.rs, need help with integration into lib.rs. 2) Code for W1-W7 has been added to lib.rs. Please verify if correct. 3) Need clarifications for the Rust test cases inserted in src/lib.rs: a) We are not able to execute cargo test possibly due to large number of test cases. b) For test cases added from BidiTest.txt we have removed characters where level equals 'x'. Is this correct. 4) L1 step has not been implemented yet.
@eefriedman @mbrubeck - Can you please review.
@jdm, during the execution of the test case automation suite, these strange files get created at some step. We were not able to decipher where or why they get created. However, I have deleted them and have added the entry to the ".gitignore" file to make sure that they never get pushed again. I had added this entry to .gitignore in the commit before the one which had the issue, however I made the commit while inside the /src/ directory, hence ignoring the .gitignore file.
I hope everything looks good now.
@mbrubeck , @eefriedman, @jdm while we have pushed out code in this commit, we are yet to integrate the code for our steps L1 and N0 into lib.rs. Here are some of the points that I need help on, to proceed with this integration:
I need to know how to give input to the resolve_bracket_pairs() function in bracket_pairs.rs.
(a) Where will I find the indexes[] array containing character codes for the characters in the input string?
(b) Where will I find the 'sos' and 'level' variables holding the start of sequence and the embedding direction level for the given run?
(c) How do I deliver output codes back for the next step to proceed. For this I need to pass the 'codes' array which holds the output codes for the brackets.
Apart from these, we need information on integration of our code.
(a) Should the code for the new steps be part of lib.rs or can they appear in a different file and be imported into lib.rs?
(b) Should we keep the rust modules for the two steps L1 and N0 in different rust source code files or the in the same one,(in the case we that we are not integrating them into lib.rs)?
For the record, @mbrubeck is unfortunately on vacation today and this weekend :/
(a) Where will I find the indexes[] array containing character codes for the characters in the input string?
str has a few related methods for extracting characters; see https://doc.rust-lang.org/nightly/std/primitive.str.html#method.chars and https://doc.rust-lang.org/nightly/std/primitive.str.html#method.char_indices . For performance reasons, it's usually best to stick with the &str
representation of a string rather than converting it to a &[char]
.
(a) Should the code for the new steps be part of lib.rs or can they appear in a different file and be imported into lib.rs?
Separate the code into different files however you feel is best. Putting the code for a complicated rule with a single entry point into a separate file seems like a good idea.
(b) Should we keep the rust modules for the two steps L1 and N0 in different rust source code files or the in the same one,(in the case we that we are not integrating them into lib.rs)?
Separating them seems fine.
(c) How do I deliver output codes back for the next step to proceed. For this I need to pass the 'codes' array which holds the output codes for the brackets.
At a high level, step N0 is similar to resolve_weak
: you're taking the array of BIDI classes, and resolving some of them, so the entry point probably looks similar.
(b) Where will I find the 'sos' and 'level' variables holding the start of sequence and the embedding direction level for the given run?
See IsolatingRunSequence.
Thanks, @eefriedman. We'll get the changes in as soon as possible!
@eefriedman I have some doubts too. Could you please answer them.
1) For the test cases related to BidiTest.txt:
We have added test cases for the method process_text(). The test cases are of the same format as the existing test cases:
assert_eq!(process_text("abc123", Some(0)), BidiInfo { levels: vec![0, 0, 0, 0, 0, 0], classes: vec![L, L, L, EN, EN, EN], paragraphs: vec![ParagraphInfo { range: 0..6, level: 0 }], });
a) Is the following test case correct?
assert_eq!(process_text("\u{FA5E}\u{1F102}\u{FF0B}\u{20B0}\u{FE55}\u{E017E}\u{001C}\u{001F}\u{2000}\u{1015A}\u{2066}\u{2067}\u{2068}\u{2069}", Some(0)), BidiInfo { levels: vec![ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3], classes: vec![L, EN, ES, ET, CS, NSM, B, S, WS, ON, LRI, RLI, FSI, PDI], paragraphs: vec![ParagraphInfo { range: 0..14, level: 0 } ], });
b) As per my understanding the value in the Some field in the given test should be the values of the bitset (para levels). Is that correct?
c) While inserting the test cases in lib.rs can we remove Bidi Classes which correspond to levels with the value x.
for example, can we ignore this test case?
@Levels: x
@Reorder:
LRE; 7
LRO; 7
RLE; 7
or in the following example, can we remove LRE, LRO, RLE from these test cases?
@Levels: 0 x @Reorder: 0 L LRE; 3 L LRO; 3 L RLE; 3
2) For the implementation of step L1 can a separate function be created (resolve_white_space) instead of inserting the code in the visual_runs method?
@eefriedman, @mbrubeck
I understand that my function which will perform the Step N0 will have to be called in the resolve_neutral function. I want to know how to get the input array. Does an IsolatingRunSequence contain the input string (in a &txt or &[char] format)? If yes, how do I extract it?
I understand that my function which will perform the Step N0 will have to be called in the resolve_neutral function. I want to know how to get the input array. Does an IsolatingRunSequence contain the input string (in a &txt or &[char] format)? If yes, how do I extract it?
No, IsolatingRunSequence does not contain a string; you'll have to pass it in separately.
@eefriedman, @mbrubeck
I am getting a failed test case, and I am not able to identify what is wrong. Could you have a look?
Assert Left: "])[(\u{2680}a\u{5d1}(\u{5d0}" Assert Right: "\u{5d1}(\u{5d0}a\u{2680}([)]"
It's a little hard to figure out what exactly is going wrong from just the output strings... but are you sure you're implementing BD16 correctly?
Yes, I just debugged BD16. It recognizes only one pair of brackets for the above string.
What's the input string for your testcase?
@eefriedman @mbrubeck @jdm
We had some confusion about how to proceed. At this point, we can control what test cases are being inserted from BidiTest.txt and BidiCharacterTest.txt. If we put in all the test cases, the build just does not succeed. And it is getting difficult to identify why.
Should we insert a few passing test cases which test the steps that we have added, so that the pull request can get merged? We could mention the test cases which do not pass.
@eefriedman here is the test case from BidiCharacterTest.txt:
05D0 0028 05D1 0061 2680 0028 005B 0029 005D;0;0;1 1 1 0 0 0 0 0 0;2 1 0 3 4 5 6 7 8
Are you sure you're converting the testcase correctly? You can end up with either your left or your right result depending on the default paragraph embedding level (an LTR context versus an RTL context).
I am taking the index values given in the test case to form the result.
Here is the description of a test case: -# Field 0: A sequence of hexadecimal code point values separated by space -# Field 1: A value representing the paragraph direction, as follows: -# 0 represents left-to-right -# 1 represents right-to-left -# 2 represents auto-LTR according to rules P2 and P3 of the algorithm -# Field 2: The resolved paragraph embedding level -# Field 3: A list of resolved levels; characters removed in rule X9 are -# indicated with an 'x' -# Field 4: A list of indices showing the resulting visual ordering from -# left to right; characters with a resolved level of 'x' are skipped
I am using Field 0 as input, and taking indexing into Field 0 using Field 4 as output indices. Do you think there is something wrong with this approach? The LTR/RTL context should be taken care of by the output, right?
The "0" in Field 1 represents left-to-right, which means that you should apply HL1 and explicitly set the paragraph embedding level to zero.
Does this mean that I take the output produced by using the indexes of Field 0, and reverse the string?
process_text
has two inputs; the first is the string, the second is the paragraph embedding level.
@eefriedman, I noticed that as well. Thanks for the help. I'll try with this and get back to you!
I am now getting this error:
thread 'test::test_reorder_line' panicked at 'index 2 and/or 3 in אב(גד[&ef].)gh
do not lie on character boundary', ../src/libcore/str/mod.rs:1284
This is the test case: assert_eq!(reorder_with_para_level("\u{05D0}\u{05D1}\u{0028}\u{05D2}\u{05D3}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}", Some(0)),"\u{05D1}\u{05D0}\u{0028}\u{05D3}\u{05D2}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}");//BidiCharacterTest.txt Line Number:40
Here is the input:
05D0 05D1 0028 05D2 05D3 005B 0026 0065 0066 005D 002E 0029 0067 0068;0;0;1 1 0 1 1 0 0 0 0 0 0 0 0 0;1 0 2 4 3 5 6 7 8 9 10 11 12 13
Do you know why this fails?
A "do not lie on character boundary" panic generally means you're using string indexing incorrectly; e.g. let x = &"⚀"[1..];
. (See also https://doc.rust-lang.org/nightly/book/strings.html#slicing .) Try getting a backtrace to figure out where it's happening.
I am not able to make sense of this backtrace. Is there any indicator of the point of failure here?
thread 'test::test_reorder_line' panicked at 'index 2 and/or 3 in אב(גד[&ef].)gh
do not lie on character boundary', ../src/libcore/str/mod.rs:1284
stack backtrace:
1: 0x564cc1e817fe - sys::backtrace::write::haf6e4e635ac76143Ivs
2: 0x564cc1e84f25 - panicking::on_panic::ha085a58a08f78856lzx
3: 0x564cc1e7885e - rt::unwind::begin_unwind_inner::hc90ee27246f12475C0w
4: 0x564cc1e791f6 - rt::unwind::begin_unwind_fmt::ha4be06289e0df3dbIZw
5: 0x564cc1e84b86 - rust_begin_unwind
6: 0x564cc1eb3aa4 - panicking::panic_fmt::he7875691f9cbe589SgC
7: 0x564cc1eb624f - str::slice_error_fail::hbe442ed6cb13a72dsqK
8: 0x564cc1e0688a - str::traits::str.ops..Index<ops..Range
failures: test::test_reorder_line
test result: FAILED. 11 passed; 1 failed; 0 ignored; 0 measured
thread '
@eefriedman, I have commented out the code that we had inserted. It seems that this error arises even with existing code. This tells me that the test case itself is going wrong somewhere. Are you sure this is the correct test case?
05D0 05D1 0028 05D2 05D3 005B 0026 0065 0066 005D 002E 0029 0067 0068;0;0;1 1 0 1 1 0 0 0 0 0 0 0 0 0;1 0 2 4 3 5 6 7 8 9 10 11 12 13
assert_eq!(reorder_with_para_level("\u{05D0}\u{05D1}\u{0028}\u{05D2}\u{05D3}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}", Some(0)),"\u{05D1}\u{05D0}\u{0028}\u{05D3}\u{05D2}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}");//BidiCharacterTest.txt Line Number:40
I have made the following function to test reorder line while specifying the paragraph level input:
fn reorder_with_para_level(s: &str, level: Option
It looks like it's crashing at https://github.com/servo/unicode-bidi/blob/2a99bcaadd34bda9f8ab035bbdc3d170c5d78d71/src/lib.rs#L169 ... and it looks like that range comes from https://github.com/servo/unicode-bidi/blob/2a99bcaadd34bda9f8ab035bbdc3d170c5d78d71/src/lib.rs#L204 . I'm guessing we're somehow violating the constraint that every byte of a character has the same level?
Anyway, I don't think your test should be triggering this; I'm guessing it's a bug in the code.
Thanks for looking into it.
How should we go about this now?
We've implemented all the steps and added code to pull test cases from BidiTest.txt and BidiCharacterTest.txt
@vicky-katara For now, you can temporarily comment out or delete the failing test(s), or add code to your Python script to do this (maybe using a list of line numbers that are known to fail). File a new issue in GitHub about the failure so that we will come back and fix it later.
@mbrubeck, there are more than a 100,000 test cases in total, from BidiCharacterTest.txt and BidiTest.txt. I don't think finding a list of failing test cases is viable. Will it be fine if we include a list of passing test cases, with comments describing how to add all test cases back to lib.rs (for example by uncommenting code which adds test cases)?
Will it be fine if we include a list of passing test cases, with comments describing how to add all test cases back to lib.rs (for example by uncommenting code which adds test cases)?
Yes, that sounds good.
@eefriedman @mbrubeck
We're getting failed test cases for test cases like this from BidiTest.txt:
@ Levels: x 4 @ Reorder: 1 LRE AN; 7 RLE L; 4 RLE EN; 4 RLE AN; 4
We feel we built the test case correctly. Could you help us convert this test case into a rust assert test case of this form:
This is what we have:
assert_eq!(process_text("\u{0669}\u{11F2}\u{1D7D6}\u{10E74}", Some(1)), BidiInfo { levels: vec![ 7, 4, 4, 4], classes: vec![AN, L, EN, AN], paragraphs: vec![ParagraphInfo { range: 0..4, level: 4 } ], });
@mbrubeck @eefriedman @jdm
Here is what we have accomplished:
Additionally:
*Identified defect with process_text [https://github.com/servo/unicode-bidi/issues/22].
Great work, thanks! I haven't had time to review all of this code yet, but I look forward to getting it merged.
Sorry I haven't reviewed this PR yet; I'm just back from vacation and will be working on it this week. Since there's a lot of code here split across a few dozen commits, I think I'll try to split it into a few separate PRs that can be reviewed landed separately. If I find code that needs to be changed, I'm happy to apply the changes myself (and have another Servo developer review my changes as needed). Or if any of you have the time and inclination to keep working on this code, that would be great too—just let me know.
:umbrella: The latest upstream changes (presumably #25) made this pull request unmergeable. Please resolve the merge conflicts.
I think we can close this PR now that we have another approach for having conformance tests (as integration tests) in https://github.com/servo/unicode-bidi/pull/30 . There's still more work needed (including adding the char-dependent tests rules), but they should be done in a separate PR, IMHO.
Files added: src/BidiCharacterTest.txt src/bracket_pairs.rs src/BidiTest.txt src/UnicodeData.txt
tools/bidiCharacterTestParser.py tools/bidiTest.py tools/bidiCharacterTestParser.pyc
tools/bidiTest.pyc
Preliminary pull request. Need help with integration of new code with existing code. Verification needed for Rust conformance tests added.