servo / unicode-bidi

Implementation of the Unicode Bidirection Algorithm in Rust
Other
77 stars 33 forks source link

Improve specification conformance of unicode bidi library-Initial step #19

Closed moharnab123saikia closed 8 years ago

moharnab123saikia commented 8 years ago

Files changed: src/lib.rs and tools/generate.py

•added code to tools/generate.py to download the files: http://www.unicode.org/Public/UNIDATA/BidiTest.txt and http://www.unicode.org/Public/UNIDATA/BidiCharacterTest.txt •wrote manual rust tests in lib.rs that can be run automatically

Review on Reviewable

eefriedman commented 8 years ago

Please squash the commits.

Please remove the extra lib.rs~.

Stray tab in generate.py.

for class in rem_classes etc. in test_removed_by_x9.

In Rust code, space before function opening brace, and no blank lines after the opening brace or before the closing brace.

metajack commented 8 years ago

@eefriedman can you re-review? Or should @mbrubeck review?

eefriedman commented 8 years ago

@metajack Re-review what? My initial review comments still weren't addressed.

@moharnab123saikia We can't merge a PR with failing tests; change the test to reflect the current state of the code, and add a comment like // TODO(#999): This should equal "foo" (replace 999 with the issue number for the issue you filed).

metajack commented 8 years ago

@eefriedman Sorry. I assumed he had addressed them since there were new commits.

samruds1 commented 8 years ago

@eefriedman - we have made the changes. Can you please review the changes?

eefriedman commented 8 years ago

Review status: 0 of 5 files reviewed at latest revision, 8 unresolved discussions.


src/lib.rs, line 980 [r8] (raw file): Needs some sort of note about why the test is commented out.


src/lib.rs, line 1003 [r8] (raw file): Missing newline.


src/lib.rs, line 1009 [r8] (raw file): for x in rem_classes.


src/lib.rs, line 1012 [r8] (raw file): for x in not_classes


src/lib.rs, line 1019 [r8] (raw file): Space after commas.


src/lib.rs, line 1020 [r8] (raw file): for x in non_x9_classes.


src/lib.rs, line 1021 [r8] (raw file): Space after comma.


tools/generate.py, line 199 [r8] (raw file): Indentation, and please limit comments to 100 columns where possible.


Comments from the review on Reviewable.io

samruds1 commented 8 years ago

@eefriedman Please review.

eefriedman commented 8 years ago

Review status: 0 of 5 files reviewed at latest revision, 5 unresolved discussions.


src/lib.rs, line 980 [r9] (raw file): Indentation. (Tab?)


src/lib.rs, line 1013 [r9] (raw file): Space before opening brace (same for other loops).


tools/generate.py, line 196 [r9] (raw file): Are you sure reindenting the if statement is correct? It changes the semantics...


tools/generate.py, line 198 [r9] (raw file): Trailing whitespace


Comments from the review on Reviewable.io

eefriedman commented 8 years ago

@mbrubeck Do we want the testcase files in this repository?

vicky-katara commented 8 years ago

@eefriedman, we have made the requested changes. Please let us know if you need any other changes. The only change that we have left out is the commented test case. This test case was failing and should get handled in the next steps. We have left it commented there for reference, and will remove the steps when the further steps have been completed.

samruds1 commented 8 years ago

@eefriedman, @mbrubeck - Do we have to incorporate anymore changes? Can you please review.

eefriedman commented 8 years ago

I'm waiting for an answer to https://github.com/servo/unicode-bidi/pull/19#issuecomment-154898405 ; looks fine otherwise.

mbrubeck commented 8 years ago

Thanks! The new tests look good.

Since the .txt files aren't used yet, let's leave them out of the repo (but keep the code to fetch them). Once we have some code to process the files, we can decide whether it makes sense to check them in.


Reviewed 3 of 4 files at r1, 1 of 1 files at r10, 1 of 1 files at r11. Review status: all files reviewed at latest revision, 6 unresolved discussions.


src/BidiCharacterTest.txt, line 0 [r11] (raw file): I think we should leave these files out of the repo at least for now.


Comments from the review on Reviewable.io

samruds1 commented 8 years ago

@mbrubeck - I have commented out the code and deleted those two files from the repository. Please review the changes.

eefriedman commented 8 years ago

Please fix the commit message (it currently contains a bunch of useless text from squashing).

moharnab123saikia commented 8 years ago

@ https://github.com/eefriedmaneefriedman https://github.com/eefriedman - we have deleted the garbage messages and only kept the details about the changes we made. Please review.

Thanks and Regards, Moharnab Saikia

Please fix the commit message (it currently contains a bunch of useless text from squashing).

— Reply to this email directly or view it on GitHub https://github.com/servo/unicode-bidi/pull/19#issuecomment-155553497.

eefriedman commented 8 years ago

It looks like the new version isn't rebased correctly (GitHub shows two commits).

vicky-katara commented 8 years ago

@eefriedman, we only see one commit from our side which is 'unicode-bidi/ 2a99bca'. Do you want us to squash our commit into your commit that is 'unicode-bidi/ 69e42af'

eefriedman commented 8 years ago

You branch contains commit 69e42af, which doesn't exist on master (the equivalent commit on master is a202520).

samruds1 commented 8 years ago

@eefriedman - We are planning to reset and rebase our commits again. Maybe that will solve the issue.

samruds1 commented 8 years ago

@eefriedman - When I clone the unicode-bidi repository (with none of our changes added) and run the tests defined in lib.rs, a folder called target is getting created. After I add our changes, should I let the folder remain in the repository and push the changes?

mbrubeck commented 8 years ago

target is where Cargo puts compiled code and other files it generates. Git is configured to ignore this directory (using the .gitignore file), so you can leave it in place as you commit and push changes.

samruds1 commented 8 years ago

@mbrubeck - Thanks for the clarification. Can you please review the changes.

vicky-katara commented 8 years ago

@mbrubeck @eefriedman , we wanted some clarification.

eefriedman commented 8 years ago

Currently, generate.py and Cargo aren't linked; generate.py modifies the source code. We could change this so generate.py generates a build artifact instead, I guess... but that isn't obviously worthwhile.

Note that you wouldn't want the build process to fetch data from unicode.org: that would make the resulting binary unpredictable.

What should be the location of 'UnicodeData.txt' and 'ReadMe.txt'?

Currently, nowhere; the fact that the current generate.py saves the files into the current working directory is a bug.

If we wanted to add the files to the repository, we could add a unicode/ directory, I guess, and add a separate script to update those files. That said, I'm not sure that's necessary.

mbrubeck commented 8 years ago

How are generate.py and Cargo linked?

Currently, generate.py is run manually, and its output (src/tables.rs) is checked in to the git repository. This way, we only need to run generate.py when upgrading to a new version of Unicode, which happens relatively infrequently.

Should 'cargo test' call generate.py somehow?

I suggest we do the same thing we do for the non-test code: Run generate.py manually, and check in all the files needed for running tests (either the .txt files, or generated Rust files that contain the same data). This has the disadvantage of making the repository larger, but it should make our test process more reproducible (since it doesn't depend on network resources outside our control).

What should be the location of 'UnicodeData.txt' and 'ReadMe.txt'?

src/UnicodeData.txt and src/ReadMe.txt are temporary files used only by generate.py and should not be checked in to the repository. They are listed in the .gitignore file.

vicky-katara commented 8 years ago

@eefriedman and @mbrubeck, thanks for the detailed replies.

As far as we see, all are changes are in place. Do you see any more changes? Or are we good to go on the merge?

mbrubeck commented 8 years ago

This looks good to me. Thanks!


Reviewed 3 of 4 files at r12. Review status: all files reviewed at latest revision, 6 unresolved discussions.


Comments from the review on Reviewable.io

mbrubeck commented 8 years ago

Oops, there's still a duplicate commit in there (accidentally created in a rebase?) but it doesn't seem to cause any issues so I'll just let this go through.

mbrubeck commented 8 years ago

@bors-servo r+

bors-servo commented 8 years ago

:pushpin: Commit 2a99bca has been approved by mbrubeck

bors-servo commented 8 years ago

:hourglass: Testing commit 2a99bca with merge 06dedd1...

bors-servo commented 8 years ago

:sunny: Test successful - travis

vicky-katara commented 8 years ago

Hi @mbrubeck , @eefriedman

We have the following queries about the subsequent steps of our project as mentioned here: https://github.com/servo/servo/wiki/Improve-specification-conformance-of-unicode-bidi-library

eefriedman commented 8 years ago

There's some basic documentation for the meaning of the bidi tests at the top of BidiTest.txt. Each test line consists of a list of bidi character classes. For each test line (lines without a starting # or @), two things should get checked: the level of each character (essentially the levels array returned from process_text()) and the visual order of the resulting text (essentially the result of reorder_line()). The relevant level and ordering are specified by the @levels and @reorder lines. (The very first test in the file is kind of confusing to look at because it's a test which doesn't actually contain any text.)


I'm pretty sure https://github.com/servo/unicode-bidi/blob/master/src/lib.rs#L687 is referring to the fact that the function only examines each character in the string once; it has nothing to do with the callers.

samruds1 commented 8 years ago

@eefriedman Thanks for the explanation. I am unable to understand what the number 7 denotes in the test cases given below: L LRE R R; 7 L; 7 L LRE; 7 Can you please help me out.

eefriedman commented 8 years ago
# A data line has the following format:
# <input> ; <bitset>
#   <input>  =      An ordered list of BIDI property values
#   <bitset> =      A hex bitset for paragraph levels (P): 1 = auto-LTR, 2 = LTR, 4 = RTL
#                   Auto-LTR (standard BIDI) uses the first L/R/AL character, and is LTR if none is found.

It's a bitset, so 7 means 1|2|4, i.e. the test applies in contexts with an automatic paragraph level, a forced LTR paragraph level, and a forced RTL paragraph level; see http://unicode.org/reports/tr9/#The_Paragraph_Level .