Closed Alexendoo closed 2 years ago
Thanks, @Alexendoo, for submitting this!
We had build issues on the CI. Could you please rebase this so we can see if there are any problems?
Also, they were some non-straight-forward changes in UAX #29 in Unicode 11.0.0 release. (https://unicode.org/reports/tr29/#Modifications) How are we incorporating that to the implementation here?
@behnam this PR doesn't handle it, I spent some time on it but I didn't really get anywhere
The tests are expected to fail with expected X4_2
in unic_idna::to_unicode
since I wasn't sure if that was intentional or not (it came from added test coverage, not a change in implementation)
I don't know if you'd really want to merge this directly since it's not complete. It's more for visibility if somebody else wants to build the tr29 changes on top of it it will save them a bit of time
I was attempting to update the repo to use unicode 11 data but ran into a few issues, for the time being I'm setting it aside since it was a bigger job than I was expecting, but I opened this in case it's helpful to save some time for anybody attempting to do the same.
Outstanding issues:
unic-segment - https://unicode.org/reports/tr29/#Modifications
This was the part that tripped me up the most, you can see my attempt to tackle it in https://github.com/Alexendoo/rust-unic/compare/unicode-11-partial...unicode-11. I think the forward word boundaries might be correct, but the others are not.
unic-ident - https://www.unicode.org/reports/tr31/#Modifications
There was a few changes to this spec, I didn't get round to seeing if any code changes are required but the tests do pass, so I don't know if anything is needed to be done
IDNA conformance
With the IDNA conformance test updated to use IdnaTestV2.txt I was able to add a test for
unic_idna::to_unicode
, however it doesn't return anErr
when a status ofX4_2
is expected - in this PR the test will fail but it can be easily added to be ignored asV2
andC...
are if that's intentional behaviourWhat this PR does accomplish:
#[ignore]
s the test for unicode version of unic-ucd-segmentThis change is