open-i18n / rust-unic

UNIC: Unicode and Internationalization Crates for Rust
https://crates.io/crates/unic
Other
234 stars 24 forks source link

Partial update to unicode 11 #226

Closed Alexendoo closed 2 years ago

Alexendoo commented 6 years ago

I was attempting to update the repo to use unicode 11 data but ran into a few issues, for the time being I'm setting it aside since it was a bigger job than I was expecting, but I opened this in case it's helpful to save some time for anybody attempting to do the same.

Outstanding issues:

unic-segment - https://unicode.org/reports/tr29/#Modifications

This was the part that tripped me up the most, you can see my attempt to tackle it in https://github.com/Alexendoo/rust-unic/compare/unicode-11-partial...unicode-11. I think the forward word boundaries might be correct, but the others are not.

unic-ident - https://www.unicode.org/reports/tr31/#Modifications

There was a few changes to this spec, I didn't get round to seeing if any code changes are required but the tests do pass, so I don't know if anything is needed to be done

IDNA conformance

With the IDNA conformance test updated to use IdnaTestV2.txt I was able to add a test for unic_idna::to_unicode, however it doesn't return an Err when a status of X4_2 is expected - in this PR the test will fail but it can be easily added to be ignored as V2 and C... are if that's intentional behaviour

What this PR does accomplish:


This change is Reviewable

behnam commented 6 years ago

Thanks, @Alexendoo, for submitting this!

We had build issues on the CI. Could you please rebase this so we can see if there are any problems?

Also, they were some non-straight-forward changes in UAX #29 in Unicode 11.0.0 release. (https://unicode.org/reports/tr29/#Modifications) How are we incorporating that to the implementation here?

Alexendoo commented 6 years ago

@behnam this PR doesn't handle it, I spent some time on it but I didn't really get anywhere

The tests are expected to fail with expected X4_2 in unic_idna::to_unicode since I wasn't sure if that was intentional or not (it came from added test coverage, not a change in implementation)

I don't know if you'd really want to merge this directly since it's not complete. It's more for visibility if somebody else wants to build the tr29 changes on top of it it will save them a bit of time