open-i18n / rust-unic

UNIC: Unicode and Internationalization Crates for Rust
https://crates.io/crates/unic
Other
234 stars 24 forks source link

Initial implementation of UCD Segmentation properties #166

Closed behnam closed 7 years ago

behnam commented 7 years ago

Add UCD Segmentation source data to /data/, implement conversion from new files to property map data tables.

Add unic-ucd-segment component with initial implementation of three main segmentation-related properties:

Current implementation uses char_property!() macro for EnumeratedCharProperty implementation, which only supports TotalCharProperty.

Since the Other (abbr: XX) value in all these properties are notions of non-existance of breaking property, we want to switch to PartialCharProperty domain type and use Option<enum>. This is left as a separate step because it needs changes to the macro.

Tracker: https://github.com/behnam/rust-unic/issues/135

behnam commented 7 years ago

Uh... looks like rust 1.17 doesn't like code-blocks in doc-block metas. Need to find a way to get around this now... :|

CAD97 commented 7 years ago

This looks like the issue with $(#[:meta])* $:ident macro matching again.

Basically, 1.17 can't tell the difference between a #[:meta] and a $:ident, so gives up on macro parsing. The solution is to add a explicit separator between the repeating capture and the ident. It's really annoying that you cannot have a repeating capture followed by an ident in macros before 1.20.

behnam commented 7 years ago

Right, @CAD97! Looks like multiple lines is the problem. I'm working on a fix now. Thanks for the reminder!

behnam commented 7 years ago

Okay, updated this WIP with the changes suggested in https://github.com/behnam/rust-unic/pull/170. I was so focused on some other parts of the code that I didn't realize the Total implementation is local and can be replaced with Partial.

Now going to fix the macro syntax issue. (Which apparently is only fixed in 1.20.0, so we need a syntax that we tolerate for some time...)

CAD97 commented 7 years ago

Suggestion: case Extend ($(#[$variant_meta:meta])+ case $variant:ident). It's kind of silly to introduce case here since Rust doesn't use a case keyword for its enums, but at least it makes sense and would fix the repeating capture followed by ident? Otherwise, I'd just suggest a ; and dealing with ;Extend. Other likely single-char options: E#@$|

Proof of concept https://play.rust-lang.org/?gist=af53821eae5868c2ca8284e3a6fbeeb8

CAD97 commented 7 years ago

Don't forget to update the example, the # Limitations block, and the // TODO to drop the symbol in the macro documentation. Other than that, I don't hate your choice here. (Oh, and make the meta repeat a * not a + since we can do that now. It was only + so it would work with a singular for the time being.)

behnam commented 7 years ago

I think I came up with a more rust-y syntax based on the recent developments on the language side:

$(#[$variant_meta:meta])+
| $variant:ident {
    // ...
}

That's an extra VERTICAL LINE char before the variant name ident.

Now, one open question for this last part is: do we want to only support one syntax for this and update all the call-sites, or should we make it an optional syntax and only use it in new places needed? I like the first (and have done that in the last update) because stability is not a big deal for the macro (being considered an unstable feature at the moment) and keeping the implementation slim is a goal for these parts.

What do you think, @CAD97?

behnam commented 7 years ago

Here's the RFC, btw: https://github.com/rust-lang/rfcs/issues/1745

behnam commented 7 years ago

Okay, 287d950 should address all the issues here. Any other last comments before we land?

behnam commented 7 years ago

bors: r+

bors[bot] commented 7 years ago

Build succeeded