unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org
Other
1.38k stars 176 forks source link

Organize properties code to be more like the other crates (get rid of free functions) #3573

Closed Manishearth closed 1 month ago

Manishearth commented 1 year ago

This is probably a 2.0 change (https://github.com/unicode-org/icu4x/issues/2856)

The properties code uses a lot of free functions for loading property maps and such, rather than using constructors on CodePointSetData and CodePointMapData as would be tradition in the other crates.

This makes things a bit awkward. We should consider this design again for 2.0.

Some open questions:

To-do items:

cc @robertbastian @sffc

Low-priority discussion since it's likely a 2.0 issue.

sffc commented 1 year ago

Discuss with:

Optional:

Manishearth commented 8 months ago

Proposal:

CodePointSetData/etc get const fn propname() and fn propname_unstable() with an option to add buffer/any as needed in the future.

All of these are added on CodePointSetData even if they return CPSDataBorrowed

We can alternatively or additionally add: const PROPNAME: &CPSDataBorrowed

Manishearth commented 8 months ago

March 11, 2024

(manish and robert talk about FFI)

Proposals:

  1. free functions, better modules

icu_properties:

Con: free functions still not super idiomatic for this kind of thing, but not as bad Con: we still need some type to anchor over at FFI Con: you need to go dig in a different module to find the constructors for CPSData Pros:

  1. zero sized types, no modules

icu_properties:

Con: we kinda have a useless zero sized type here, also not idiomatic Rust when you just want to dangle methods Con: you need to go dig in a different type to find the constructors for CPSData Pro: Same on FFI and Rust

  1. zst marker types per property, traits per kind of property
    • struct EmojiProperty; (implements BinaryProperty?)
    • CPSData is still not generic, but the constructor is:
    • CodePointSetData::new::<EmojiProperty>()
    • FFI: you do CodePointSetData::new_emoji() which is kinda what we do now :/
    • open question: do we use marker types or just the value type for enum properties

Pro: no need to have five billion methods and we can do any/buffer/whatever ctors without exploding the API size Con: idk but manish was going down this route originally with his properties refactor and then was asked not to and doesn't recall why lol Con: FFI will still have to do 1 or 2 (2 because we don't have free functions)

Decision: per-property ZST approach (3)

Agreed: @manishearth, @robertbastian, @echeran

@sffc was present for part of the discussion and hasn't seen the final agreement, may further discuss

Manishearth commented 8 months ago

@sffc was a bit lukewarm on the marker types originally because he didn't want to have divergence between FFI and Rust, but we already have significant divergence here because of the free functions

He's happy with the per-property ZST approach, however. We can be more careful in FFI docs to help people comparing FFI and Rust.

@robertbastian noted that the BinaryProperty trait may need to be sealed with some hidden methods.

sffc commented 8 months ago

I like the marker type approach; but can we be more explicit about how we specify the marker types? For example, we already have pub struct CanonicalCombiningClass. Do we add these for all other properties? Maybe make them wrap booleans? Or should we make these be true marker types, pub enum EmojiProperty {}?

Manishearth commented 8 months ago

@sffc that was the "open question: do we use marker types or just the value type for enum properties"

Personally I think we should use the value type for the enum properties, the only potential downside is if we want to reuse them; but they're "open enum" newtypes anyway so i don't think there's a huge case for reuse.

naming can get tricky, are we going to have the binary properties called Ascii? Or are we going to have everyone be FooProperty?

Manishearth commented 4 months ago
CodePointSetData::<Alnum>::load()

which would allow a client to ask for a type CodePointSetData<Alnum> specifically.

Proposal:

LGTM: @sffc @Manishearth @robertbastian @echeran (@hsivonen)

Manishearth commented 4 months ago

@robertbastian may have code for this

otherwise i can work on it

Manishearth commented 1 month ago

I think this was done by #5548?