Closed Manishearth closed 1 month ago
Discuss with:
Optional:
Proposal:
CodePointSetData/etc get const fn propname()
and fn propname_unstable()
with an option to add buffer/any as needed in the future.
All of these are added on CodePointSetData even if they return CPSDataBorrowed
We can alternatively or additionally add: const PROPNAME: &CPSDataBorrowed
March 11, 2024
(manish and robert talk about FFI)
CodePointSetData
facilitates discoverability.CodePointSetData
to BinaryProperty
, which will be one of a handful of items in the crate, making it much more discoverableCodePointSetData
because it emphasizes that the backend can change over time.BinaryProperty
doesn't say anything about the implementation.CodePointMapData
emphasizes that this is a map lookup. The interface could be different.BinaryProperty
with static functions that return CodePointSetData
Proposals:
icu_properties:
Con: free functions still not super idiomatic for this kind of thing, but not as bad Con: we still need some type to anchor over at FFI Con: you need to go dig in a different module to find the constructors for CPSData Pros:
icu_properties:
Con: we kinda have a useless zero sized type here, also not idiomatic Rust when you just want to dangle methods Con: you need to go dig in a different type to find the constructors for CPSData Pro: Same on FFI and Rust
struct EmojiProperty;
(implements BinaryProperty?)CodePointSetData::new::<EmojiProperty>()
CodePointSetData::new_emoji()
which is kinda what we do now :/ Pro: no need to have five billion methods and we can do any/buffer/whatever ctors without exploding the API size Con: idk but manish was going down this route originally with his properties refactor and then was asked not to and doesn't recall why lol Con: FFI will still have to do 1 or 2 (2 because we don't have free functions)
Decision: per-property ZST approach (3)
Agreed: @manishearth, @robertbastian, @echeran
@sffc was present for part of the discussion and hasn't seen the final agreement, may further discuss
@sffc was a bit lukewarm on the marker types originally because he didn't want to have divergence between FFI and Rust, but we already have significant divergence here because of the free functions
He's happy with the per-property ZST approach, however. We can be more careful in FFI docs to help people comparing FFI and Rust.
@robertbastian noted that the BinaryProperty trait may need to be sealed with some hidden methods.
I like the marker type approach; but can we be more explicit about how we specify the marker types? For example, we already have pub struct CanonicalCombiningClass
. Do we add these for all other properties? Maybe make them wrap booleans? Or should we make these be true marker types, pub enum EmojiProperty {}
?
@sffc that was the "open question: do we use marker types or just the value type for enum properties"
Personally I think we should use the value type for the enum properties, the only potential downside is if we want to reuse them; but they're "open enum" newtypes anyway so i don't think there's a huge case for reuse.
naming can get tricky, are we going to have the binary properties called Ascii
? Or are we going to have everyone be FooProperty
?
CodePointSetData::<Alnum>::load()
which would allow a client to ask for a type CodePointSetData<Alnum>
specifically.
icu_properties_types
CodePointSetData::new::<Alnum>()
;new_unstable::<Foo>(provider)
(etc) functions and CodePointSetDataBorrowed has new::<Foo>()
functions. This facilitates feature-gating.icu_properties
to smaller crates or at least multiple features.Proposal:
CPSData
,CPSDataBorrowed
,CPMData
,CPMDataBorrowed
all live at the top levelBinaryProperty
and EnumeratedProperty
sealed traits (could be hoisted to lib.rs?)maps
or sets
module at the toplevel (each only has 2 types!!!)script
can stay, but gets rid of free functionsbidi
/bidi_data
can stay, but gets rid of free functionsCodePointSetData::new_unstable::<Alnum>()
, CodePointSetDataBorrowed::new::<Alnum>()
, etc. CodePointSetData
stays non-genericCodePointMapData::<GeneralCategory>::new_unstable()
, CodePointMapDataBorrowed::<GeneralCategory>::new()
, with methods for erasing.try_new_unstable
etc. Docs properly advertise this.LGTM: @sffc @Manishearth @robertbastian @echeran (@hsivonen)
@robertbastian may have code for this
otherwise i can work on it
I think this was done by #5548?
This is probably a 2.0 change (https://github.com/unicode-org/icu4x/issues/2856)
The properties code uses a lot of free functions for loading property maps and such, rather than using constructors on
CodePointSetData
andCodePointMapData
as would be tradition in the other crates.This makes things a bit awkward. We should consider this design again for 2.0.
Some open questions:
CodePointMapData<GeneralCategory>
(etc) whereas FFI uses an erased version?To-do items:
load_
instead ofget_
cc @robertbastian @sffc
Low-priority discussion since it's likely a 2.0 issue.