unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

Custom variants for icu_preferences #5018

Open zbraniecki opened 1 month ago

zbraniecki commented 1 month ago

Spinning out of #4996

In the initial PR I'm introducing closed enums for keys such as Collation. The motivation is that we only need key variants for values for which we have divergent code or data. Since we control the influx of new data and code, we can start with a key that has two variants and then, when a new CLDR release drops or we extend our code to handle a new variant, just add it to that enum.

@sffc pointed out that there are multiple reasons we may want to support key variants that we do not know about, including custom overrides and "new data, old code" scenarios. In particular, we should not exclude the ability to select data for a key that our code doesn't know about.

I'll use this issue to discuss a solution to that.

zbraniecki commented 1 month ago

I have a solution that I can re-add on top of the macros I'm introducing: each enum gets a Custom(unicode::Value) variant that catches all unknown values, making the enum look like:

enum Collation {
    Standard,
    Search,
    Phonetic,
    Pinyin,
    Searchjl,
    Custom(unicode::Value),
}

and then -u-co-search ends up as Collation::Search but -u-co-foo ends up as Collation::Custom(value!("foo")).
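To make the mapping above concrete, here is a minimal sketch of the catch-all behavior. It is not the macro-generated code from the PR: `Value` is a hypothetical stand-in for `unicode::Value`, `from_value` is a name made up for illustration, and the sketch does no syntax validation of the incoming value.

```rust
/// Hypothetical stand-in for the `unicode::Value` type mentioned above.
/// It simply holds one lowercased extension value such as "foo".
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Value(pub String);

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Collation {
    Standard,
    Search,
    Phonetic,
    Pinyin,
    Searchjl,
    /// Catches every value the code does not know about.
    Custom(Value),
}

impl Collation {
    /// Maps the value of a `-u-co-` keyword to a variant.
    /// (Illustrative name; no validation of the value syntax is done here.)
    pub fn from_value(raw: &str) -> Self {
        let lower = raw.to_ascii_lowercase();
        match lower.as_str() {
            "standard" => Collation::Standard,
            "search" => Collation::Search,
            "phonetic" => Collation::Phonetic,
            "pinyin" => Collation::Pinyin,
            "searchjl" => Collation::Searchjl,
            // Unknown values are preserved instead of being dropped.
            _ => Collation::Custom(Value(lower)),
        }
    }
}
```

With this shape, `Collation::from_value("search")` yields `Collation::Search`, while `Collation::from_value("foo")` yields `Collation::Custom(Value("foo".into()))`, so the unknown value is still available for data selection.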

This is easy to add, but it raises an architectural question: is there value in having an enum at all, then?

For keys that have open-ended values, such as Currency, I use struct Currency(TinyAsciiStr<3>) to represent an open-ended, validated range of currencies. If we add Custom, we could turn Collation, Calendar, etc. into similar structs, such as struct Collation(Subtag) and struct Calendar(Value), and get rid of the enums with known variants altogether.
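For comparison, a rough sketch of the struct-based alternative. This uses only the standard library: a `[u8; 3]` stands in for `TinyAsciiStr<3>` and a `String` stands in for `Subtag`, and the constructor names and validation rules (three ASCII letters for a currency, 3 to 8 ASCII alphanumerics for a collation value) are assumptions for illustration rather than the actual icu_preferences API.

```rust
/// Open-ended currency value; `[u8; 3]` is a stand-in for `TinyAsciiStr<3>`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Currency([u8; 3]);

impl Currency {
    /// Accepts exactly three ASCII letters and stores them lowercased.
    pub fn try_from_str(s: &str) -> Option<Self> {
        let b = s.as_bytes();
        if b.len() != 3 || !b.iter().all(|c| c.is_ascii_alphabetic()) {
            return None;
        }
        Some(Currency([
            b[0].to_ascii_lowercase(),
            b[1].to_ascii_lowercase(),
            b[2].to_ascii_lowercase(),
        ]))
    }

    pub fn as_str(&self) -> &str {
        // Safe to expect: the constructor only admits ASCII letters.
        std::str::from_utf8(&self.0).expect("validated ASCII")
    }
}

/// Collation without known variants; `String` is a stand-in for `Subtag`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Collation(String);

impl Collation {
    /// Accepts 3 to 8 ASCII alphanumerics and stores them lowercased.
    pub fn try_from_str(s: &str) -> Option<Self> {
        let ok_len = (3..=8).contains(&s.len());
        if ok_len && s.bytes().all(|c| c.is_ascii_alphanumeric()) {
            Some(Collation(s.to_ascii_lowercase()))
        } else {
            None
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

In this design every syntactically valid value is accepted, and callers compare against strings (for example `collation.as_str() == "search"`) instead of matching on variants.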

I'm torn on whether this is a better design. I see two use cases for the icu_preferences keys API:

In the former, the ergonomics of having "known" variants are great, while in the latter it doesn't matter: we retrieve a key from some collection by key, so we either need to serialize the known variant, or we can use Subtag/TinyStr/Value as the key.
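A small sketch of that contrast, assuming the two cases are roughly "code that branches on a known variant" and "data lookup by key". The reduced `Collation` enum, the `HashMap` standing in for the data collection, and the function names are all hypothetical.

```rust
use std::collections::HashMap;

/// Reduced enum for illustration; `String` stands in for `unicode::Value`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Collation {
    Standard,
    Search,
    Custom(String),
}

impl Collation {
    /// Serializes a variant back to the string used as a data key.
    pub fn as_str(&self) -> &str {
        match self {
            Collation::Standard => "standard",
            Collation::Search => "search",
            Collation::Custom(v) => v.as_str(),
        }
    }
}

/// Branching in code: known variants make this ergonomic.
pub fn is_search(c: &Collation) -> bool {
    matches!(c, Collation::Search)
}

/// Data lookup: the variant is only a key, so it gets serialized first.
/// A plain string-like key type would work equally well here.
pub fn lookup<'a>(data: &'a HashMap<String, String>, c: &Collation) -> Option<&'a str> {
    data.get(c.as_str()).map(String::as_str)
}
```

The branching function benefits from the enum, whereas `lookup` treats known and custom values identically once they are serialized, which is why the enum adds little for that path.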