unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org
Other
1.38k stars 178 forks source link

Constructor Options Bag normalization #5839

Open zbraniecki opened 3 days ago

zbraniecki commented 3 days ago

With introduction of Preferences we now have the final shape of the first argument to constructors and implemented the normalized model for how to converge Locale and Preferences.

The remaining question is how to handle the second argument which we currently have three approaches to:

  1. Many constructors take a custom OptionsBag, similar to PreferencesBag but for developer-induced options
  2. Some constructors take no second argument
  3. Finally several take an explicit argument

It looks like this:


// OptionsBag
struct FooOptions {
    type: FooType
}

impl Foo {
    pub fn try_new(prefs: FooPreferences, rule_type: FooOptions) -> Result<Self, Error>;
}

// No second argument
impl Foo {
    pub fn try_new(prefs: PluralRulesPreferences) -> Result<Self, Error>;
}

// Explicit argument
impl Foo {
    pub fn try_new(prefs: FooPreferences, type: FooType) -> Result<Self, Error>;
}

We have an opportunity to normalize it and conform all constructors to Option 1, if we want to, but it doesn't feel like a natural obvious choice, so I'd like to discuss it.

How we modeled preferences, locales and options

The current architecture separates Preferences which represent user-provided values from Options which represent developer-provided values. This is novel and different from how ECMA-402 handles the constructors. In ECMA-402 we have (locale, options) pair where Options is a bag combining keys set by user and developer and the user-ones get merged with locale extension keys.

In our model the Preferences and Locale get merged as they represent the user-driven keys, and the Options are split into two "types".

First type are options that are affecting data slicing. In that case we push them to the constructor itself to allow for DCE to remove unnecessary code. Second type is options that don't affect DCE. It looks like this:

enum FooType {
    Bar,
    Baz,
}

// Option as argument
impl Foo {
    pub fn try_new(prefs: FooPreferences, type: FooType) -> Result<Self, Error>;
}

// Option as part of the constructor
impl Foo {
    pub fn try_new_bar(prefs: FooPreferences) -> Result<Self, Error>;
}
impl Foo {
    pub fn try_new_baz(prefs: FooPreferences) -> Result<Self, Error>;
}

Value proposition of normalization

If we were to normalize the second argument I see three main value props:

  1. It simplifies the DX by providing consistent API shape for all constructors
  2. It allows for introduction of patterns to normalize options merging, converging, converting and cascading (similar to Preferences)
  3. It allows for an easy introduction of Option keys that are universal across all constructors - ECMA-402 has one like that called localeMatcher

Why we shouldn't do this

There are several reasons for which we may not want to normalize it:

  1. Not all constructors have options. Introducing an empty Bag just for the sake of normalization and asking developers to pass Default::default(), None or options!{} to each constructor just to have consistency may not be worth it.
  2. As for ECMA-402 compatibility we'll have to introduce ECMA-402 OptionsBag anyway and write logic to convert from ECMA-402 behavior to ICU4X, so the value of normalized OptionsBag in ICU4X is reduced.
  3. The only known universal-option across all components in ECMA-402 - LocaleMatcher has questionable value, is basically never used and while we may have to find a way to support it, it is not clear if we should design ICU4X component API around it just because LocaleMatcher exists.

Counters:

  1. While not all constructors have options, it doesn't cost a lot to have two-argument constructors. We can even argue that the argument should be Option<OptionsBag> to allow for None if we think it's better DX than Default::default().
  2. Building conversion logic from ECMA-402 bag to ICU4X Preferences+Options can be abstracted to macros easier than if each constructor has it's own shape.
  3. If we decide that we have to support LocaleMatcher (to pass test-262) then we have to find a way to pass this argument to each constructor somehow... Even if no other Options are needed by that constructor.

Is it urgent?

I'm not sure if it is. It would be nice to have a consistent, ECMA-402 compatible, constructor story for 2.0 and the PR is going to be quite simple to write if we decide to go for it, but it may be that we have different opinions and should give ourselves time to think it through and plan this for 3.0.

sffc commented 3 days ago

About LocaleMatcher specifically: this only affects data loading, right? I don't think most individual component constructors will have anything special to do with LocaleMatcher except for passing it down to the data provider. Things that are passed down to the data provider can be done externally via a custom data provider. This is just to say that I don't think LocaleMatcher by itself is a very strong argument in favor of requiring options.

What we've done in the past, and what I lean toward doing here, is to keep try_new functions the way they are, and if we need a version with options in the future, we add try_new_with_options, and in the next major release, we can switch it to be the default.

zbraniecki commented 2 days ago

This is just to say that I don't think LocaleMatcher by itself is a very strong argument in favor of requiring options.

That's a great point. Basically we can handle ECMA-402 locale matcher even by something like:

let provider;

provider.setLocaleMatcherStrategy(LocaleMatcherStrategy::Foo);

let dtf = FooFormatter::try_new(provider, preferences);

this way we don't need it in Options.

What we've done in the past, and what I lean toward doing here, is to keep try_new functions the way they are, and if we need a version with options in the future, we add try_new_with_options, and in the next major release, we can switch it to be the default.

Yes, I'd like us to decide if we are ok having effectively three patterns for runtime arguments: 1) no arguments, 2) options bag, 3) single option. One half-way convergence would be to migrate (3) to (2) and have only two scenarios.

sffc commented 2 days ago

I think the following choices are fine:

  1. try_new with options
  2. try_new with no options
    • To add options later: create try_new_with_options and reconcile at next major release
  3. try_new_with_foo with a positional argument named foo
    • To add options later: create a function named try_new

Things we should not approve:

  1. try_new with a positional argument or any arguments other than provider, preferences and options
zbraniecki commented 21 hours ago

Thanks! Here's a list of cases that do not follow your heuristics:

./components/plurals/src/lib.rs
322:    pub fn try_new_unstable(
368:    pub fn try_new_cardinal_unstable(
425:    pub fn try_new_ordinal_unstable(
618:    pub fn try_new_unstable(
660:    pub fn try_new_cardinal_unstable(
699:    pub fn try_new_ordinal_unstable(
747:    pub fn try_new_with_rules_unstable(

./components/datetime/src/neo.rs
186:    pub fn try_new(
211:    pub fn try_new_unstable<P>(
236:    fn try_new_internal<P, L>(
411:    pub fn try_new(
436:    pub fn try_new_unstable<P>(
459:    fn try_new_internal<P, L>(

./components/segmenter/src/sentence.rs
149:    pub fn try_new_unstable<D>(provider: &D) -> Result<Self, DataError>
173:    pub fn try_new_with_options_unstable<D>(

./components/segmenter/src/word.rs
238:    pub fn try_new_auto_unstable<D>(provider: &D) -> Result<Self, DataError>
268:    pub fn try_new_auto_with_options_unstable<D>(
362:    pub fn try_new_lstm_unstable<D>(provider: &D) -> Result<Self, DataError>
391:    pub fn try_new_lstm_with_options_unstable<D>(
476:    pub fn try_new_dictionary_unstable<D>(provider: &D) -> Result<Self, DataError>
504:    pub fn try_new_dictionary_with_options_unstable<D>(

./components/segmenter/src/line.rs
415:    pub fn try_new_auto_unstable<D>(provider: &D) -> Result<Self, DataError>
456:    pub fn try_new_lstm_unstable<D>(provider: &D) -> Result<Self, DataError>
494:    pub fn try_new_dictionary_unstable<D>(provider: &D) -> Result<Self, DataError>
534:    pub fn try_new_auto_with_options_unstable<D>(
584:    pub fn try_new_lstm_with_options_unstable<D>(
641:    pub fn try_new_dictionary_with_options_unstable<D>(

./components/calendar/src/any_calendar.rs
619:    pub fn try_new_with_any_provider<P>(
666:    pub fn try_new_with_buffer_provider<P>(
712:    pub fn try_new_unstable<P>(provider: &P, kind: AnyCalendarKind) -> Result<Self, DataError>
782:    pub fn try_new_for_locale_unstable<P>(

./components/calendar/src/week_of.rs
40:    pub fn try_new_unstable<P>(provider: &P, locale: &DataLocale) -> Result<Self, DataError>

./components/experimental/src/transliterate/transliterator/mod.rs
242:    pub fn try_new(locale: &Locale) -> Result<Self, DataError> {
251:    pub fn try_new_with_any_provider(
263:    pub fn try_new_with_buffer_provider(
275:    pub fn try_new_unstable<PT, PN>(
329:    pub fn try_new_with_override<F>(locale: &Locale, lookup: F) -> Result<Self, DataError>
342:    pub fn try_new_with_override_with_any_provider<F>(
359:    pub fn try_new_with_override_with_buffer_provider<F>(
376:    pub fn try_new_with_override_unstable<PT, PN, F>(

./components/experimental/src/dimension/currency/long_formatter.rs
62:    pub fn try_new(
103:    pub fn try_new_unstable<D>(

./components/experimental/src/dimension/units/formatter.rs
95:    pub fn try_new(
133:    pub fn try_new_unstable<D>(

./components/experimental/src/displaynames/displaynames.rs
55:    pub fn try_new_unstable<D: DataProvider<RegionDisplayNamesV1Marker> + ?Sized>(
122:    pub fn try_new_unstable<D: DataProvider<ScriptDisplayNamesV1Marker> + ?Sized>(
190:    pub fn try_new_unstable<D: DataProvider<VariantDisplayNamesV1Marker> + ?Sized>(
253:    pub fn try_new_unstable<D: DataProvider<LanguageDisplayNamesV1Marker> + ?Sized>(
337:    pub fn try_new_unstable<D>(
sffc commented 12 hours ago

Thoughts:

zbraniecki commented 5 hours ago

In #5852 I

Left: