unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org
Other
1.39k stars 180 forks source link

Constructor signatures in Rust and FFI #2136

Closed sffc closed 2 years ago

sffc commented 2 years ago

In #1833 we're starting to establish a pattern for constructor signatures in Rust. I want to establish how that extends to FFI.

There are some constraints that apply to FFI but not Rust:

  1. No type parameters (must be expanded into multiple functions)*
  2. No non-exhaustive structs, unless they are opaque
  3. Opaque structs require extra allocations so we should not use them with impunity
  4. No macros for generating APIs (not supported in syn; ask @Manishearth for details)*

* Diplomat itself could be extended to reduce duplication and improve code maintainability/readability on an as-needed basis.

I'm going to approach this question assuming that constructors typically take 3 parameters: locale/preferences, data provider, and configuration options.

I. Locale/Preferences

Options:

  1. Transparent (exhaustive) preferences struct
    • Pro: Cheap to construct
    • Pro: Consistent with Rust
    • Con: Cannot easily add additional preferences (exhaustive)
  2. Opaque preferences struct with getters/setters
    • Pro: Consistent with Rust
    • Pro: Extensible (can add more preferences)
    • Con: More expensive (extra allocations)
    • Con: More boilerplate (getters/setters)
  3. Locale object
    • Pro: Easy to implement
    • Pro: Less boilerplate
    • Con: Requires using Locale which may harm code size and/or type safety
    • Con: Doesn't support non-BCP-47 preferences
  4. DataLocale object (#1995)
    • Pro: Easy to implement
    • Pro: Less boilerplate
    • Pro: Code size friendly
    • Con: Less type safety relative to preferences structs
    • Con: Doesn't support non-Unicode-extension preferences

II. Data Provider

Options:

  1. Two constructors, one for AnyProvider and one for BufferProvider
    • Pro: Future-proof
    • Con: Does not support caching data providers (see #919 for details)
  2. One constructor with call-site-specific data provider wrapper like FixedDecimalFormatProvider
    • Pro: Supports caching and alternative data provider implementations
    • Con: Requires adding a new wrapper whenever the data keys are changed
  3. Three constructors (options 1 and 2)
    • Pro: Maximum flexibility
    • Con: More boilerplate

III. Configuration Options

Options:

  1. Transparent (exhaustive) options struct
    • Pro: Cheap to construct
    • Con: Requires creating versioned structs and versioned constructors
  2. Opaque options struct with getters/setters
    • Pro: Extensible
    • Con: More expensive at runtime
    • Con: More boilerplate

Thoughts? @zbraniecki @Manishearth @robertbastian

sffc commented 2 years ago

Discussion:

Things we need to do with constructors:

  1. Fix argument order
  2. Use preferences or other change to the locale argument (pending final decision on the shape of this argument)
  3. Add "overloads" for the 3 types of data providers
  4. Ensure that options are taken by value
sffc commented 2 years ago

Discussion with @sffc @Manishearth:

sffc commented 2 years ago

It appears that #1833 still has unresolved questions.

In light of that, let me be clear on what I am proposing for constructor signatures in ICU4X 1.0.

In Rust:

impl FooFormat {
    pub fn try_new_any(p: &P, locale: &DataLocale, options: FooOptions)
        -> Result<FooFormat, DataError>
    where
        P: AnyProvider

    pub fn try_new_buffer(p: &P, locale: &DataLocale, options: FooOptions)
        -> Result<FooFormat, DataError>
    where
        P: BufferProvider

    pub fn try_new_unstable(p: &P, locale: &DataLocale, options: FooOptions)
        -> Result<FooFormat, DataError>
    where
        P: ResourceProvider<Marker1> + ResourceProvider<Marker2>
}

#[non_exhaustive]
pub struct FooOptions {
    pub option1: i32,
    // ...
}

In FFI:

class ICU4XFooFormat {
  public:
    DiplomatResult<FooFormat, DataError> try_new_any(
      const ICU4XAnyProvider& provider,
      const ICU4XDataLocale& locale,
      const ICU4XFooOptions& options
    );

    DiplomatResult<FooFormat, DataError> try_new_buffer(
      const ICU4XBufferProvider& provider,
      const ICU4XDataLocale& locale,
      const ICU4XFooOptions& options
    );
}

class ICU4XFooOptions; // opaque
sffc commented 2 years ago

I should also note a potential path for Rust constructors involving traits

trait TryNewBufferProvider<O> {
    fn try_new(locale: DataLocale, p: &impl BufferProvider, options: O)
}

impl TryFromBufferProvider<FooOptions> for FooFormat { ... }
sffc commented 2 years ago
sffc commented 2 years ago
sffc commented 2 years ago

Consensus: No objection to what I proposed in https://github.com/unicode-org/icu4x/issues/2136#issuecomment-1183797247.

sffc commented 2 years ago

I reviewed all component and property APIs. To-do list:

Questions:

  1. TimeZoneFormatter: do we want to generate a bunch of Any/Buffer constructors for all the load functions, or just try_from_config?
  2. CldrCalendar trait: should this trait be doc(hidden) or do we want people to be able to implement it?
  3. icu_decimal::provider: should all the types be labled "V1" or only the main one?
  4. icu_list::List: should it be named FormattedList?
  5. DateFormatter, TimeFormatter, ListFormatter: these take a single flat option; should they be wrapped in a non-exhaustive options struct? Or at least be named with the flat option in the method name?
  6. Should icu_provider::serde::borrow_de_utils be doc-hidden?
sffc commented 2 years ago

For FFI purposes, it would make things a lot simpler if we had just one provider type. It would be an enum between a buffer provider and an any provider.

This doesn't solve Serde code size issues directly, but we can circumvent that problem by having a feature on the icu_capi crate to enable or disable the BufferProvider version of it.

Manishearth commented 2 years ago

Hmmm. Not ideal but I guess it works.

sffc commented 2 years ago

None of our choices are ideal:

  1. Enum toggling between Buffer and Any with a feature on icu_capi to enable/disable Buffer mode
  2. Add two versions of all the constructors over FFI
  3. Add constructor-specific data provider opaque wrappers

We have some precedent of using features to reduce code size (deserialize_json). So I think it's fine.

We could even do the same thing on the Rust side to get from 3 constructors down to 2 constructors. But I'm not proposing that we do that in 1.0. We should consider it for 2.0 alongside the new Preferences stuff.

sffc commented 2 years ago

Proposal: Have a single ICU4XDataProvider with features to independently enable the AnyProvider and the BufferProvider versions.

LGTM: @robertbastian @Manishearth @sffc @nordzilla

sffc commented 2 years ago

This is finally fixed.