unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org
Other
1.38k stars 178 forks source link

Split icu_locale into two crates #3931

Open sffc opened 1 year ago

sffc commented 1 year ago

There is too much going on in icu_locid_transform for it to be a core crate. It should only contain the things essential for fallback.

sffc commented 1 year ago

We need more discussion time for this issue. There's not time for 1.3 to have that discussion. It's not a bad enough problem to block 1.3. So this should block 2.0.

sffc commented 1 year ago

Discuss with:

sffc commented 1 year ago

Decision from 2023-11-09:

See full notes in the 2023-Q3 Summit notes doc.

sffc commented 11 months ago

The consensus from 2023-11-09 was to use the name icu_canonicalizer, but the revivable crate is named icu_locale_canonicalizer: https://docs.rs/icu_locale_canonicalizer/latest/icu_locale_canonicalizer/

I remember that I did not really like this name from the start because it uses "locale" instead of "locid" as the rest of the crates use. I also don't really like making people write icu::locale_canonicalizer::LocaleDirectionality.

So I would like to make a new proposal for the names of these crates:

  1. Low-level crate: icu_fallback
  2. Higher-level crate: icu_locid_adapters

Note that we already have icu_provider_adapters so this builds on the same type of naming scheme.

The contents of the crates will be as discussed on 2023-11-09. I am not proposing any changes except for the names of the crates. The consensus there was:

  • locid_transform crate split: make it icu_locid_transform (for LocaleFallbacker, and future data-driven functionality relating to locales that all components need as a low-level dependency) and icu_canonicalizer (for LocaleCanonicalizer, LocaleExpander, and LocaleDirectionality, and future functionality that is not a universal component dependency)

OK?

Wants approval from:

robertbastian commented 11 months ago

icu_provider_adapters contains adapters that are providers and wrap providers. This is absolutely not the case in the high-level locale crate, it contains logic that works with locales, so I think icu_locid_adapters is a very bad name. We might actually need it in the future to provide adapters between icu_locid::Locale and other locale types in the Rust ecosystem.

We explicitly picked one of the names to be icu_locid_transform in order to not have another abandoned crate name. I still stand behind the original decision, even if icu_canonicalizer won't be a revive of the abandoned icu_locale_canonicalizer crate but a new name.

sffc commented 11 months ago

My agreement to the previous consensus was based on the understanding that icu_canonicalizer was the name of the previous crate. Given that it is actually icu_locale_canonicalizer, the consensus is not well-defined and therefore I withdraw my consensus vote for that naming scheme.

New proposal:

Wants approval from:

robertbastian commented 11 months ago

I'd rather use a shiny new name for the higher-level crate, but won't die on this hill.

sffc commented 11 months ago

I would love to hear a suggestion for a better name for the higher-level crate. Here are the only ones I've seen:

robertbastian commented 11 months ago

A whole new can of worms would be suggesting to rename icu_locid to icu_locale in 2.0. The crate's main types are Locale and LanguageIdentifier, there's no LocaleIdentifier that would shorten to locid. Then icu_locale_canonicalizer would be a good fit.

sffc commented 11 months ago

It's named icu_locid because in ICU4C there is locid.h, and because the name icu-locale was claimed in kebab case and we can't get it back in snake case.

I could live with icu_locid_info although I don't know if that's better than icu_locid_transform

robertbastian commented 10 months ago

I don't think ICU4C file names should have any bearing on our naming.

Note that the library inside the icu-locale crate could still be called icu_locale. At some point crates.io might also figure out that it can resolve icu_locale to icu-locale.

robertbastian commented 9 months ago

and because the name icu-locale was claimed in kebab case and we can't get it back in snake case.

I don't think this is a big problem. Most users will use the module through the meta crate. For the ones that don't, cargo add icu_locale will work. The library inside the icu-locale crate ~can~ will be called icu_locale anyway.

robertbastian commented 9 months ago

Another suggestion: Most users of a locale type crate will expect there to be canonicalization, minimization, and other locale information. There's a very small use case (basically icu_provider) that only needs a locale type, with nothing else. So:

sffc commented 9 months ago

Hmm. What do you think about icu_core? Everything can depend on it and it can include Locale and anything else we need to truly share across all components such as logging and documentation macros (#4467).

sffc commented 9 months ago

So the crates could be:

Dependency matrix:

Crate Dependencies
icu_core utils
icu_provider utils, icu_core
icu_fallback utils, icu_core, icu_provider
icu_locid, no default features utils, icu_core, icu_provider
icu_locid, compiled_data utils, icu_core, icu_provider, icu_fallback
icu_decimal, no default features utils, icu_core, icu_provider
icu_decimal, compiled_data utils, icu_core, icu_provider, icu_fallback

Does that look about right?

robertbastian commented 9 months ago

I guess icu_core is fine, although there might be a non-ICU use case for icu_locale_core that doesn't require ICU's logging macros. All ICU4X crates (except for locid) already depend on icu_provider, so that's kind of our core crate.

robertbastian commented 8 months ago

Proposed conclusion: icu_<something with locale>_core with Locale and friends, and icu_<something with locale> with data and funcionality. Do not necessarily need the same infix, see

LGTM: @robertbastian @manishearth @sffc @echeran

Possible names:

  1. icu_locale_core + icu_locid + icu::locid
  2. icu_locale_core + icu_locid + icu::locale
  3. icu_locale_core + icu-locale + icu::locale
  4. icu_locale_core + icu_locale + icu::locale (if possible)
  5. icu_locid_core + icu_locid + icu::locid

Voting:

Note: 4 > 5 >> everything else

Decision: if possible we do icu_locale provided crates.io lets us rename. If not, we add icu icu_locid_core and repurpose icu_locid.

Manishearth commented 8 months ago

Discussion on locid_transform/locid_fallback

voting:

No decision:

General points of contention:

Manishearth commented 8 months ago

We got icu_locale: https://crates.io/crates/icu_locale

(crates.io rules prevent renaming underscores to dashes, however crates.io rules sometimes allow empty squatted crates to be taken over, and icu-locale counts since it's empty and not used by anyone)

Note: 4 > 5 >> everything else

Decision: if possible we do icu_locale provided crates.io lets us rename. If not, we add icu icu_locid_core and repurpose icu_locid.

As a result, the decision on this is that we do option 4: icu_locale_core + icu_locale + icu::locale (if possible)

Manishearth commented 8 months ago

options are:

  1. icu_locale (displaynames, canonicalizer+) with another small crate A. icu_fallback with re-export. This is a "private", internal-facing crate. contains only fallback. B. icu_fallback without re-export. This is a "public" crate and the canonical location of the fallback machinery. C. Like A but could be named icu_compiled_data_utils pending bikeshed. could contain other things that compiled_data needs.
  2. icu_locale (fallbacker, canonicalizer+) + icu_displaynames (displaynames)
  3. icu_locale (displaynames, fallbacker, canonicalizer+) + nothing
  4. icu_locale (fallbacker, canonicalizer+, displaynames reexport) + icu_displaynames (displaynames, internal crate) -- eliminated because worse than 2

(canonicalizer+ = canonicalizer, directionality, "other things that don't need much data")


Manish draws some venn diagrams to help focus the discussion.


Discussion: The actual things people seem to care about are these:

  1. There should be a crate that is minimal for what is needed by compiled_data
  2. That crate will be reexported by icu_locale so it shouldn't contain anything we won't do from icu_locale
  3. There are likely to be things needed by compiled_data that are "not fallback", i.e. "default prefs data".

There was also discussion on instead a model where icu_displaynames is a separate crate, and icu_locale doesn't contain it, keeping it small (this helps maintain 1 since icu_locale stays mostly minimal). Robert and Manish don't like this as much since it's not fully minimal and it would be nice to have displaynames in icu_locale.

This leads us to:

icu_bikeshed (containing things needed by compiled data that also makes sense to reexport from icu_locale), icu_locale (reexport bikeshed, canonicalizer+, reexport locale_core, displaynames)

icu_bikeshed will start with fallbacker, and will likely contain default locale pref stuff if we add that.

If we end up with anything "needed by compiled data that should not be reexported from icu_locale", then we should make a new crate or something. This is a bridge we cross when we need to, we're not yet sure if there's anything that will belong here.

Agreed: @manishearth @robertbastian @sffc @echeran

We bikeshed icu_bikeshed later

Manishearth commented 8 months ago

Bikeshed suggestions for icu_bikeshed:

sffc commented 8 months ago
Manishearth commented 8 months ago

I like locale_ops, locale_data_utils, locale_support, and locale_mantle

Manishearth commented 8 months ago

Discusion with Zibi present

3 crates:

Discussion:

Proposals for icu_bikeshed:

  1. icu_locale_mantle
  2. icu_fallback
  3. icu_locale_fallback
  4. icu_locale_ops
  5. icu_locale_data_utils
  6. icu_locale_support

Discussion:

Open for voting

sffc commented 7 months ago

The first round of voting didn't reach a clear consensus. I'll send out another ballot with these options:

icu_locale_fallback icu_locale_ops icu_locale_support icu_locale_util (added late)

hsivonen commented 7 months ago

I think we should put display names in a crate separate from the others, because it's unlike the other data: It's for UI and large-ish.

sffc commented 7 months ago

ICU4X WG discussion from 2024-04-18:

Decision on how to move forward: convene a small group to finalize decision on the controversial topics and get alignment. This small group discussion will produce the final recommendation of the WG.

Locale crate splitting:

Note: The names of the crates resulting from the above discussion will be brought back to the main group.

sffc commented 7 months ago

2024-04-23 Special Meeting

https://github.com/unicode-org/icu4x/issues/3931

Four categories of functionality:

  1. Core Structs, including comparison, normalization
  2. Parent Locales, Partial Likely Subtags
  3. Aliases, Full Likely Subtags, Locale Directionality
  4. Display Names

Options

Option 1

One-stop shop:

icu::locale::Locale icu::locale::LocaleExpander icu::locale::LocaleCanonicalizer icu::locale::LocaleDirectionality icu::locale::LocaleDisplayNames

Requires child crates:

Note: this creates multiple ways to include the same types:

icu_locale_fallback::LocaleFallbacker icu_locale::LocaleFallbacker icu::locale::LocaleFallbacker

Option 2+

Different crates for different functionality

Included in metacrate as:

icu::locale::Locale icu::locale_fallback::LocaleFallbacker icu::locale_transform::LocaleCanonicalizer ...

Notes

Discussion on icu_locale_core naming:

Discussion on old crates:

Discussion on icu_locale_fallback crate name:

Proposal:

Introduce 3 crates:

LGTM: @echeran @robertbastian @sffc @zbraniecki @Manishearth

sffc commented 6 months ago

This can be done ahead of time by creating crates and re-exports.

robertbastian commented 6 months ago

After starting this I don't think it's worth doing for 1.5, verifying that everything stays semver compatible and introducing extra crates is a lot more work than just the few renames and moves that this needs.

robertbastian commented 2 months ago

Does the conclusion from #5120 apply to splitting out fallback as well?

sffc commented 2 months ago

Does the conclusion from #5120 apply to splitting out fallback as well?

I don't see how the conclusion in #5120 applies to splitting out fallback except that we don't need to proactively add the Cargo feature for it.

sffc commented 2 months ago

icu_locale is still this weird combination of both a component crate and a metacrate. Given that we plan to keep adding things to it, I do still think that icu_locale is likely get to a point where we will want to split out fallbacking in order to avoid compiling bloat in every component.

robertbastian commented 2 months ago

I do still think that icu_locale is likely get to a point where we will want to split out fallbacking in order to avoid compiling bloat in every component.

Yes, but it's currently not at this point, because nobody is asking for it. Hence this issue can be closed.

sffc commented 2 months ago

Conclusion:

LGTM: @sffc @robertbastian @Manishearth