Open sffc opened 1 year ago
We need more discussion time for this issue. There's not time for 1.3 to have that discussion. It's not a bad enough problem to block 1.3. So this should block 2.0.
Discuss with:
Decision from 2023-11-09:
See full notes in the 2023-Q3 Summit notes doc.
The consensus from 2023-11-09 was to use the name `icu_canonicalizer`, but the revivable crate is named `icu_locale_canonicalizer`: https://docs.rs/icu_locale_canonicalizer/latest/icu_locale_canonicalizer/
I remember that I did not really like this name from the start because it uses "locale" instead of "locid" as the rest of the crates use. I also don't really like making people write `icu::locale_canonicalizer::LocaleDirectionality`.
So I would like to make a new proposal for the names of these crates:
- `icu_fallback`
- `icu_locid_adapters`
Note that we already have `icu_provider_adapters` so this builds on the same type of naming scheme.
The contents of the crates will be as discussed on 2023-11-09. I am not proposing any changes except for the names of the crates. The consensus there was:
- locid_transform crate split: make it icu_locid_transform (for LocaleFallbacker, and future data-driven functionality relating to locales that all components need as a low-level dependency) and icu_canonicalizer (for LocaleCanonicalizer, LocaleExpander, and LocaleDirectionality, and future functionality that is not a universal component dependency)
OK?
Wants approval from:
`icu_provider_adapters` contains adapters that are providers and wrap providers. This is absolutely not the case in the high-level locale crate, it contains logic that works with locales, so I think `icu_locid_adapters` is a very bad name. We might actually need it in the future to provide adapters between `icu_locid::Locale` and other locale types in the Rust ecosystem.
We explicitly picked one of the names to be `icu_locid_transform` in order to not have another abandoned crate name. I still stand behind the original decision, even if `icu_canonicalizer` won't be a revival of the abandoned `icu_locale_canonicalizer` crate but a new name.
My agreement to the previous consensus was based on the understanding that `icu_canonicalizer` was the name of the previous crate. Given that it is actually `icu_locale_canonicalizer`, the consensus is not well-defined and therefore I withdraw my consensus vote for that naming scheme.
New proposal:
- `icu_fallback` for the low-level crate
- `icu_locid_transform` for the higher-level crate
Wants approval from:
I'd rather use a shiny new name for the higher-level crate, but won't die on this hill.
I would love to hear a suggestion for a better name for the higher-level crate. Here are the only ones I've seen:
- `icu_locid_utils`
- `icu_canonicalizer`
- `icu_locid_info`
- `icu_locid_meta[data]`
A whole new can of worms would be suggesting to rename `icu_locid` to `icu_locale` in 2.0. The crate's main types are `Locale` and `LanguageIdentifier`, there's no `LocaleIdentifier` that would shorten to `locid`. Then `icu_locale_canonicalizer` would be a good fit.
It's named `icu_locid` because in ICU4C there is `locid.h`, and because the name `icu-locale` was claimed in kebab case and we can't get it back in snake case.
I could live with `icu_locid_info` although I don't know if that's better than `icu_locid_transform`
I don't think ICU4C file names should have any bearing on our naming.
Note that the library inside the `icu-locale` crate could still be called `icu_locale`. At some point crates.io might also figure out that it can resolve `icu_locale` to `icu-locale`.
> and because the name icu-locale was claimed in kebab case and we can't get it back in snake case.
I don't think this is a big problem. Most users will use the module through the meta crate. For the ones that don't, `cargo add icu_locale` will work. The library inside the `icu-locale` crate ~can~ will be called `icu_locale` anyway.
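To illustrate the kebab-case versus snake-case point: Cargo's default library target name is the package name with hyphens replaced by underscores, which is why a package published as `icu-locale` would still be imported as `icu_locale`. The helper below is a hypothetical sketch of that rule only; it is not part of Cargo or ICU4X.

```rust
/// Hypothetical helper mirroring Cargo's default rule for the library
/// target name: hyphens in the package name become underscores, because
/// `-` is not valid in a Rust identifier.
fn default_lib_name(package: &str) -> String {
    package.replace('-', "_")
}

fn main() {
    // Both spellings of the package name lead to the same `use` path.
    println!("{}", default_lib_name("icu-locale"));
    println!("{}", default_lib_name("icu_locale"));
}
```

Either way, downstream code would write `use icu_locale::...`; the difference only shows up in the name registered on crates.io.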
Another suggestion: Most users of a locale type crate will expect there to be canonicalization, minimization, and other locale information. There's a very small use case (basically `icu_provider`) that only needs a locale type, with nothing else. So:
- `icu_locale_core`: current `icu_locid`. `icu_provider` depends on this. Not a module in `icu`
- `icu_locale`: reexports types from `icu_locale_core`, and adds everything that's currently in `icu_locid_transform`. Is a module in `icu`.
Hmm. What do you think about `icu_core`? Everything can depend on it and it can include `Locale` and anything else we need to truly share across all components such as logging and documentation macros (#4467).
So the crates could be:
- `icu_core` = contains Locale, LanguageIdentifier, and internal macros for things such as logging and documentation. We will try to keep it as small as possible.
- `icu_locid` OR `icu_locale` = re-exports Locale/LanguageIdentifier and includes LocaleExpander, LocaleCanonicalizer, LocaleDirectionality, etc.
- `icu_fallback` OR `icu_locid_transform` = fallback machinery.
Dependency matrix:
Crate | Dependencies |
---|---|
icu_core | utils |
icu_provider | utils, icu_core |
icu_fallback | utils, icu_core, icu_provider |
icu_locid, no default features | utils, icu_core, icu_provider |
icu_locid, compiled_data | utils, icu_core, icu_provider, icu_fallback |
icu_decimal, no default features | utils, icu_core, icu_provider |
icu_decimal, compiled_data | utils, icu_core, icu_provider, icu_fallback |
Does that look about right?
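One way to sanity-check a matrix like the one above is to treat it as a graph and confirm it has no cycles. The sketch below transcribes the `compiled_data` rows as plain strings (nothing here reads real Cargo manifests, and the function names are made up for illustration) and runs a depth-first search.

```rust
use std::collections::HashMap;

type Deps = HashMap<&'static str, Vec<&'static str>>;

/// Dependency table transcribed from the matrix (compiled_data rows).
/// "utils" stands for the utility crates and is treated as a leaf.
fn proposed_deps() -> Deps {
    HashMap::from([
        ("utils", vec![]),
        ("icu_core", vec!["utils"]),
        ("icu_provider", vec!["utils", "icu_core"]),
        ("icu_fallback", vec!["utils", "icu_core", "icu_provider"]),
        ("icu_locid", vec!["utils", "icu_core", "icu_provider", "icu_fallback"]),
        ("icu_decimal", vec!["utils", "icu_core", "icu_provider", "icu_fallback"]),
    ])
}

/// Depth-first search: returns true if a cycle is reachable from `node`.
fn has_cycle_from(
    node: &'static str,
    deps: &Deps,
    visiting: &mut Vec<&'static str>,
    done: &mut Vec<&'static str>,
) -> bool {
    if done.contains(&node) {
        return false; // already fully explored
    }
    if visiting.contains(&node) {
        return true; // back edge: node is on the current path
    }
    visiting.push(node);
    if let Some(list) = deps.get(node) {
        for &dep in list {
            if has_cycle_from(dep, deps, visiting, done) {
                return true;
            }
        }
    }
    visiting.pop();
    done.push(node);
    false
}

/// True if no crate in the table (transitively) depends on itself.
fn is_acyclic(deps: &Deps) -> bool {
    let mut done = Vec::new();
    deps.keys()
        .all(|&k| !has_cycle_from(k, deps, &mut Vec::new(), &mut done))
}

fn main() {
    println!("proposed layout acyclic: {}", is_acyclic(&proposed_deps()));
}
```

Adding an edge from `icu_core` back to `icu_locid` in this table makes the check fail, which is the constraint that forces `Locale` to live in a crate below `icu_provider`.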
I guess `icu_core` is fine, although there might be a non-ICU use case for `icu_locale_core` that doesn't require ICU's logging macros. All ICU4X crates (except for locid) already depend on `icu_provider`, so that's kind of our core crate.
`icu_core`.
`icu_provider::_internal::logging`
`icu_datetime`
`icu_utils` and having it not be on the 1.0 track
`eprintln!`
`icu_core` unless it is really core to internationalization (universally used across all crates). `icu_locale_core` sounds good
`include!`?
`include!` is bad for publishing. We can copy the files around.
Proposed conclusion: `icu_<something with locale>_core` with Locale and friends, and `icu_<something with locale>` with data and functionality. Do not necessarily need the same infix, see
LGTM: @robertbastian @manishearth @sffc @echeran
Possible names:
1. `icu_locale_core` + `icu_locid` + `icu::locid`
2. `icu_locale_core` + `icu_locid` + `icu::locale`
3. `icu_locale_core` + `icu-locale` + `icu::locale`
4. `icu_locale_core` + `icu_locale` + `icu::locale` (if possible)
5. `icu_locid_core` + `icu_locid` + `icu::locid`
Voting:
Note: 4 > 5 >> everything else
Decision: if possible we do `icu_locale` provided crates.io lets us rename. If not, we add `icu_locid_core` and repurpose `icu_locid`.
Discussion on locid_transform/locid_fallback
1. `icu_locid_transform`. Re-export the stuff from `icu_loc[ale/id]` that it currently re-exports
2. `icu_locid_transform` and remove everything except the limited bit of data-using code that is needed for the `compiled_data` feature (right now, the langid fallbacker)
3. `icu_locid_transform` and remove everything except the language identifier fallbacker (from @robertbastian). This is 4 but it reduces churn, this means `locid_transform` is really "just `icu_fallback` with a terrible name"
4. `icu_fallback` with the contract being locale fallback only
5. `icu_compiled_data_utils` (or similar name) that includes fallback, and re-export `LocaleFallbacker` from `icu_locid`
6. `icu_locid` in `icu_locid_transform`, eventually we'll have to rip off the bandage so let's just do it. There will be a bit of migration work in 2.0 anyway
voting:
No decision:
General points of contention:
`locid_transform` name and would like to see it go away.
We got `icu_locale`: https://crates.io/crates/icu_locale (crates.io rules prevent renaming underscores to dashes, however crates.io rules sometimes allow empty squatted crates to be taken over, and `icu-locale` counts since it's empty and not used by anyone)
> Note: 4 > 5 >> everything else
> Decision: if possible we do icu_locale provided crates.io lets us rename. If not, we add icu_locid_core and repurpose icu_locid.
As a result, the decision on this is that we do option 4: `icu_locale_core` + `icu_locale` + `icu::locale` (if possible)
`icu_locale` crate will contain, as discussed, all locale-related, data-driven code. This includes:
options are:
(canonicalizer+ = canonicalizer, directionality, "other things that don't need much data")
Manish draws some venn diagrams to help focus the discussion.
Discussion: The actual things people seem to care about are these:
There was also discussion on instead a model where `icu_displaynames` is a separate crate, and `icu_locale` doesn't contain it, keeping it small (this helps maintain 1 since icu_locale stays mostly minimal). Robert and Manish don't like this as much since it's not fully minimal and it would be nice to have displaynames in `icu_locale`.
This leads us to:
`icu_bikeshed` (containing things needed by compiled data that also makes sense to reexport from `icu_locale`), `icu_locale` (reexport bikeshed, canonicalizer+, reexport locale_core, displaynames)
`icu_bikeshed` will start with fallbacker, and will likely contain default locale pref stuff if we add that.
If we end up with anything "needed by compiled data that should not be reexported from icu_locale", then we should make a new crate or something. This is a bridge we cross when we need to, we're not yet sure if there's anything that will belong here.
Agreed: @manishearth @robertbastian @sffc @echeran
We bikeshed `icu_bikeshed` later
Bikeshed suggestions for `icu_bikeshed`:
- `icu_compiled_data_utils`
- `icu_locale_mantle` ("not core, but still somewhat core")
- `icu_locale_core2`
- `icu_fallback`
- `icu_locale_fallback`
- `icu_locale_ops`
- `icu_locale_data_utils`
- `icu_locale_support`
I like locale_ops, locale_data_utils, locale_support, and locale_mantle
Discussion with Zibi present
3 crates:
- `icu_locale_core`: structures (currently `icu_locid`)
- `icu_bikeshed`: fallbacking and other locale-related machinery for compiled data
- `icu_locale`: re-exports `icu_locale_core`, `icu_bikeshed`, and adds data-driven functionality (similar to `icu_locid_transform`)
Discussion:
`Locale` don't need the data. We shouldn't make them either carry all the data or hunt around for `icu_locale_core`. My slight preference would be to keep `icu_locale` with the structs only.
`Locale` probably want access to canonicalization, display names, etc.
Proposals for `icu_bikeshed`:
Discussion:
`icu_locale_mantle` is too "cute"
`icu_locale_mantle` hinders the ability of users to grok the crate. But it is nice once people figure out what it means.
`icu_locale_fallback` seems good because it won't expand beyond fallback. The contract is about retrieving a value that we need from the locale but the locale doesn't contain that value. It also covers my expectation that fallback itself may grow in ways that contain weights, etc.
`icu_locale`, it gets very messy.
Open for voting
The first round of voting didn't reach a clear consensus. I'll send out another ballot with these options:
- `icu_locale_fallback`
- `icu_locale_ops`
- `icu_locale_support`
- `icu_locale_util` (added late)
I think we should put display names in a crate separate from the others, because it's unlike the other data: It's for UI and large-ish.
ICU4X WG discussion from 2024-04-18:
`_util` doesn't describe what anything does; it means we don't know what it does and couldn't come up with a better name
Decision on how to move forward: convene a small group to finalize decision on the controversial topics and get alignment. This small group discussion will produce the final recommendation of the WG.
Locale crate splitting:
Note: The names of the crates resulting from the above discussion will be brought back to the main group.
https://github.com/unicode-org/icu4x/issues/3931
Four categories of functionality:
One-stop shop:
- `icu::locale::Locale`
- `icu::locale::LocaleExpander`
- `icu::locale::LocaleCanonicalizer`
- `icu::locale::LocaleDirectionality`
- `icu::locale::LocaleDisplayNames`
Requires child crates:
- `icu_locale_core` for core structs: required because of circular dependency with `icu_provider`
- `icu_locale_fallback` used for compiled data
- `icu_locale_transform`, `icu_locale_names`
Note: this creates multiple ways to include the same types:
- `icu_locale_fallback::LocaleFallbacker`
- `icu_locale::LocaleFallbacker`
- `icu::locale::LocaleFallbacker`
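The "multiple ways to include the same types" point can be sketched with plain modules standing in for crates. Everything below is a hypothetical stand-in: the modules only borrow the crate names from the list, and the placeholder `LocaleFallbacker` is an empty struct, not the real ICU4X type.

```rust
use std::any::TypeId;

// Stand-in for the child crate that owns the type.
mod icu_locale_fallback {
    /// Placeholder only; the real fallbacker has a different API.
    #[derive(Debug, Default, PartialEq)]
    pub struct LocaleFallbacker;
}

// Stand-in for the parent crate, which re-exports the child's type.
mod icu_locale {
    pub use super::icu_locale_fallback::LocaleFallbacker;
}

// Stand-in for the metacrate, which exposes the parent as a module.
mod icu {
    pub mod locale {
        pub use super::super::icu_locale::*;
    }
}

/// All three paths resolve to one and the same type.
fn paths_name_same_type() -> bool {
    let child = TypeId::of::<icu_locale_fallback::LocaleFallbacker>();
    let parent = TypeId::of::<icu_locale::LocaleFallbacker>();
    let meta = TypeId::of::<icu::locale::LocaleFallbacker>();
    child == parent && parent == meta
}

fn main() {
    // A value built through one path is accepted where another path is named.
    let f: icu::locale::LocaleFallbacker = icu_locale_fallback::LocaleFallbacker::default();
    println!("{:?}, same type: {}", f, paths_name_same_type());
}
```

Because re-exports do not create new types, values flow freely between the three paths; the cost is only that documentation and imports can spell the same type three ways.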
Different crates for different functionality
- `icu_locale` for core structs
- `icu_locale_fallback`, `icu_locale_transform`, `icu_locale_names`, ...
Included in metacrate as:
- `icu::locale::Locale`
- `icu::locale_fallback::LocaleFallbacker`
- `icu::locale_transform::LocaleCanonicalizer`
- ...
`icu_locale` as lightweight/small, 90% of users want core structs, to this is a metacrate, if you don't know what you're doing, come here and you can tailor it.
Discussion on `icu_locale_core` naming:
- `icu_locale_core`, should we consider `icu_locid`? I still think there's a disproportionally high value for a crate that just gives them structs. `icu_locale_core` sounds to me like an internal crate introduced predominantly to resolve the circular dependency. But I think it is a standalone crate, popular relative to the whole of ICU. Every HTTP, locale management, language handling crate will want to use this. We give the nicer name to the metacrate, and `icu_locid` is for the structs.
- `icu_locid` is not a real name; there is nothing called "locid". Maybe `icu_locale_types`?
- `icu_locale_types` seems not much better than `icu_locale_core`
- `_core`. It can mean other things, similar to `_traits` crates. I agree that `locid` is just confusing. It seems that if we expect this to be a big popular crate, we shouldn't restrict ourselves to the ICU4C name.
- `icu_locid`. The transfer of knowledge from ICU4C and the 4 years of precedent in Rust should not be discounted.
- `icu_locale_core` because, (1), it's a common convention when having a multi-module artifact to have a "core", including in Java. (2) Visual consistency of `icu_locale` and `icu_locale_something` that reflects parent/meta crate to child crate relationship. (3) "locid" is short, and is the ICU4C identifier, but without background knowledge, I just want to see "locale".
- `locid`, we are accessible to ICU4C developers, but not new developers. With `locale`, we're accessible to developers from any background.
- `icu_locid`. I see `_core` as being similar to `_raw`, an internal-facing thing. Should we consider `icu_locale_id`?
- `_core` implies what we're going for here. The signal for what it means is pretty strongly correlated to what we're trying to do here.
- `_core` gets used for many things in the Rust world.
- `rand` in Rust.
Discussion on old crates:
Discussion on `icu_locale_fallback` crate name:
Proposal:
Introduce 3 crates:
`icu_locale_core`
`icu_locid`
`icu_locale_fallback`
`icu_locid_transform`
`icu_locale`
`icu_locale_core` and `icu_locale_fallback`, and also includes other locale-related functionality, which could be split to additional child crates if the need arises
LGTM: @echeran @robertbastian @sffc @zbraniecki @Manishearth
This can be done ahead of time by creating crates and re-exports.
After starting this I don't think it's worth doing for 1.5, verifying that everything stays semver compatible and introducing extra crates is a lot more work than just the few renames and moves that this needs.
Does the conclusion from #5120 apply to splitting out fallback as well?
> Does the conclusion from #5120 apply to splitting out fallback as well?
I don't see how the conclusion in #5120 applies to splitting out fallback except that we don't need to proactively add the Cargo feature for it.
`icu_locale` is still this weird combination of both a component crate and a metacrate. Given that we plan to keep adding things to it, I do still think that `icu_locale` is likely to get to a point where we will want to split out fallbacking in order to avoid compiling bloat in every component.
> I do still think that icu_locale is likely to get to a point where we will want to split out fallbacking in order to avoid compiling bloat in every component.
Yes, but it's currently not at this point, because nobody is asking for it. Hence this issue can be closed.
Conclusion:
- `icu_locale_fallback` split still on the table, but not high priority
LGTM: @sffc @robertbastian @Manishearth
There is too much going on in icu_locid_transform for it to be a core crate. It should only contain the things essential for fallback.