ICU4X - Githubissues

sffc commented 3 years ago

Hi rust-unic folks,

I just wanted to make you all aware that the Unicode Consortium is sponsoring a project, ICU4X, which will also provide certain i18n functionality in Rust. It is under active development with contributors from Google and Mozilla. Read more here:

https://github.com/unicode-org/icu4x

The focus of ICU4X is more on user-level functionality like date/time/number formatting, plural rules, etc. However, we're also hoping to expand into things like segmentation and Unicode properties, in order to provide an ECMA-402-like surface in Rust.

If you are interested in contributing, see:

https://github.com/unicode-org/icu4x/blob/master/CONTRIBUTING.md

behnam commented 3 years ago

Thanks for mentioning and advertising icu4x here, Shane. I have been following the work sporadically, since our meeting back in Oct 2019.

I'd like to reiterate some of my points from the meeting:

1. UNIC's focus, at least at this stage of development, is to provide low-level "multilingual text processing" capabilities, for use in higher-level i18n APIs or directly by applications.
2. As a result of (1), UNIC doesn't yet have any focus on providing locale-specific functionality (most of the CLDR-based APIs, in short).

Now, although I'm happy to see that icu4x is resolving the needs of some projects/organizations, as far as I understand there's no effort in changing the status quo of multilingual text processing, and the goal is to follow ICU and ECMA-402 in the API design, which are both limited in that aspect.

I would be happy to discuss this more, specifically the Euro-centric design of the ICU / ECMA-402 APIs and their approach to application development. Last year, at the meeting at IUC 43, it seemed like there was no interest in that conversation, and all the organizations involved just wanted to focus on their own goal of making single-language (or European-language-friendly) APIs so they could move on to their other concerns.

That said, there seems to be almost no overlap between the Open-i18n agenda and the ICU/icu4x agenda on my side. And I believe the team has made that very clear by choosing to take all the conversation to https://github.com/i18n-concept.

So, there seems to be nothing actionable here, as far as I can tell, and I'm closing this "issue". Please feel free to add details if I got anything wrong, or if there's been any change in the agenda and goals of the icu4x project. Thanks.

sffc commented 3 years ago

CC @zbraniecki @manishearth @hsivonen @nciric @echeran @mihnita @filmil @kpozin

Thanks @behnam for your response.

You're correct in saying that at the current time, most "multilingual text processing" functions, such as UTS 39, UTS 46, etc., are out of scope for ICU4X. In large part, ICU4X is focusing right now only on UTS 35 (CLDR).

I do believe there is room for collaboration, though. Given our relationship with the Unicode Consortium, we plan for one of the core features of ICU4X to be ways of exposing Unicode Properties. Text processing crates can depend on ICU4X to get the freshest Unicode data, rather than the current status quo of a manually updated data blob.

> So, there seems to be nothing actionable here, as far as I can tell.

Given that UNIC and ICU4X will help cover different pieces of ICU in Rust, I hope that we can cross-link to each other's projects in our respective README files to help developers find what they need.

In particular, I find it misleading that the README in this repo says that "the goal of UNIC is to provide ... locale-based processed based on [CLDR]". Based on your comment, it sounds like you consider CLDR out of scope for UNIC. If this is true, I think it would be useful for developers to point people in the right direction, or at least update the text if you no longer consider this to be in scope.

zbraniecki commented 3 years ago

@behnam thank you for your feedback. I don't think what you describe as "euro-centric" design is there by design. I think it is an artifact of the expertise and contributions we received so far and I believe you could help us improve our design to be more global (which is our intention).

> as far as I understand there's no effort in changing the status quo of multilingual text processing, and the goal is to follow ICU and ECMA-402 in the API design, which are both limited in that aspect.

While it is true that we are basing our API on ICU and ECMA402, we are open to drifting away from them in cases where we have a reason to believe that those two API designs are limiting or plain wrong. What we would love to receive is support in avoiding the same pitfalls. Do you think you may be able to review our upcoming 0.1 release and give us feedback on the API design we have so far?

> That said, there seems to be almost no overlap between the Open-i18n agenda and the ICU/icu4x agenda on my side. And I believe the team has made that very clear by choosing to take all the conversation to https://github.com/i18n-concept.

I don't think that was an intentional move to distance it from open-i18n. Internationalization in Rust is in the storming phase of the group cycle (sociology hat on!) and it's hard to understand the ownership and scopes yet.

> Please feel free to add details if I got anything wrong, or if there's been any change in the agenda and goals of the icu4x project. Thanks.

I don't think you were incorrect in any of the descriptions of reality, but I feel like there are implicit assumptions about our goals that you are attributing to us. We do intend to focus primarily on the ECMA402 scope of APIs first, since this is a subset that seems to be the most requested and needed to provide a good foundation for software internationalization. We are comfortable going beyond what ECMA402 supports since our target is different (you can see that in the unicode set work, and in more detailed API pieces for Locale and PluralRules than ECMA402 carries).

I can see how UNIC may remain separate and focused on the scope of internationalized text processing, and maybe the ICU4X DataProvider can be of help in maintaining a data backend for it (providing good Unicode data in stable form over time). But I'd like our projects to cooperate, and in particular, I'd like to find help in addressing your concern about ICU4X being regionally centric.

Please, let me know if you think you may be available and interested in helping us with that!

open-i18n / rust-unic

ICU4X #274