tc39 / ecma402

Status, process, and documents for ECMA 402
https://tc39.es/ecma402/

Whatever happened to Intl.MessageFormat? #92

Open hrimhari opened 8 years ago

hrimhari commented 8 years ago

Hi all,

Can you give an update regarding Intl.MessageFormat?

Has it been abandoned? I couldn't detect any activity later than 2014 (latest discussion seems to be from 2013) and it doesn't seem to appear in any future drafts.

Thank you in advance, Felipe Carasso

zbraniecki commented 8 years ago

There are two reasons we're not currently working on MessageFormat API:

1) The amount of work required to standardize it is humongous.
2) Some members of the community (me included) question if ICU MessageFormat is the right DSL/data-model to be standardized in JavaScript.

For those reasons we decided to focus on lowering the entry barrier for client-side libraries designed to handle localization. We are standardizing building blocks such as language negotiation, plural forms etc.
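As a rough illustration of how such a building block can be used from user land, here is a minimal sketch assuming a runtime that ships `Intl.PluralRules`. The `messages` table and the `formatCount` helper are hypothetical names made up for this example, not part of any proposal.

```javascript
// Hypothetical per-category message table; only Intl.PluralRules is a standard API.
const messages = {
  one: "{count} file",
  other: "{count} files",
};

function formatCount(locale, count) {
  // select() returns a CLDR plural category such as "one" or "other".
  const category = new Intl.PluralRules(locale).select(count);
  // Fall back to "other" when the table has no entry for the category.
  const pattern = messages[category] ?? messages.other;
  return pattern.replace("{count}", String(count));
}

console.log(formatCount("en", 1)); // "1 file"
console.log(formatCount("en", 3)); // "3 files"
```

A real library would keep one such table per locale, since other languages use categories beyond "one" and "other".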

Our hope is that once we get more libraries in the user land, we'll be able to see which ones gain more traction and revisit in the future.

littledan commented 7 years ago

A new localization format is under development at http://l20n.org/ which may be relevant.

aaronshaf commented 7 years ago

What other libraries / competing proposals are worth watching?

vanwagonet commented 7 years ago

There are also a bunch of user-space implementations that chose ICU MessageFormat:

mbrevda commented 7 years ago

This project also uses the ICU format:

https://github.com/globalizejs/react-globalize

glen-84 commented 5 years ago

Java and PHP both have built-in support for ICU MessageFormat (AFAIK). I quite like it.

Are there strong reasons why it may not be suitable?

ICU Message Format for Translators

zbraniecki commented 5 years ago

Are there strong reasons why it may not be suitable?

https://github.com/projectfluent/fluent/wiki/Fluent-and-ICU-MessageFormat

Fluent is currently in 0.9 RFC and we expect to have 1.0 by the end of this quarter. That would make it a strong contender as the basis for MessageFormat 2.0.

The problem with standardizing a localization API is that its surface is far wider than anything else that we do with Intl. In fact, it's greater than almost any other API: you have to think of it more like WebAssembly than Intl.DateTimeFormat. Standardizing such an API, with its DSL, around a format that is known to have significant limitations and that was designed with goals significantly different from those of the Web (it is not human-readable, etc.) would be a mistake. It would leave us stuck long-term supporting an API that is likely to be replaced by a next-generation solution within the next few years.

So instead, I'm advocating that we focus ECMA-402 efforts on standardizing lower-level components that make designing localization systems easier (Intl.PluralRules, Intl.Locale, etc.) and wait for Unicode to have a chance to assess Fluent before we make any decisions here.
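To make "lower-level components" concrete, here is a minimal sketch of a naive locale fallback built on `Intl.Locale`, assuming a runtime that ships it. `pickLocale` is a hypothetical helper for illustration only, and its fallback strategy is far simpler than real language negotiation.

```javascript
// Hypothetical negotiation helper built on the standard Intl.Locale parser.
function pickLocale(requested, available) {
  // Intl.Locale canonicalizes the tag and throws a RangeError if it is malformed.
  const locale = new Intl.Locale(requested);
  // Try the full tag first, then fall back to the bare language subtag.
  const candidates = [locale.baseName, locale.language];
  // If nothing matches, use the first available locale as the default.
  return candidates.find((tag) => available.includes(tag)) ?? available[0];
}

console.log(pickLocale("pt-BR", ["en", "pt"])); // "pt"
console.log(pickLocale("de-AT", ["en", "pt"])); // "en"
```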

As one of the co-authors of Fluent, I'm obviously biased, but I hope that you can find some rational argument in this logic. :)

littledan commented 5 years ago

I'd say, MessageFormat is a very important and high priority feature that we need to add in ECMA-402. At the same time, it's a very big feature, with many design considerations (including what the format should be like, how it should integrate into HTML, frameworks, working with translators, etc.). For that reason, we should do it after the lower-level components, but I definitely hope that we can come back around and standardize Intl.MessageFormat at some point in the future.

sffc commented 5 years ago

ICU MessageFormat is an industry standard, and as an ICU contributor, I have a certain affection for it. For Ecma 402, however, we need to put ICU MessageFormat up against other possible contenders, like Fluent and fbt, both of which are still maturing.

romulocintra commented 5 years ago

In my humble opinion, we should try to get something standardized. For the past 5-8 years we have been using different libraries that do the job across projects of different sizes, legacy migrations, etc.

We struggle when the time comes to choose the right tool for MessageFormat. Normally we follow a set of checks:

  1. Analyse the framework or tech stack (as is, to be)
  2. Check the formats accepted by translation companies
  3. If we have legacy formats, find a way to reuse or transform them

And a few more steps ...

I believe, as a user, that every site should be available in my language or help me understand the content in some way. I see it as an accessibility feature, and having something that helps would make understanding the web more "democratic". If we can make it easier for developers to do this without any third-party library, that would be fantastic.

I can't estimate the "cost" of having it as an Intl API, but it would definitely have a big impact. Nowadays we are pushing the use of web standards and the platform, and an opportunity like this would maximize that for small, medium, and huge projects.

Fluent and fbt are very good candidates (I like some parts of one and some parts of the other), but at this point I cannot use them in the majority of my projects for the reasons I mentioned: translation companies are not compliant with the formats, which means new adapters or transformers, or there is some other reason that blocks the move.

I will be very happy to help in any move...design, interview some translation companies

glen-84 commented 5 years ago

Thanks for the replies.

Regarding Fluent:

  1. Are identifiers always required? Choosing identifiers for every piece of translated text in a web application might be a bit painful. I like using the source/default language in the HTML as the "keys" for translation, like:

    <span>{% tr 'Hello, {name}' with {name: 'Zibi'} %}</span>

    Here there's no identifier, and the string Hello, {name} is used as the key for translation.

  2. Is the INI/TOML-like storage format fixed? If you wanted to source translations from JSON/YAML, or a database, you would need to convert from that format? Would it not make sense for the storage format to be more abstract?

  3. After a fairly brief look, the syntax does seem a bit more complex than ICU MF – this is not an issue for us as developers, but I wonder how non-technical translators will cope with this? I guess with the right tools (syntax highlighting, dragging/dropping of placeables, etc.), this could be alleviated.
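To make the first point concrete, here is a minimal sketch of the source-string-as-key approach. The `catalog` object and the `tr` helper are hypothetical names made up for this example; they are not Fluent or ICU APIs.

```javascript
// Hypothetical catalog keyed by the source-language string (no separate identifiers).
const catalog = {
  pl: { "Hello, {name}": "Cześć, {name}" },
};

function tr(locale, source, args = {}) {
  // Fall back to the source string itself when no translation exists.
  const pattern = catalog[locale]?.[source] ?? source;
  // Replace {key} placeholders; unknown keys are left untouched.
  return pattern.replace(/\{(\w+)\}/g, (match, key) =>
    key in args ? String(args[key]) : match
  );
}

console.log(tr("pl", "Hello, {name}", { name: "Zibi" })); // "Cześć, Zibi"
console.log(tr("fr", "Hello, {name}", { name: "Zibi" })); // "Hello, Zibi"
```

Note the trade-off this makes visible: any edit to the English source string changes the key, so existing translations stop matching.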

Anyway, thanks for the work that you do.

stasm commented 5 years ago

Hi @glen-84, thanks for your interest and questions about Fluent. I'm the leader of the project and I'll be happy to provide answers. You may also be interested in following Fluent's development on our Discourse.

  1. Yes, in Fluent, identifiers are always required. An identifier establishes a contract between the code and the translation; when the id changes, it signals that the translations need to be updated as well. On the other hand, small stylistic changes can be introduced into the English copy without the need to notify translation authors at all.

  2. What do you mean by fixed? The format is developed in the open at https://github.com/projectfluent/fluent and a number of implementations exist already, including JS, Python and Rust. It's possible to parse a Fluent file into an AST and represent it in other formats. Going the other way round is also possible, although the specifics depend on how the sourced translations are stored. If you have a specific use-case in mind, would you mind starting a new topic on our Discourse?

  3. Fluent encompasses a larger scope than MessageFormat: it's a format for describing resources spanning multiple translations, including comments, as well as logic for referencing translations from within other translations. We've been using Fluent in Firefox for a year now and we're happy with how non-technical translators are able to work with it.

When we talk about Fluent, we necessarily mention the more complex features because that's what really makes it different from other solutions. So this is where Fluent might seem more complex as a whole. In reality, the vast majority of translations are usually simple strings without any logic to them. At the same time, it's the remaining few which make or break the UI in terms of using a natural-sounding language. Fluent tries to keep simple things simple while making the complex things possible. As a point of reference, here's the preferences.ftl file used to localize Firefox's Preferences UI. It's a complex piece of UI and you'll note that some translations do look complex at first. But scroll down a bit to see a lot more simple and hopefully readable syntax :)

As mentioned by other commenters, Fluent is still maturing. Throughout 2018 we've seen it used by developers and translators, and we've fixed papercuts and added features which increased the expressiveness and discoverability of the syntax. It's worked well for Mozilla so far, and I'd love to contribute what we've learned to the discussion about the future of MessageFormat.

zbraniecki commented 5 years ago

I will be very happy to help in any move...design, interview some translation companies

I doubt we can do much yet. I'd say we should wait to see how Fluent and fbt test against the market over the next year or so before we proceed here. If you want to, we could start by collecting data on things like data model differences between MessageFormat, Fluent and fbt, capabilities, API fit etc. Such analysis would allow us to better understand what decisions we are going to make in the future.

1. Are identifiers always required? Choosing identifiers for every piece of translated text in a web application might be a bit painful.

Here's Fluent's take on source-string-as-id.

glen-84 commented 5 years ago

Hi @stasm,

  1. I was just imagining having translation data stored in a regular (relational) database, and then loading that into the language-specific representation through some form of abstract adapter.

    So:

    Custom storage (RDBMS/YAML/etc.) -> Language abstraction/adapter

    Instead of:

    Custom storage (RDBMS/YAML/etc.) -> Fluent text-based format -> Language abstraction

    ... but it's not a big deal either way, I was just curious. I suppose that it's better to have a common storage format that can be shared/transferred.
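The first pipeline above could be sketched roughly as follows. The row shape (`locale`, `id`, `value`) and the `loadCatalogs` adapter are assumptions made up for illustration; they do not correspond to any existing Fluent API.

```javascript
// Hypothetical rows as they might come back from a relational query.
const rows = [
  { locale: "en", id: "greeting", value: "Hello, {name}" },
  { locale: "de", id: "greeting", value: "Hallo, {name}" },
];

// Adapter: group rows into one Map of id -> pattern per locale,
// without serializing to an intermediate text format.
function loadCatalogs(records) {
  const catalogs = new Map();
  for (const { locale, id, value } of records) {
    if (!catalogs.has(locale)) catalogs.set(locale, new Map());
    catalogs.get(locale).set(id, value);
  }
  return catalogs;
}

const catalogs = loadCatalogs(rows);
console.log(catalogs.get("de").get("greeting")); // "Hallo, {name}"
```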


Regarding preferences.ftl:

  1. It seems a little bit hacky to include CSS like that (L28), I think that it should be limited to translatable strings only. Perhaps there could be some kind of syntax extension to provide UI "hints" along with a string.

    Made up syntax (I didn't spend much time thinking about it):

    search-input-box =
        .placeholder = Find in Options
        ($width: 15.4em, $height: 10px, $showIcon: false)
  2. I also feel a bit uneasy about HTML in translation strings, as it might cause misunderstandings. So instead of:

    Search term: <span data-l10n-name="query"></span>

    Maybe use a variable like:

    Search term: { $query }

(Feel free to ignore these thoughts/ideas, I'm just thinking out loud.)

@zbraniecki,

Thanks for that link. I was aware of the first issue, but the second point makes sense as well.


Good luck guys. 🙂

zbraniecki commented 5 years ago

2. I also feel a bit uneasy about HTML in translation strings, as it might cause misunderstandings. So instead of:

That feature (called DOMOverlays) has been really well received by our developers and localizers. You can read more about the current state in https://github.com/projectfluent/fluent.js/wiki/DOM-Overlays

We're also planning to extend the functionality in the 3rd revision. See the POC implementation and the list of features here: https://github.com/zbraniecki/fluent-domoverlays-js/issues/1#issuecomment-459159959

glen-84 commented 5 years ago

Interesting – thanks for the links.

For the anchor example, my variation would be:

<span data-l10n-id="privacy-note"></span>

privacy-note = Read our { $privacyPolicyLink }.
privacy-link-text = privacy policy
  .title = Privacy Policy

bundle.format(
    privacyNote,
    {
        privacyPolicyLink:
            '<a data-l10n-id="privacy-link-text" href="https://www.mozilla.org/privacy" />'
    }
);

In that way, the translator works only with plain text, and it's the developer's responsibility to provide the HTML parts. However, it might not be practical/possible. It's just a thought.
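A minimal sketch of that division of responsibility, under stated assumptions: `formatWithMarkup` and `escapeHtml` are hypothetical helpers, not part of fluent.js, and the `{ $name }` placeholder handling only imitates Fluent's surface syntax. Translator text is escaped, while developer-supplied markup is inserted as-is.

```javascript
// Escape text that came from a translation so it cannot inject markup.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical formatter: translators write plain text with { $name } placeholders,
// and the developer supplies trusted markup for those placeholders.
function formatWithMarkup(pattern, markupArgs) {
  // split() keeps captured groups, so odd indices hold placeholder names
  // and even indices hold literal translator text.
  return pattern
    .split(/\{\s*\$(\w+)\s*\}/)
    .map((part, i) => (i % 2 === 1 ? markupArgs[part] ?? "" : escapeHtml(part)))
    .join("");
}

const html = formatWithMarkup("Read our { $privacyPolicyLink }.", {
  privacyPolicyLink: '<a href="https://www.mozilla.org/privacy">privacy policy</a>',
});
console.log(html);
// 'Read our <a href="https://www.mozilla.org/privacy">privacy policy</a>.'
```

One limitation this makes visible: the link text lives in code rather than in the translation, so it would still need its own localized message.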

romulocintra commented 5 years ago

@sffc can we add this to the next meeting?

Recently, APIs have been incorporated into browsers that could help provide a standard API for localized messages.

IMHO the best candidates to review

sffc commented 5 years ago

@romulocintra In order to be most productive discussing this in the ECMA 402 meeting, it would help if someone could make slides or do some other legwork ahead of the meeting to help the committee have a more informed discussion. Is this something you or someone else on this thread would be interested in putting together?

EDIT: By "slides", I mean slides to compare and contrast the four options mentioned above.

romulocintra commented 5 years ago

@romulocintra In order to be most productive discussing this in the ECMA 402 meeting, it would help if someone could make slides or do some other legwork ahead of the meeting to help the committee have a more informed discussion. Is this something you or someone else on this thread would be interested in putting together?

EDIT: By "slides", I mean slides to compare and contrast the four options mentioned above.

I can put together something related to what @zbraniecki said:

, we could start by collecting data on things like data model differences between MessageFormat, Fluent and fbt, capabilities, API fit etc. Such analysis would allow us to better understand what decisions are we going to make in the future.

(+) having some benchmarks about usage, API, DX, etc.

sffc commented 5 years ago

As we start diving deep into the design here, we should clearly articulate the advantages of putting Intl.MessageFormat into the spec rather than leaving it to user land.

For example, TC39 typically hasn't picked favorites among templating languages (JSX, Jade, Handlebars, Mustache, etc.) or MVC frameworks (React, Angular, Vue, etc.). Typically, ECMA 402 APIs have two advantages: (1) reduce data and library payload, and (2) encourage best practices. We will have to make a case that for Intl.MessageFormat, it is advantageous for the spec to dictate the official recommended message syntax for JavaScript.

ray007 commented 5 years ago

For example, TC39 typically hasn't picked favorites among templating languages (JSX, Jade, Handlebars, Mustache, etc.)

JS now does have template strings.

ljharb commented 5 years ago

Those are just interpolation literals tho - "template" is an unfortunate name, since it's not actually templating.
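The distinction can be shown in a short sketch. The `fill` helper and the `patterns` table are hypothetical names for illustration, not standard APIs.

```javascript
const name = "Zibi";

// A template literal is evaluated immediately, where it is written in the source;
// the result is an ordinary string with no reusable pattern behind it.
const greeting = `Hello, ${name}`;

// A message template is data: it can be loaded, swapped per locale,
// and filled in later. `fill` is a hypothetical helper.
function fill(pattern, args) {
  return pattern.replace(/\{(\w+)\}/g, (match, key) => String(args[key] ?? match));
}

const patterns = { en: "Hello, {name}", de: "Hallo, {name}" };

console.log(greeting); // "Hello, Zibi"
console.log(fill(patterns.de, { name })); // "Hallo, Zibi"
```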

longlho commented 5 years ago

Disclaimer: I maintain react-intl and its underlying libraries (intl-messageformat and that monorepo).

I agree that Intl.MessageFormat might be better in the user land for now while primitive APIs are being figured out.

  1. The surface is huge, and the current ICU syntax still has a bunch of things to figure out (e.g. very limited support for relative time, and no guidelines on formatting lists).
  2. In terms of data & library payload, from what I've seen it's still CLDR data, which can be surfaced via a more straightforward API since it's structured data. In terms of library payload, the grammar parser itself is fairly small (across our implementation and other similar ICU parsers).
  3. In terms of best practices, having an official message format (assuming the feature set & pattern syntax is figured out) does encourage devs to declare more natural messages, which helps facilitate the translation process. This, however, is already partially addressed in the ICU user guide.
  4. The real bottleneck of the i18n pipeline (from what I've seen at Dropbox & Yahoo) is typically the translation vendors (e.g. some cannot deal w/ complicated nested select, or even simple selectordinal), so a lot of toolchains were built to restrict the pattern usage within the message itself. Having the official "API" doesn't necessarily facilitate that.
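Regarding the first point, one way to cover relative time and lists outside the message syntax is to format those values with primitives and pass the results in as plain arguments. This is a minimal sketch assuming a runtime that ships `Intl.RelativeTimeFormat` and `Intl.ListFormat`; `describeEdit` and its pattern are hypothetical.

```javascript
// Standard Intl formatters; the surrounding message assembly is a hypothetical sketch.
const relative = new Intl.RelativeTimeFormat("en", { numeric: "auto" });
const list = new Intl.ListFormat("en", { style: "long", type: "conjunction" });

function describeEdit(editors, daysAgo) {
  // Format the pieces with Intl, then drop them into a plain placeholder pattern.
  const who = list.format(editors);
  const when = relative.format(-daysAgo, "day");
  const pattern = "Edited by {who} {when}";
  // Function replacers avoid special handling of "$" in replacement strings.
  return pattern.replace("{who}", () => who).replace("{when}", () => when);
}

console.log(describeEdit(["Ann", "Bo", "Cy"], 3));
// "Edited by Ann, Bo, and Cy 3 days ago"
```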

romulocintra commented 5 years ago

Hi @longlho

We are organizing a new Working group for "Intl.messageFormat", the initial goal is to gather requirements and to define a roadmap. During July I will organize several 1-1 Meetings to hear about your technical challenges and to collect requirements, before scheduling more regular meetings.

If any of you are interested, please schedule here

jamuhl commented 5 years ago

Disclaimer: I maintain i18next and most of its UI framework integrations: react-i18next, jquery-i18next, ...

I might share some insights from my point of view plus give some feedback

@romulocintra

You mention i18next as a candidate to watch. The chosen i18n format in i18next is rather limited - so I would rather go for fluent or ICU. At i18next we already enable those two by using i18next-fluent or i18next-icu ---> i18next is about flexibility and tooling (eg. loading translations, ...)

For some time now I have been thinking about the next version of i18next. On one hand, this includes using the Intl primitives, but also going for a more robust format, currently ICU or Fluent. With the current uncertainty about what the decision here will be, I'd rather do nothing than pick the one that does not get adopted by browsers.

Which one to pick? Fluent, ICU or fbt: that will be a hard decision, but wait another 5 years and the list of options will just have grown.

@longlho

Besides i18next we also have a commercial translation management tool https://locize.com

The real bottleneck of the i18n pipeline (from what I've seen at Dropbox & Yahoo) is typically the translation vendors (e.g. some cannot deal w/ complicated nested select, or even simple selectordinal) so a lot of toolchains were built to restrict the pattern usage within the message itself, so having the official "API" doesn't necessarily facilitate that.

If there is a chance that one format becomes the de facto standard for the web (be it ICU, be it Fluent), my bet is the tooling will improve. Currently, as a vendor, you just have too many formats for the web that you need to support (and the web is only one piece of what you need to support).

romulocintra commented 5 years ago

Hi

Hi @longlho

We are organizing a new Working group for "Intl.messageFormat", the initial goal is to gather requirements and to define a roadmap. During July I will organize several 1-1 Meetings to hear about your technical challenges and to collect requirements, before scheduling more regular meetings.

If any of you are interested, please schedule here

@jamuhl can you schedule a 1-1 meeting so we can talk about it? 👍

sffc commented 5 years ago

Also, consider joining the Message Format Working Group Google Group.

littledan commented 4 years ago

You can follow up with continuing work in https://github.com/unicode-org/message-format-wg

sffc commented 2 years ago

The proposal has reached Stage 1.

https://github.com/tc39/proposal-intl-messageformat

ryzokuken commented 2 years ago

Can this be closed?

sffc commented 2 years ago

We normally keep issues open until the corresponding proposal lands in the ECMA-402 spec (Stage 4).

ryzokuken commented 2 years ago

We normally keep issues open until the corresponding proposal lands in the ECMA-402 spec (Stage 4).

I understand, but I thought of this as more of an informative issue asking about the status of the proposal than an issue making the proposal itself, and that it's resolved since you already informed the OP about the current status.

ljharb commented 2 years ago

(fwiw, on 262, an issue would absolutely be closed once a proposal repo existed, since that becomes the primary appropriate place to discuss it)

sffc commented 2 years ago

As far as I can tell, this issue is the canonical tracking issue for the Intl.MessageFormat proposal. Further technical discussion should happen on the proposal repo, but we should still keep the original issue open until it is fixed, for tracking purposes. The issue is not fixed until the proposal is merged.

ljharb commented 2 years ago

Does 402 typically keep tracking issues for each proposal? 262 certainly does not keep any - that's what the proposals repo is for.

sffc commented 2 years ago

Yes we do. The following open issues track proposals that are Stage 1 to 3.

https://github.com/tc39/ecma402/issues?q=is%3Aopen+label%3AProposal+label%3A%22s%3A+in+progress%22

ljharb commented 2 years ago

Fair enough, thanks.