tc39 / ecma402

Status, process, and documents for ECMA 402
https://tc39.es/ecma402/

Data-Driven API #210

Open sffc opened 6 years ago

sffc commented 6 years ago

It can be said that the challenge of providing i18n services can be split into two concepts:

  1. Data: One needs access to a database of locale data.
  2. Logic: Once the data is provided, there needs to be a way to process it.

In the i18n world, as well as in software in general, people like to be able to design their own logic. There are already dozens of wrappers over Ecma 402. It is not hard to find examples of clients who reverse-engineer i18n libraries to "extract" the data out of them; I can provide some examples.

Right now, the Ecma 402 APIs are all "logic" APIs. I suggest that we consider breaking the APIs into the two concepts: data and logic. The existing APIs need not change; I suggest simply adding a new data API, and redefining the spec for the logic functions to be in terms of the data. The data format can be defined by the Unicode specification UTS 35, which is supported by another standards body.

The advantages of doing this include:

The API could be as simple as Intl.Data.getNumberPattern(locale) or Intl.Data.getDateTimePattern(locale, skeleton). The methods can return a promise or take a callback, so that users can supply an asynchronous drop-in replacement.
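As a rough illustration of the promise-returning shape suggested here, the following is a user-land sketch only. Intl.Data does not exist; the function name getNumberPattern is borrowed from the suggestion, and the lookup table is a tiny hand-copied sample rather than a real data source.

```javascript
// Hypothetical user-land stand-in for the proposed Intl.Data.getNumberPattern.
// The table holds a small illustrative sample of CLDR-style decimal patterns.
const SAMPLE_DECIMAL_PATTERNS = new Map([
  ["en", "#,##0.###"],
  ["hi", "#,##,##0.###"],
]);

// Returns a promise so that a caller could later swap in a network-backed
// loader without changing call sites.
function getNumberPattern(locale) {
  const language = new Intl.Locale(locale).language;
  const pattern = SAMPLE_DECIMAL_PATTERNS.get(language);
  return pattern === undefined
    ? Promise.reject(new RangeError(`No sample data for locale: ${locale}`))
    : Promise.resolve(pattern);
}
```

A caller would write `const pattern = await getNumberPattern("en-US");` whether the data is bundled or fetched on demand.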

rxaviers commented 6 years ago

The theory sounds good, but the practical benefits aren't clear to me. Do you suggest exposing all CLDR data through this API, or a subset? If a subset, which one? Could you cite examples or use cases where this would be useful?

rxaviers commented 6 years ago

Clarification: I can see value in exposing some data, such as display names. My confusion is basically the scope.

caridy commented 6 years ago

The real problem here is backward compatibility. I don't think backward compatibility (forever) is in the charter of UTS 35 or any other i18n data provider, while it is in the DNA of JavaScript and the Web. Instead, we are aiming for a set of low-level APIs that help you build abstractions that rely on the data you mentioned, without exposing the data directly. Yes, it is more complicated and less flexible, but it has two very nice effects:
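One existing example of a low-level API that lets you build on locale data without reading it directly is Intl.NumberFormat.prototype.formatToParts. The sketch below (the helper name getDecimalSeparator is made up for illustration) recovers a locale's decimal separator from formatted output rather than from a data table.

```javascript
// Derive the decimal separator from formatted output instead of raw locale data.
function getDecimalSeparator(locale) {
  const parts = new Intl.NumberFormat(locale).formatToParts(1.5);
  const decimal = parts.find((part) => part.type === "decimal");
  // Some locales or options may yield no "decimal" part; report that as undefined.
  return decimal ? decimal.value : undefined;
}
```

The result depends on the locale data the engine ships, which is exactly the property being described: the caller gets the behavior, not the data.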

sffc commented 6 years ago

CLDR has a lot of data, and it often has messy fallback rules. I was thinking that our API would be "CLDR++", where we only expose a subset of data useful for JavaScript users and take care of locale fallbacks and other intricacies of CLDR data loading under the hood. And of course if you wanted to use a data source that isn't CLDR, you're welcome to do so as long as you expose the same API.

For stability, if UTS 35 doesn't suffice, I don't see anything necessarily wrong with re-specifying the format of the subset of UTS 35 data that we provide through Ecma 402.
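To make "taking care of locale fallbacks under the hood" concrete, here is a deliberately simplified sketch. It assumes plain subtag truncation ending at "root"; real CLDR inheritance also uses explicit parent locales, which this ignores. The function names and the data shape are hypothetical.

```javascript
// Simplified fallback: drop trailing subtags one at a time, then end at "root".
function fallbackChain(locale) {
  const subtags = locale.split("-");
  const chain = [];
  while (subtags.length > 0) {
    chain.push(subtags.join("-"));
    subtags.pop();
  }
  chain.push("root");
  return chain;
}

// Look up a key in a plain { locale: { key: value } } object along the chain.
// Returns the locale that actually supplied the value, or undefined if none did.
function lookup(data, locale, key) {
  for (const candidate of fallbackChain(locale)) {
    const bundle = data[candidate];
    if (bundle !== undefined && key in bundle) {
      return { locale: candidate, value: bundle[key] };
    }
  }
  return undefined;
}
```

With this, a request for "zh-Hant-TW" tries "zh-Hant-TW", "zh-Hant", "zh", and finally "root", so the caller never has to implement the walk.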

msaboff commented 5 years ago

[Seven attached images]

sffc commented 5 years ago

@indexzero

indexzero commented 5 years ago

Thanks for including me, @sffc. I would love to get involved in this issue.

I will admit that I am coming at this from a pragmatic point of view:

The intl-{message,relative}format libraries are ponyfills that state their intention to remain up-to-date with ECMA-402 along with some additional features. Whether those additional features are good or bad, they illustrate the value of exposing the data in a more granular fashion. That is, there will inevitably be features built on top of Intl APIs that need access to data not currently available.

By supporting that goal, we make i18n easier for applications and developers. I have seen an enormous amount of time spent bikeshedding the optimal way to deliver CLDR data into browsers to initialize react-intl. It would be interesting to hear from other ecosystem projects which may have similar concerns.

In what ways these ecosystem libraries will need data access remains a question for me. The data access by react-intl and its dependencies is sparse for certain edge cases, yet the library forces consumers to provide all of the CLDR data.

Perhaps reaching out to some of the folks who maintain these libraries is a good next step? Forgive me if you folks have / are already chatting with them.

sffc commented 5 years ago

Some more ideas I had.

There are cases where the user wants to provide their own data but use the browser's built-in logic, and vice-versa. If we can define a stable data language, similar to what's provided by LDML, then we can decouple that in JavaScript.

Here's an example of how a programmer could use their own data with the browser's algorithm. They give their data provider to a factory that asynchronously constructs an Intl.NumberFormat using that data provider instead of the browser's default data provider:

const dataProvider = new MyDataProvider(); // user-land object implementing a data provider interface
const factory = new Intl.Data.Factory(dataProvider);
const fmt = await factory.createNumberFormat("ml", { style: "percent" });

The data provider interface could be as simple as: async get(localeList, xpath) returns the data at the specified xpath and the best matching locale. We would define the space of valid xpaths, which could be similar to LDML. The browser could expose this API:

const { locale, data } = await Intl.Data.defaultProvider.get(
    ["ff", "ar"], "/numbers/decimalFormats@numberSystem=latn/pattern");
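For readers who want to experiment with the interface described above, here is a minimal in-memory sketch of a provider. Everything in it is an assumption: the class name InMemoryDataProvider, the data shape, and the choice to resolve to undefined fields when nothing matches are not part of any specification.

```javascript
// Hypothetical data provider: get(localeList, xpath) resolves to the data at
// `xpath` for the first requested locale that has it.
class InMemoryDataProvider {
  constructor(data) {
    this.data = data; // assumed shape: { [locale]: { [xpath]: value } }
  }

  async get(localeList, xpath) {
    const locales = typeof localeList === "string" ? [localeList] : localeList;
    for (const locale of locales) {
      const entries = this.data[locale];
      if (entries !== undefined && xpath in entries) {
        return { locale, data: entries[xpath] };
      }
    }
    // No requested locale has data at this xpath.
    return { locale: undefined, data: undefined };
  }
}
```

Because the result always has the same two fields, a caller can destructure it as `const { locale, data } = await provider.get(...)` and check `locale` to see which locale was matched.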

If the user wants to provide their own data only when the browser doesn't have the data for that locale, they could write something along the lines of,

class MyDataProvider {
  async get(localeList, xpath) {
    const browserResult = await Intl.Data.defaultProvider.get(localeList, xpath);
    const requested = (typeof localeList === "string") ? localeList : localeList[0];
    if (browserResult.locale !== requested) {
      // call custom data service and return that result
    } else {
      return browserResult;
    }
  }
}
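
Since Intl.Data.defaultProvider does not exist today, the same idea can be tried out by passing both providers in explicitly. The sketch below is hypothetical throughout: FallbackDataProvider and the two stub providers (with made-up data) only illustrate the control flow.

```javascript
class FallbackDataProvider {
  constructor(browserProvider, customProvider) {
    this.browserProvider = browserProvider;
    this.customProvider = customProvider;
  }

  async get(localeList, xpath) {
    const requested = typeof localeList === "string" ? localeList : localeList[0];
    const browserResult = await this.browserProvider.get(localeList, xpath);
    if (browserResult.locale === requested) {
      return browserResult;
    }
    // The browser fell back to a different locale, so ask the custom service,
    // keeping the browser result if the custom service has nothing either.
    const customResult = await this.customProvider.get(localeList, xpath);
    return customResult.locale === undefined ? browserResult : customResult;
  }
}

// Stub providers with made-up data, for illustration only.
const stubBrowserProvider = {
  async get(localeList, xpath) {
    return { locale: "en", data: "browser data" };
  },
};
const stubCustomProvider = {
  async get(localeList, xpath) {
    const requested = typeof localeList === "string" ? localeList : localeList[0];
    return requested === "ff"
      ? { locale: "ff", data: "custom data" }
      : { locale: undefined, data: undefined };
  },
};
```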

longlho commented 4 years ago

Thanks @sffc for redirecting me here. Since @indexzero mentioned react-intl, which I happen to maintain (and which Dropbox also uses), I'd like to provide some context here:

I think at a high level what could help the workflow above is:

sffc commented 4 years ago

See #87 for some discussion on your first bullet ("locale negotiation").

sffc commented 4 years ago

My feelings on this issue are going back and forth.

On the one hand, it is nice to give app developers the power to add more data when the browser provides insufficient feature or locale coverage. On the other hand, the design of Intl is for it to be "best-effort" and easy to use (hard to abuse), and this thread has raised several good points that injecting data into Intl at runtime adds a significant amount of complexity.

I know that Chrome is working long-term on dynamically adding data for new locales. I think Firefox has a similar effort. By keeping the data exchange in the browser engine, Intl's handling of CLDR data remains transparent to the user, which seems like a desirable property.

ljharb commented 4 years ago

Without the ability to inject the data, polyfilling new data requires replacing almost every single Intl method; with that ability, all the methods may be correct already and just need new backing data.

sffc commented 4 years ago

Is it possible to have a function detect whether it is being called in a sync or async context? For example, could await Intl.DateTimeFormat() have different behavior than Intl.DateTimeFormat()? @ljharb

I'm just trying to think of unobtrusive ways to add data loading to the API. It would be nice if you could do the following, but it's not clear whether that is possible without breaking the web.

let dtf = await Intl.DateTimeFormat();
console.log(dtf.format(x));

One option @ljharb suggested was something like the following. It doesn't require changing the constructor, but it would give the otherwise immutable Intl.DateTimeFormat object two "states", one where data is present and one where it is not.

let dtf = new Intl.DateTimeFormat();
await dtf.load();
console.log(dtf.format(x));
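
The two "states" can be made visible with a user-land wrapper. This is a sketch under stated assumptions, not a proposal: LoadableDateTimeFormat is a made-up class, and load() here only simulates asynchronous data loading before constructing a normal Intl.DateTimeFormat.

```javascript
class LoadableDateTimeFormat {
  #locales;
  #options;
  #formatter = null; // null means "data not yet loaded"

  constructor(locales, options) {
    this.#locales = locales;
    this.#options = options;
  }

  async load() {
    // Placeholder for fetching locale data; here it resolves immediately.
    await Promise.resolve();
    this.#formatter = new Intl.DateTimeFormat(this.#locales, this.#options);
  }

  format(date) {
    if (this.#formatter === null) {
      throw new Error("Data not loaded; call load() first");
    }
    return this.#formatter.format(date);
  }
}
```

Every consumer of such an object has to know which state it is in, which is the cost of this option.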

We could add a new namespace for the async-enabled constructors, like Intl.Async. The new namespace would have all of the same constructors as the Intl namespace, except that they return promises that resolve to "normal" objects.

let dtf = await Intl.Async.DateTimeFormat();
console.log(dtf.format(x));
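
A user-land approximation of this namespace idea might look like the following. The names IntlAsync and ensureLocaleData are hypothetical, and the loader is a no-op standing in for whatever data fetching an engine or library would do.

```javascript
// Hypothetical loader; a real one might fetch locale data over the network.
async function ensureLocaleData(locales) {
  // Intentionally empty in this sketch.
}

// Each member returns a promise that resolves to a normal Intl object.
const IntlAsync = {
  async DateTimeFormat(locales, options) {
    await ensureLocaleData(locales);
    return new Intl.DateTimeFormat(locales, options);
  },
  async NumberFormat(locales, options) {
    await ensureLocaleData(locales);
    return new Intl.NumberFormat(locales, options);
  },
};
```

Once awaited, the result is an ordinary formatter, so code that receives it does not need to know how it was created.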

Or, we could put data loading into the terminal format method. The downside here is that you put async operations into a function that was never async before, so it might be harder to use as a drop-in replacement. For example, if you have to pass your object as an argument to some other function, that function needs to know whether to use the async version of the terminal method.

let dtf = new Intl.DateTimeFormat();
console.log(await dtf.asyncFormat(x));

// problem if you have to pass dtf to a function like this
function doStuffWithDateTimeFormat(dtf) {
  // should this function use .format() or .asyncFormat() ?
}
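
One way a receiving function could cope is to become async itself and feature-detect the method. In this sketch, asyncFormat is the hypothetical method from the example above and formatWithEither is a made-up helper name.

```javascript
// Works with both kinds of object, at the price of always returning a promise.
async function formatWithEither(dtf, date) {
  if (typeof dtf.asyncFormat === "function") {
    return dtf.asyncFormat(date);
  }
  return dtf.format(date);
}
```

This illustrates the drop-in problem: every caller in the chain now has to await, even when the formatter was already fully synchronous.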

ljharb commented 4 years ago

You can't usefully detect that, no, and if you could it would break use cases where people don't await immediately but still do something with the promise.
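
A short sketch of why detection is not useful: `await` accepts any value and passes a non-thenable through unchanged, so the callee runs to completion and returns before `await` ever sees the result. The function name compareAwaited is made up for illustration.

```javascript
async function compareAwaited() {
  // Intl.DateTimeFormat can be called without `new` and returns a formatter.
  const direct = Intl.DateTimeFormat("en-US");
  // Awaiting a non-promise value simply yields that value.
  const awaited = await Intl.DateTimeFormat("en-US");
  return {
    directIsFormatter: direct instanceof Intl.DateTimeFormat,
    awaitedIsFormatter: awaited instanceof Intl.DateTimeFormat,
  };
}
```

Both flags come out true: the call behaves identically with or without `await`.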

If a constructor returns a promise, then instanceof will fail until it's awaited, which would be confusing.
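
The instanceof behavior can be reproduced with an ordinary class whose constructor returns a promise. AsyncFormatter and checkInstanceof are hypothetical names used only for this sketch.

```javascript
class AsyncFormatter {
  constructor(locale) {
    this.formatter = new Intl.DateTimeFormat(locale);
    // Returning an object from a constructor replaces `this` as the result of `new`.
    return Promise.resolve(this);
  }
}

async function checkInstanceof() {
  const pending = new AsyncFormatter("en-US");
  const before = pending instanceof AsyncFormatter; // the promise is not an instance
  const ready = await pending;
  const after = ready instanceof AsyncFormatter; // the resolved value is
  return { before, after };
}
```

Here `before` is false and `after` is true, so any code that checks the type before awaiting gets a misleading answer.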