slang-i18n / slang

Type-safe i18n for Dart and Flutter
https://pub.dev/packages/slang
MIT License

What about l10n? #112

Open lucavenir opened 1 year ago

lucavenir commented 1 year ago

Hi there, thank you for this awesome lib.

I realize that slang is about i18n, but are there any plans to move this library towards an l10n approach? It might sound like an "out-of-scope" proposal, but hear me out for a bit.

Motivation, Use cases

Most of the motivation arises from the necessity to properly format numbers, and maybe add some global or per-string configuration.

Take this example from the docs

# File: strings.i18n.yaml
someKey:
  apple:
    one: "I have $n apple."
    other: "I have $n apples."

Now, say that this number needs proper formatting; there's a good chance that the right format depends on where the sentence is being read, even within the same language.

Say we have a thousand apples. Here's how you would count them in London: 1.000,00 apples. Here's how you could count them in Dublin: 1,000.00 apples.

And that's weird enough, thinking it's the same lang and the two cities are 450km apart (: Well, to be fair this also depends on personal preference, culture and context usage; thus the distinction line in this use case is thin.

Nonetheless, when you have a currency instead of simple apples, things get interesting.

# File: strings.i18n.yaml
someCurrencyKey:
  amount: "I feel rich, since I have $n $currency." # warn: pseudocode that doesn't make much sense, but bear with me

I'd expect amount to be properly formatted, or at least I'd expect our tool to give me some degree of freedom when formatting such a number (different contexts may or may not follow their locale conventions).

Say we are rich and we want to show it off with an explicit format. Here's how you would do that in London: £ 1,000,000.00 GBP. Here's how you would do that in Dublin: 1.000.000,00 € EUR.

Current state

Can we achieve this at this point in time?

Here's my guess (and I'm not sure, so let me know if there're better alternatives):

  1. Rely on intl
  2. Carefully write reusable boilerplate code for your use cases
  3. Inject it into slang strings
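A minimal sketch of these three steps, for illustration only. The `amount` function is a hypothetical stand-in for a slang-generated method (the real generated code is not reproduced here), and `formatMoney` is a made-up helper name; only `NumberFormat.currency` comes from intl.

```dart
import 'package:intl/intl.dart';

// Hypothetical stand-in for a slang-generated translation method.
// It takes an already formatted string, as step 3 describes.
String amount({required String n}) => 'I feel rich, since I have $n.';

// Step 2: reusable boilerplate around intl's NumberFormat.currency.
// The helper name and its parameters are assumptions for this sketch.
String formatMoney(
  num value, {
  required String locale,
  required String symbol,
}) {
  final formatter = NumberFormat.currency(
    locale: locale,
    symbol: symbol,
    decimalDigits: 2,
  );
  return formatter.format(value);
}

void main() {
  // Step 3: inject the pre-formatted value into the translated sentence.
  final money = formatMoney(1000000, locale: 'en_US', symbol: r'$');
  print(amount(n: money));
}
```

The downside is visible right away: the translation only ever sees a String, so every call site has to remember to format the value first.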

The proposal

In short, this proposal is about easing the developer experience when creating translatable sentences that involve numbers and their formatting. The default should be the following: current locale settings apply first, then global settings kick in (build.yaml config file), and finally per-string settings apply.

Here's a heavily personalized string example:

# File: strings.i18n.yaml
someKey:
  # warn: pseudocode
  sentence: "Where lambo? I have $n!"
    - type: number
    - format: currency
    - decimalDigits: 2
    - name: GBP
    - symbol: £
    - customPattern: "¤ #,##0.00"

Usage:

final lambo = t.someKey.sentence(0xFFFFFF);
print(lambo); // "Where lambo? I have £ 16,777,215.00!"

Note how this example relies on Dart APIs from the intl package, which (if I'm not mistaken) follow ISO-like conventions. Obviously the previous example would be tedious to repeat for each string. That's where global configs and the current locale would kick in as default fallbacks.

Beyond simple numbers

Another l10n topic is date formatting. Even here, we could rely on intl and give a similar configuration API and usage:

# File: strings.i18n.yaml
someKey:
  # warn: pseudocode
  sentence1: "Morning! Oh, today it's $date!"
    - type: datetime
    - format: MMMMEEEEd # MONTH_WEEKDAY_DAY from intl package
  sentence2: "Good Heavens, just look at the time! It's $time!"
    - type: datetime
    - format: Hm # HOUR24_MINUTE from intl package

Usage:

// DateTime has no const constructor, so this must be final
final myValue = DateTime.utc(2022, 10, 29, 3, 30, 0);

// The following is a mock of running this code in a GMT-5 time zone
final dateAlabama = t.someKey.sentence1(myValue);
print(dateAlabama); // "Morning! Oh, today it's Friday, October 28"
final timeAlabama = t.someKey.sentence2(myValue);
print(timeAlabama); // "Good Heavens, just look at the time! It's 22:30!"

// The following is a mock of running this code in a GMT+1 time zone
final dateEurope = t.someKey.sentence1(myValue);
print(dateEurope); // "Morning! Oh, today it's Saturday, October 29"
final timeEurope = t.someKey.sentence2(myValue);
print(timeEurope); // "Good Heavens, just look at the time! It's 04:30!"

Let me know what you think about this.

Tienisto commented 1 year ago

Hi, thanks for your ideas!

I had typed parameters in mind a while ago (not implemented yet), which should also solve this l10n issue.

Something like this:

{
  "sentence": "Today is ${date: DateTime}"
}

Here date is of type DateTime and will be formatted according to the intl rules.

Developers can create custom types to customize the formatting by specifying a $l10n entry:

// File: strings.i18n.json
{
  "$l10n": {
    "DateOnly(DateTime)": {
      "US": "MM-dd-yyyy",
      "DE": "dd.MM.yyyy"
    },
    "MyNumber(double)": {
      "default": "###.0#",
      "DE": "###.00"
    }
  },
  "sentence": "Today is ${date: DateOnly}", // using the "DateOnly" type
}

The problem with this solution is that the values in $l10n should be independent from i18n so maybe specify those types in build.yaml?

Adding formatting rules next to the translations (like in your examples) also sounds reasonable, but it is currently too much bloat for me.

lucavenir commented 1 year ago

Hi there again,

thank you for your quick response. It's great to see that.

Something like this:

{
  "sentence": "Today is ${date: DateTime}"
}

Here date is of type DateTime and will be formatted according to the intl rules.

Yes, looks great. Some sort of syntax that distinguishes between numbers, currencies, timestamps and plain strings (the default) is needed imho.

Developers can create custom types to customize the formatting by specifying a $l10n entry:

// File: strings.i18n.json
{
  "$l10n": {
    "DateOnly(DateTime)": {
      "US": "MM-dd-yyyy",
      "DE": "dd.MM.yyyy"
    },
    "MyNumber(double)": {
      "default": "###.0#",
      "DE": "###.00"
    }
  },
  "sentence": "Today is ${date: DateOnly}", // using the "DateOnly" type
}

If I've understood this correctly, I (personally) don't like the proposal. I think it's good to write down some common ground, first. Take this example, which is similar to yours:

"MyNumber(double)": {
  "default": "###.0#",
  "en_US": "#,##0.000",
  "de": "#,##0.00"
}

At first glance it looks like we've completely customized how en_US and de devices will format MyNumber, but this is false. This json snippet only says how many decimal digits to show and how to group thousands, and that this rule should change based on the locale. Just to be clear, in these examples we're writing down ICU number formats; we're not explicitly saying which separators to use.

Recall that ICU formatting patterns, like the ones the intl package uses, have dots (.) and commas (,) (and more symbols) which carry a semantic meaning; they're not being used as plain symbols or separators. Dots represent the decimal separator, while commas represent a grouping separator (e.g. in the thousands).

I want to clarify this as much as possible: say you want to format the number nine hundred thirty-two thousand four hundred fifty-one and nine hundredths (I'm writing it out in words so it's l10n-agnostic). Using the previous example specification, en_US devices will format this as 932,451.090, while de devices will format this as 932.451,09. We've specified how decimals and thousands should be grouped, but the symbols being used are different.
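To make the point concrete, here is a small sketch that applies one and the same pattern under two locales. The pattern '#,##0.00' and the helper name `formatWith` are my own choices for illustration; `NumberFormat(pattern, locale)` is the regular intl constructor.

```dart
import 'package:intl/intl.dart';

// Formats [value] with an ICU-style [pattern] under the given [locale].
// In the pattern, ',' marks grouping and '.' marks the decimal point;
// the characters actually printed come from the locale's symbol table.
String formatWith(String pattern, String locale, num value) =>
    NumberFormat(pattern, locale).format(value);

void main() {
  const pattern = '#,##0.00';
  const value = 932451.09;

  print(formatWith(pattern, 'en_US', value)); // 932,451.09
  print(formatWith(pattern, 'de', value)); // 932.451,09
}
```

The pattern string is identical in both calls; only the locale changes which separator characters end up in the output.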

More examples that work like this are found in intl docs:

var f = NumberFormat("###.0#", "en_US");
print(f.format(12.345));  // 12.34

Please note that this doesn't mean the developer has no flexibility in choosing how to format numbers, currencies, strings and timestamps. Some clients may need a specific number format even though they're typing from a different country. The same docs explain how to customize that:

var eurosInUSFormat = NumberFormat.currency(locale: "en_US", symbol: "€");

The previous snippet formats nine hundred thirty-two thousand four hundred fifty-one euros and nine cents in an en_US readable format, which is €932,451.09.

The problem with this solution is that the values in $l10n should be independent from i18n so maybe specify those types in build.yaml? Adding formatting rules next to the translations (like in your examples) also sounds reasonable, but it is currently too much bloat for me.

I agree with the last sentence, but l10n is no easy topic. Indeed, most of the time people want to configure how numbers are shown only once; therefore, custom number (or currency, or datetime) formats definitely should be put inside our global configuration, within build.yaml.

Nonetheless, I'd still allow the developer to customize this behavior on a particular translation (there might be one-time exceptions).

To be fair, even the build.yaml customizations shouldn't be necessary 99% of the time; I'd expect this library to just set the locale and call intl when necessary in its generated code. Indeed, Flutter intl listens to the Localization delegate and formats all values accordingly.

Try it yourself on your device:

final formatter = NumberFormat.currency();

print(formatter.format(932451.09));  // depends on where you're typing from (:

Since I'm using DartPad to try this snippet and my device has an en_US locale (even though I'm Italian and living in Italy), I'm seeing USD932,451.09 as the result of the previous snippet.

Tienisto commented 1 year ago

Yes, I think that my proposal to format numbers does not solve the issue completely.

What do you think about this:

{
  "myPriceSentence": "Price: ${price: MyPriceType}",
  "myDateSentence": "Today is ${date: MyDateType}"
}

In build.yaml:

parameter_types:
  MyPriceType:
    type: NumberFormat.currency
    decimalDigits: 2 # optional (like intl)
    name: GBP # optional (like intl)
  MyDateType:
    type: DateFormat
    pattern:
      default: 'MM-dd-yyyy'
      en_US: 'MM-dd-yyyy'
      de: 'dd.MM.yyyy'

Access:

String a = t.myPriceSentence(
  price: MyPriceType(4.35), // uses default behaviour specified in build.yaml
);

String b = t.myPriceSentence(
  price: MyPriceType(4.35, name: 'EUR', locale: 'fr'), // override
);

String c = t.myDateSentence(
  date: MyDateType(DateTime.now()), // uses default behaviour specified in build.yaml
);

As you said, most times we don't need to reinvent the wheel so the developer should be able to use existing formatters:

{
  "myPriceSentence": "Price: ${price: NumberFormat.currency}",
  "myDateSentence": "Today is ${date: DateFormat.yMd}"
}

The question here is, should the developer call it this way

t.myPriceSentence(price: 4.56);

or this way:

t.myPriceSentence(
  price: DefaultCurrencyNumber(4.56),
);

t.myPriceSentence(
  price: DefaultCurrencyNumber(4.56, locale: 'fr'), // here we are able to customize but at the cost of more bloat
);

lucavenir commented 1 year ago

This is just my opinion, but while I like the override API, I don't like the build.yaml per-string personalization at all.

I would add more overriding onto the json (or yaml, or arb, etc.) rather than moving the formatting options over another file, which would be too loosely coupled.

I'd use the default settings of a locale, first. So no settings at all implies using default localized settings (e.g. GMT +1 timezone in France).

Then, I'd keep the build.yaml settings to override such behavior with a per-package personalization (possibly for each language used in the project).

Then again, the per-string overrides should be written in the main json (or other formats) file, considering that our translations live in a "default" lang file (e.g. en_US.json) plus other optional langs (e.g. es_ES.json). Example:

// en_US.json
{
  "myPriceSentence": {  // e.g. € 1,234.05
      t: "Price: $price", // I suppose a migration tool is needed for this change, but it might be worth it
      type: 'currency', // or 'NumberFormat.currency' if you like it better
      format: '¤ #,##0.00', // optional as you said
      symbol: '€' // optional as you said
  }
}

// es_ES.json
{
  "myPriceSentence": {
      t: "Precio: $price",
      // you can't override type here
      format: '#.##0,00 ¤',
  }
}

Finally, one could implement a per-invocation override as you've suggested. I'd keep this API:

String a = t.myPriceSentence(price: 4.35); // standard translation, no overrides involved
String b = t.myPriceSentence(price: 4.35, symbol: '£', format: '...', locale: 'fr'); // overrides

About reinventing the wheel: I like this point. Flutter itself does have a native way to solve i18n and l10n problems, but it still makes sense to implement an easier API that re-uses the good parts of the existing APIs where possible. But this is just my opinion.

Tienisto commented 1 year ago

I agree that per-string personalization does not really exist. My version only allows for per-invocation and per-project.

Your version does provide all the necessary tools to customize per-string, but it introduces a new (breaking) syntax, which I dislike. Furthermore, I don't know how it will work with pluralization, e.g.

someKey:
  apple:
    one: "I have $n apple."
    other: "I have $n apples."

I agree that per-string personalization is needed in some way, so what about specifying them in the modifiers (existing syntax introduced in slang 3.0)?

It will look like this:

{
  "myPriceSentence(currencySymbol=€, numberPattern='¤ #,##0.00')": "The price of $itemName is ${price: currency}",
  "apple": {
    "one": "I have $n apple.", // n is already interpreted as a number
    "other": "I have $n apples."
  },
  "score(numberPattern='###,0')": "Hello $name, your highscore is ${score: double}", // need typing
  "date(datePattern='dd.MM.yyyy')": "Hello $name, today is ${today: DateTime}" // need typing
}

Regarding per-invocation. Adding new parameter names is always tricky as it may cause collisions. So I'd like to keep it at a minimum.

t.myPriceSentence(price: 4.56);

// optional "<parameter>Format" parameter
t.myPriceSentence(price: 4.56, priceFormat: NumberFormat.currency(locale: 'fr'));

lucavenir commented 1 year ago

I agree that per-string personalization does not really exist. My version only allows for per-invocation and per-project.

I see that, and now it feels like overkill to me too. Plus, no one likes breaking changes. I had more ideas for pluralization, but it really doesn't make any sense to overcomplicate it, as slang has a neat API (:

I agree that per-string personalization is needed in some way, so what about specifying them in the modifiers (existing syntax introduced in slang 3.0)?

Looking back on this, this is great, neat. One question though; would this syntax be acceptable?

{ "myPriceSentence": "The price of $itemName is ${price: currency(symbol:'€', format:'¤ #,##0.00', locale:'en_US')}" }

I personally like it slightly more as it's closer to intl APIs, as it feels like you're invoking a formatter function, which could be appreciated. Plus, in my opinion moving parameters close to the source should help readability if there's more than just one variable in one single translatable sentence.

Regarding per-invocation. Adding new parameter names is always tricky as it may cause collisions. So I'd like to keep it at a minimum.

Thinking again, absolutely, this makes sense. But at this point, I'm thinking that I wouldn't even bother. Why would one complicate the API when we could directly write

t.myPriceSentence(price: 4.56);
// need more formatting for a one-time invocation? Just ask intl.
final formatter = NumberFormat.currency(locale: 'fr');
t.myPriceSentence(price: formatter.format(4.56));

This implies having a more flexible API though, accepting either a double or a String (maybe by an explicit flag in the json file? I'm not sure).
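As a rough idea of what "accepting either a double or a String" could look like, here is a sketch in plain Dart. It is not slang's generated code: the function shape, the use of `Object` as the parameter type and the two-decimal fallback are all assumptions made for illustration.

```dart
// Hypothetical shape of a generated method that accepts either a raw
// number or an already formatted string for the same placeholder.
String myPriceSentence({required Object price}) {
  final String shown;
  if (price is String) {
    // The caller already formatted the value (e.g. via intl); use it as-is.
    shown = price;
  } else if (price is num) {
    // Fallback formatting; a real implementation would be locale-aware.
    shown = price.toStringAsFixed(2);
  } else {
    throw ArgumentError.value(price, 'price', 'expected a num or a String');
  }
  return 'Price: $shown';
}

void main() {
  print(myPriceSentence(price: 4.56)); // Price: 4.56
  print(myPriceSentence(price: '4,56 €')); // Price: 4,56 €
}
```

The trade-off is that the type check moves from compile time to run time, which works against the type-safety slang is built around; an explicit flag or two separate generated methods would avoid that.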

SAGARSURI commented 6 months ago

Any update on this?

Tienisto commented 6 months ago

In v3.30.0, parameter types (e.g. Hello {name: String}) have been introduced.

I can imagine that The price is {price: currency} would be a good next step (or The price is {price: currency(locale: 'fr')}).

The current problem I see is that we would need to depend on intl. It caused some headaches when managing dependencies.