unicode-org / message-format-wg

Developing a standard for localizable message strings

Unit and currency formatting should be supported #838

Open eemeli opened 3 months ago

eemeli commented 3 months ago

During the February meetings, we ended up leaving out the :number style values currency and unit, and their related options: currency, currencyDisplay, currencySign, unit, unitDisplay.

The spec also includes this direction:

> Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview.

We should work towards supporting currency and unit formatting in MF2; hence this issue. Looking back at the earlier discussions on this topic (also #621), it would be good to get input on the parameters of an acceptable solution. To that end, I'm adding assignees here who have been vocal on the topic previously; also CC @ryzokuken.


To get us started, I propose that we re-add all the options currently left out of the tech preview, except for currency and unit. This would mean that in order to use currency or unit formatting, the operand of the :number function would need to include the appropriate currency or unit in addition to its value, and that supporting this would be left to each implementation.

For example, in the JS implementation this could work like this:

const mf = new MessageFormat('en', 'Your total is {$cost :number style=currency}')
const cost = { valueOf: () => 42, options: { currency: 'EUR' } }
mf.format({ cost }) // → 'Your total is €42.00'

With this approach, the placeholder that the translator sees would show that it's a formatted currency, but they would not be able to set the currency; that is provided in the operand.
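
A minimal sketch of how an implementation might wire this up. The helper name `formatNumberOperand` and the way expression options are passed in are assumptions made for illustration, not part of any published MessageFormat API; the `{ valueOf, options }` operand shape is taken from the example above, and only `Intl.NumberFormat` is a real built-in.

```javascript
// Hypothetical helper: formats a :number operand whose currency travels
// with the value instead of being written in the message.
function formatNumberOperand(locale, operand, exprOptions = {}) {
  // exprOptions are the options written in the message, e.g. { style: 'currency' }.
  // Any `currency` set there is dropped: under this proposal it may only
  // come from the operand.
  const { currency: _ignored, ...fromExpression } = exprOptions
  const fromOperand = operand.options ?? {}
  const options = { ...fromExpression, ...fromOperand }
  if (options.style === 'currency' && !options.currency) {
    throw new TypeError('Currency formatting requires an operand that carries a currency')
  }
  return new Intl.NumberFormat(locale, options).format(operand.valueOf())
}

const cost = { valueOf: () => 42, options: { currency: 'EUR' } }
const text = formatNumberOperand('en', cost, { style: 'currency' })
console.log(text) // '€42.00'
```

A bare number passed with `style=currency` fails in this sketch, because nothing supplies the currency.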

As we do not specify the formatted results, a conformant implementation could choose to not support currency or unit formatting, and to ignore the relevant options.

~We may also want to consider enforcing exact selection when style is not decimal, as plural or ordinal selection on currency or unit values does not really make sense.~ Edit: I was wrong, see discussion below.

macchiati commented 3 months ago

Good move with currency

aphillips commented 3 months ago

I agree with adding "left behind" options. We should test drive the process described in #634 to do this.

Currency (and unit) is tricky. I tend to agree that not permitting the currency code or the unit to be hardcoded in a message is the I18N ideal. The unit/currency needs to be part of the value. The proposal above seems to suggest that it be a "shadow" option--invalid in an expression, but required in the operand's resolved value. This is a new thing.

I also suspect there are cases where someone might want {$amount :number style=currency currency=$myCurr}

There also might be some additional interactions here. For example, generally setting the currency changes the number of fraction digits on a formatter. However, sometimes users want to have control over the fraction digits. For example, they might want to format a currency amount without the fraction parts ($5) or show "invalidly" long numbers of fraction digits (10.93485 JPY). How regular options (such as fraction digits) interact with the unit/currency should be clearly defined. Perhaps we need option values like {$amount :number style=currency minimumFractionDigits=auto}
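
To illustrate the interaction, here is how the JavaScript `Intl.NumberFormat` built-in behaves today. It is shown only as a reference point for the discussion, not as the behaviour MF2 should adopt, and the variable names are illustrative.

```javascript
// By default, the currency decides the number of fraction digits.
const usd = new Intl.NumberFormat('en', { style: 'currency', currency: 'USD' })
const jpy = new Intl.NumberFormat('en', { style: 'currency', currency: 'JPY' })
const usdDigits = usd.resolvedOptions().maximumFractionDigits // 2
const jpyDigits = jpy.resolvedOptions().maximumFractionDigits // 0

// Explicit digit options override the currency's defaults.
const wholeDollars = new Intl.NumberFormat('en', {
  style: 'currency',
  currency: 'USD',
  minimumFractionDigits: 0,
  maximumFractionDigits: 0,
}).format(5) // '$5'

// An "invalidly" long fraction for a currency that normally has none.
const longYen = new Intl.NumberFormat('en', {
  style: 'currency',
  currency: 'JPY',
  minimumFractionDigits: 5,
}).format(10.93485) // '¥10.93485'

console.log(usdDigits, jpyDigits, wholeDollars, longYen)
```

Here an omitted digit option already acts like the `auto` value suggested above: the currency supplies the default.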

Alternatively... maybe we don't add currency/unit to the number functions but instead provide separate functions that can make better assumptions, e.g. {$amount :currency} and {$amount :unit}. Note that some of the quirks in e.g. plural selection might be easier to deal with by not shoving everything into the numbers colossus.


> We may also want to consider enforcing exact selection when style is not decimal, as plural or ordinal selection on currency or unit values does not really make sense.

Hogwash. The code below (requires ICU4J) is a demo. Messages containing "$5" or "$1" or "$12.37" turn out to need pluralized patterns just as much as "5" and "1" and "12.37" do and current implementation can support this. Yes, the "fraction" rules are often in effect, but there is no need to break this.

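    import java.math.BigDecimal;
    import java.util.Currency;
    import java.util.Locale;
    import com.ibm.icu.text.PluralFormat;

    public static void mf838() {
        Locale[] localesToTest = new Locale[] {
                Locale.US,
                Locale.forLanguageTag("pl-PL"),
                Locale.JAPAN,
        };
        Currency[] currenciesToTest = new Currency[] {
                Currency.getInstance("USD"),
                Currency.getInstance("JPY"),
        };
        // String constructors keep the decimal values exact;
        // new BigDecimal(4.37) would carry binary floating-point error.
        BigDecimal[] amountsToTest = new BigDecimal[] {
                new BigDecimal("0.00"),
                BigDecimal.ONE,
                new BigDecimal("4.37"),
                new BigDecimal("5.00"),
                new BigDecimal("5.55"),
        };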

        for (Locale locale : localesToTest) {
            System.out.println(String.format("\nLocale: %1$s", locale.getDisplayName(Locale.US)));
            for (Currency currency : currenciesToTest) {
                System.out.println(String.format("Currency: %1$s", currency.getDisplayName(Locale.US)));
                for (BigDecimal amount : amountsToTest) {
                    com.ibm.icu.text.NumberFormat nf = com.ibm.icu.text.NumberFormat.getCurrencyInstance(locale);
                    nf.setCurrency(com.ibm.icu.util.Currency.fromJavaCurrency(currency));
                    nf.setMaximumFractionDigits(currency.getDefaultFractionDigits());
                    nf.setMinimumFractionDigits(currency.getDefaultFractionDigits());

                    PluralFormat pf = new PluralFormat(locale);
                    pf.setNumberFormat(nf);
                    pf.applyPattern("=0 {=0} zero {zero} one {one} two {two} few {few} many {many} other {other}");
                    System.out.print(String.format("%1$s %2$s", nf.format(amount), pf.format(amount)));
                    nf.setMaximumFractionDigits(0); // "integer" display of a value
                    pf.setNumberFormat(nf);
                    System.out.println(String.format("\t%1$s %2$s", nf.format(amount), pf.format(amount)));

                }
            }
        }
    }

Output (ICU75.1):

Locale: English (United States)
Currency: US Dollar
$0.00 =0    $0 =0
$1.00 other $1 one
$4.37 other $4 other
$5.00 other $5 other
$5.55 other $6 other
Currency: Japanese Yen
¥0 =0   ¥0 =0
¥1 one  ¥1 one
¥4 other    ¥4 other
¥5 other    ¥5 other
¥6 other    ¥6 other

Locale: Polish (Poland)
Currency: US Dollar
0,00 USD =0 0 USD =0
1,00 USD other  1 USD one
4,37 USD other  4 USD few
5,00 USD other  5 USD many
5,55 USD other  6 USD many
Currency: Japanese Yen
0 JPY =0    0 JPY =0
1 JPY one   1 JPY one
4 JPY few   4 JPY few
5 JPY many  5 JPY many
6 JPY many  6 JPY many

Locale: Japanese (Japan)
Currency: US Dollar
$0.00 =0    $0 =0
$1.00 other $1 other
$4.37 other $4 other
$5.00 other $5 other
$5.55 other $6 other
Currency: Japanese Yen
¥0 =0   ¥0 =0
¥1 other    ¥1 other
¥4 other    ¥4 other
¥5 other    ¥5 other
¥6 other    ¥6 other

eemeli commented 3 months ago

> I also suspect there are cases where someone might want {$amount :number style=currency currency=$myCurr}

I'd be fine with currency and unit supported as normal options. That's what we had before, and then needed to drop them in February. Hence my initial proposal to leave them out, which theoretically could be considered as a further step.

> We may also want to consider enforcing exact selection when style is not decimal, as plural or ordinal selection on currency or unit values does not really make sense.
>
> Hogwash. The code below (requires ICU4J) is a demo. Messages containing "$5" or "$1" or "$12.37" turn out to need pluralized patterns just as much as "5" and "1" and "12.37" do and current implementation can support this. Yes, the "fraction" rules are often in effect, but there is no need to break this.

Could you share some example message in some locale that varies based on the plural category of a currency or unit value? I'm perfectly willing to admit to being wrong, but the code example you shared isn't showing that.

macchiati commented 3 months ago

> Alternatively... maybe we don't add currency/unit to the number functions but instead provide separate functions that can make better assumptions, e.g. {$amount :currency} and {$amount :unit}. Note that some of the quirks in e.g. plural selection might be easier to deal with by not shoving everything into the numbers colossus.

I think that is a much better approach. Also works better with strongly typed languages (or linters) since you can verify that the input parameters are compatible with the function.

aphillips commented 3 months ago

> Could you share some example message in some locale that varies based on the plural category of a currency or unit value?

It is admittedly the case that unit formatting (which includes currency formatting) tends to bring along the noun ("dollars", "centimeters", etc.) and the resulting formatted value often absorbs grammatical variation. However, you can have messages like:

.input {$amountNeeded :integer style=currency}
.match {$amountNeeded}
0 {{You have enough to go to the next round.}}
one {{{$amountNeeded} is needed to go to the next round.}}
* {{{$amountNeeded} are needed to go to the next round.}}
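
Both halves of this point can be sketched with the JavaScript `Intl` built-ins. The function name `amountNeededMessage` is hypothetical and the English strings are hard-coded for illustration; a real implementation would take the variants from the message itself.

```javascript
// The formatted unit absorbs the singular/plural form of the noun...
const km = new Intl.NumberFormat('en', {
  style: 'unit',
  unit: 'kilometer',
  unitDisplay: 'long',
})
const distances = [km.format(1), km.format(5)] // e.g. '1 kilometer', '5 kilometers'

// ...but the rest of the sentence can still depend on the plural category.
function amountNeededMessage(amount, currency) {
  if (amount === 0) return 'You have enough to go to the next round.'
  const formatted = new Intl.NumberFormat('en', {
    style: 'currency',
    currency,
    minimumFractionDigits: 0, // integer display, like :integer above
    maximumFractionDigits: 0,
  }).format(amount)
  const category = new Intl.PluralRules('en').select(amount)
  return category === 'one'
    ? `${formatted} is needed to go to the next round.`
    : `${formatted} are needed to go to the next round.`
}

console.log(distances)
console.log(amountNeededMessage(1, 'USD')) // '$1 is needed to go to the next round.'
console.log(amountNeededMessage(5, 'USD')) // '$5 are needed to go to the next round.'
```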

eemeli commented 3 months ago

Ok, that makes sense. Struck out that line from the post above.

aphillips commented 3 months ago

I think the #634 process should be our next stop. Specifically missing from #634 at the moment is a template for "registry entries". We can use the held-back options from LDML45 to test drive the process.

eemeli commented 3 months ago

That's one part of the next steps here, yes, but also separately we need to conclude the specific discussion raised by @sffc in particular on whether or not the default functions should allow for options like currency and unit that can be conceived as a part of the value being formatted, rather than as standalone options.

aphillips commented 3 months ago

TLDR: I think we should create :currency and :unit (or :measure, we can bikeshed this later) in the RGI registry and we should encourage good I18N hygiene by not providing support in :number or :integer.


The I18N best practice is to carry the unit with the number. Unfortunately, few if any programming languages come with built-in types that do this. This means that there are many applications that have worked around it by creating their own type/class/data structure. There are also many applications that have done something halfway, such as managing the currency or unit contextually. Developers working with non- or semi-internationalized applications should be able to do what they need to do.

To implement unit formatting (with currency as a special case) in MF, we need to support this diversity, which means that, in spite of I18N best practices, we will almost certainly need to support currency and unit as options on whatever formatting function we provide (so that users can pipe in the units from their context or data structures).

Many programming environments support some form of currency formatting as part of their built-in number formatting I18N APIs, but in most cases these were written Long Ago. ICU itself has evolved towards incorporating currencies as a special case of unit formatting via APIs such as MeasureFormat. I think this is a better model for MF2 and is why I wrote the bit that @macchiati replied to above.

I suggest the RGI registry because some runtime environments do not provide currency (or unit) formatting out of the box, and we should not be a burden on implementations in weakly resourced environments. If we do this, we should also amend the :number and :integer documentation to clearly state what we've done (so that implementers don't namespace their way around us).

We should also consider whether the default functions should consume unit values as numbers, e.g. .input {$amount :currency} .match {$amount :number} * {{You have {$amount} in your account!}}

macchiati commented 3 months ago

I disagree that we need to avoid doing the right thing.

Any implementation that can't directly support passing a currency value (= number & currency-code) can surely provide a little shim in their API to combine two parameters on their side.

aphillips commented 3 months ago

@macchiati I'm not saying that we should avoid doing the right thing. I'm saying we should provide ways for different users to accomplish the right thing. For example, I was talking with a developer whose application has a currency code associated with all of the values in a "report". They might want to write a message like:

You have {$amount :currency currency=$reportCurrency}.

If there is no native "currency amount" type, the :currency implementation won't be able to pre-specify what to do. This doesn't mean that an implementation can't support a "currency amount" type when one exists, or that it shouldn't provide a shim when one does not. Implementations absolutely should provide for proper I18N.
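
A minimal sketch of a handler that serves both kinds of caller. The name `formatCurrency`, the `{ value, currency }` record shape, and the rule that a currency carried by the value wins over the option are all assumptions made for illustration; none of them comes from the spec.

```javascript
// Hypothetical :currency handler. Accepts either a combined
// { value, currency } record or a bare number plus a `currency` option.
function formatCurrency(locale, operand, options = {}) {
  const isRecord = typeof operand === 'object' && operand !== null
  const value = isRecord ? operand.value : operand
  // Assumption: the currency carried by the value wins; the option is a fallback.
  const currency = (isRecord && operand.currency) || options.currency
  if (typeof value !== 'number' || !currency) {
    throw new TypeError('Expected a number together with a currency code')
  }
  return new Intl.NumberFormat(locale, {
    ...options,
    style: 'currency',
    currency,
  }).format(value)
}

// Combined value, as in {$amount :currency}
const fromRecord = formatCurrency('en', { value: 42, currency: 'EUR' })
// Contextual currency, as in {$amount :currency currency=$reportCurrency}
const fromOption = formatCurrency('en', 42, { currency: 'USD' })
console.log(fromRecord, fromOption) // '€42.00' '$42.00'
```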

sffc commented 3 months ago

I think the spec should just normatively require support for combined number and currency/unit input types. Each implementation can do that in their own way. When an upgraded type is available, implementations can use it, and if not, a simple tuple or record can be used.

aphillips commented 1 day ago

Waiting to merge #922, which will close this issue.