tc39 / proposal-intl-displaynames

Get localized display names for languages, scripts, regions and others. https://tc39.github.io/proposal-intl-displaynames/
MIT License

Error handling when there is no name for the code #23

Closed FrankYFTang closed 5 years ago

FrankYFTang commented 5 years ago

@sffc wrote in https://github.com/tc39/proposal-intl-displaynames/issues/11

Do we really want to throw an error if data is not available, or just return null? If we return null, then we can also use that behavior when exporting a list.

Unless the spec explicitly lists which region codes have to be supported, for example, I do not like the idea of throwing an exception here, because then it means that the normal, expected way to call the function is to wrap it in a try-catch just in case the implementation does not have the needed data.

zbraniecki commented 5 years ago

I would expect it to return a code for values it doesn't have better data for.

So, for ["en", "fr", "de"] it would return ["English", "fr", "German"] for example.

ljharb commented 5 years ago

How could one programmatically and reliably differentiate a code from a name? (ie, identify when the data wasn’t there)

FrankYFTang commented 5 years ago

In the current form of the spec, it will return ["English", undefined, "German"]
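The two behaviours discussed so far can be compared side by side. The sketch below is only an illustration: `NAMES`, `namesWithCodeFallback` and `namesWithUndefined` are hypothetical and not part of the proposal, and the table deliberately omits "fr" to simulate missing data.

```javascript
// Hypothetical data set; "fr" is left out on purpose to simulate missing data.
const NAMES = new Map([
  ["en", "English"],
  ["de", "German"],
]);

// Strategy 1: echo the code back when no name is available.
function namesWithCodeFallback(codes) {
  return codes.map((code) => NAMES.get(code) ?? code);
}

// Strategy 2: report missing data as undefined.
function namesWithUndefined(codes) {
  return codes.map((code) => NAMES.get(code));
}

console.log(namesWithCodeFallback(["en", "fr", "de"])); // [ 'English', 'fr', 'German' ]
console.log(namesWithUndefined(["en", "fr", "de"])); // [ 'English', undefined, 'German' ]
```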

zbraniecki commented 5 years ago

How could one programmatically and reliably differentiate a code from a name? (ie, identify when the data wasn’t there)

The operating paradigm behind Intl APIs is that they're a) best-effort and b) opaque. The developer is not supposed to interact with the output of the Intl APIs.

That applies to date formatting (we do not report to you that we didn't find a name for the timezone and formatted it to its code), currency formatting (we do not report that we didn't find a symbol for a currency and used the code), and for locales IMHO we should do the same.
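A minimal sketch of that best-effort behaviour with an existing API. It assumes that "XYZ" is a well-formed but unassigned currency code for which the engine has no symbol, in which case the formatter is expected to use the code itself rather than signal an error.

```javascript
// "XYZ" is well-formed (three letters) and assumed to have no symbol data in the engine.
const formatter = new Intl.NumberFormat("en", { style: "currency", currency: "XYZ" });
const formatted = formatter.format(1);

// No exception and no undefined: the code is assumed to stand in for the missing symbol.
console.log(formatted);
console.log(formatted.includes("XYZ"));
```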

In other words, when you use an Intl API you're saying "please, take those semantic values, and present them to the user in the best way you can". That "best way" may mean looking into user preferences expressed via Unicode extension keys, or some env settings, but the result is meant to be opaque.

I'd like to return developer-actionable output only in cases where there is no easy "code" to return (like there is for timezones, currencies, languages, regions etc.).

littledan commented 5 years ago

We have discussed this recently in the context of units, and I think the same applies here: we should only use the code when it's a relevant string for humans. This is a yes for currencies, no for units, and I suspect no for display names as well. It's easier to see missing values and have an application-defined fallback if it returns undefined.

zbraniecki commented 5 years ago

This is a yes for currencies, no for units, and I suspect no for display names as well.

I think that it's more complicated for display names, because some of them do work as codes. dateField weekOfYear code wouldn't, but en-US is a pretty good substitute for English (American).

My problem with undefined is that you're asking people to probe the API for data and then do something about missing data. That's not how people use intl APIs.

I don't know of any precedent where a JS Intl API is used like:

let maybeFormattedNumber = num.AttemptToLocaleFormatNumber();
let formattedNumber = maybeFormattedNumber || ???;

or anything even remotely similar.

I think there are two reasons for that:

1) We don't want developers to meddle with the output. The output is for user consumption rather than for further operations.

That's a crucially important difference: every Intl JS API so far is designed to put Intl formatting at the last step of the pipeline, right before display. You get some data, and then you put it through an Intl API in an attempt to improve its representation for the given reader. It's opaque on purpose.

You're designing this API as a regular developer API where the output is meant to be verified and operated on. That may sound like a minor shift, but I think it is actually a pretty major divergence, one that will result in us trying to teach people later that they should always check the output of one Intl API but not of the others. DateTimeFormat, NumberFormat, PluralRules and getCanonicalLocales will always return data at least as good as the input you provided, as long as the input and options bag are valid. DisplayNames may or may not return output, depending on what data it carries, and you're supposed to verify that and write code to handle scenarios where it doesn't.

2) The user of this API is almost always not in a position to fix the problem.

If the user starts with en-US and we don't have any data for them to provide a display name, it's very unlikely that they have a handy dataset to improve the output on their own.

The scenario you optimize for ("we carry a potentially incomplete dataset for valid inputs, so let's allow the user to handle the missing data on their own") is imho unrealistic. If the user already had a display name value for the code, why would they use the API?

My concern is that you're changing the nature of JS Intl API from being a best-effort formatter, into a data supplying API.

FrankYFTang commented 5 years ago

but en-US is a pretty good substitute for English (American).

I think people know what "US" is, but nobody except us knows what "en" is. Also, how about "pt-BR". I am pretty sure nobody except up to ~1000 people (not even 97% of the software engineers in this world) understands what "pt" or "BR" is.

zbraniecki commented 5 years ago

Also, how about "pt-BR".

I agree, that is cryptic. But we're not comparing good vs bad here, we're comparing bad vs. I don't know what.

I'm asking if you expect that a regular use case of the API will involve the developer operating on the output in any meaningful way, and in particular, what would a realistic fallback for a missing display name be?

ljharb commented 5 years ago

A contrived example:

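function describeLanguage(code) {
  const output = get(code); // get() stands for the display-name lookup under discussion
  if (!output) {
    return `Unknown language! Code: ${code}`;
  }
  return output;
}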

It's a bit messier to do if (output === code) {, and if the code is normalized at all, that wouldn't work.
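The normalization concern can be illustrated with `Intl.getCanonicalLocales`, which is mentioned elsewhere in this thread. The helper `looksLikeEchoedCode` is hypothetical and only handles language tags; it shows that a comparison against the input has to canonicalize the input first.

```javascript
// Intl canonicalizes its input, so an echoed code may differ from what was passed in.
const [canonical] = Intl.getCanonicalLocales("EN-us");
console.log(canonical); // "en-US"
console.log(canonical === "EN-us"); // false

// Hypothetical check: "was the output just the code?" must compare canonical forms.
function looksLikeEchoedCode(output, code) {
  return output === Intl.getCanonicalLocales(code)[0];
}

console.log(looksLikeEchoedCode("en-US", "EN-us")); // true
console.log(looksLikeEchoedCode("American English", "EN-us")); // false
```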

zbraniecki commented 5 years ago

It's a bit messier to do if (output === code) {, and if the code is normalized at all, that wouldn't work.

No doubt! And thank you for the example, but...

what is the use case? Language selector? Can you see a UI that looks like this:

<select>
  <option>English (American)</option>
  <option>Unknown language! Code: pt-BR</option>
  ...
</select>

or maybe:

<ul>
  <li>User name: John</li>
  <li>User language: Unknown language! Code: pt-BR</li>
</ul>

? My argument is that there's little beyond displaying the code that any user might want to do. Designing an API that basically makes checking the output mandatory is likely to lead to developers either not using it, or using it without checking the output, which will then break hard (JS errors, broken website etc.) when we do return undefined. That, in turn, will lead to everyone having to ship all the data that the dominant ecosystem handles.

So I guess my follow-up question is: do you think we will document that this API requires the user to verify the output and provide a custom fallback? If yes, do we agree that this API will then diverge from all preceding Intl APIs in how the output is expected to be handled? If not, do we accept the likely outcome described above of specifying this API as is?

ljharb commented 5 years ago

Another alternative is that a language that doesn't have something better than the code, could be hidden altogether (like in your user description case).

zbraniecki commented 5 years ago

Another alternative is that a language that doesn't have something better than the code, could be hidden altogether (like in your user description case).

What is the use case again? A language selector in a website that hides German because the user's web browser didn't carry the German DisplayName in its Intl API implementation? I doubt that would happen...

ljharb commented 5 years ago

On a user's profile page, say, why would we display their language if there's not a human-readable way to describe it?

sffc commented 5 years ago

For what it's worth, in a few places in ICU code where this matters, ICU has an option named "no-substitute" that defaults to false, but you turn it to true if you want a null return value instead of a default value.

Example: ICU CurrencyDisplayNames

zbraniecki commented 5 years ago

On a user's profile page, say, why would we display their language if there's not a human-readable way to describe it?

For the same reason you'd display user timezone even if you can't get a display name for it.

sffc commented 5 years ago

What do y'all think about adding a noSubstitute or noFallback option into the main option bag? Then you can get both behaviors.

If we did have such a setting, I would suggest that the default be to include the fallback because I do agree with what Zibi said about making a "best-effort formatter."

zbraniecki commented 5 years ago

It sounds reasonable to me.

littledan commented 5 years ago

I don't think it's worth the complexity. We are talking about saving a single line of code. I would rather the API user be responsible for this extremely simple fallback.

sffc commented 5 years ago

Now that it looks like we're back to string-in-string-out, assuming we return something falsy, the code ends up being,

displayNames.of("en-US") || "en-US";

We're saying that basically any call ever anywhere of this API would need to have such a fallback. Is that the design we want? I can see the argument that it's good to make people think explicitly about their fallbacks. I can also see Zibi's argument though that it's nice for the user if they don't have to think about the fallbacks.
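If every call site needs the same fallback, it could at least be centralized. The sketch below is hypothetical throughout: `displayNames` is a stand-in object whose `of()` returns undefined on missing data, and `nameOrCode` is an illustrative helper, not a proposed API.

```javascript
// Hypothetical stand-in for a display-names object that returns undefined on missing data.
const displayNames = {
  data: new Map([["en-US", "American English"]]),
  of(code) {
    return this.data.get(code);
  },
};

// Hypothetical helper that applies the code fallback in one place.
// ?? only falls back on null or undefined, so an empty-string name would be kept.
function nameOrCode(names, code) {
  return names.of(code) ?? code;
}

console.log(nameOrCode(displayNames, "en-US")); // "American English"
console.log(nameOrCode(displayNames, "pt-BR")); // "pt-BR"
```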

littledan commented 5 years ago

Well, if this fallback is universally what's wanted, I could understand including it. I imagined that you may want to take some other path on missing data sometimes.

sffc commented 5 years ago

Another advantage of returning some sort of fallback string by default: if a developer writes code and it works in one engine, they might not test that it works in other engines that have different data. If we return a fallback string by default, then the other engines will at least have some kind of reasonable behavior. If we return null or undefined (a different data type), the user code might throw an unhandled exception in those other engines, which is arguably worse, because it can make their whole site break, not just one little string.
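A small sketch of that failure mode, using hypothetical lookups (`ofOrUndefined`, `ofOrCode`) to model two engines with the same incomplete data; `label` stands for user code that was only ever tested against names that exist.

```javascript
// Hypothetical lookups modelling two engines that both lack a name for "fr".
const data = new Map([["en", "English"]]);
const ofOrUndefined = (code) => data.get(code); // reports missing data as undefined
const ofOrCode = (code) => data.get(code) ?? code; // echoes the code instead

// User code that assumes the result is always a string.
const label = (of, code) => of(code).toUpperCase();

console.log(label(ofOrCode, "fr")); // "FR"

try {
  label(ofOrUndefined, "fr");
} catch (error) {
  // Calling a string method on undefined throws.
  console.log(error instanceof TypeError); // true
}
```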

Is there a reason we aren't putting in the spec the entire set of strings that must be supported, like we're doing for measurement units?

gibson042 commented 5 years ago

I entered this issue solidly in favor of conveying localization failure with empty non-string values, but @zbraniecki makes a compelling argument for leaving input unmodified in support of (presumably majority) cases where the concern is missing data rather than bad input. Especially because ICU has already faced this decision, I'm in favor of @sffc's suggestion to follow suit by making this behavior configurable with a default that falls back to echoing back non-localizable input values.

I'd also like to see @zbraniecki's rationale added to ECMA-402 proper, because it clarifies the purpose of all included APIs as opaque best-effort formatters.

FrankYFTang commented 5 years ago

The May ECMA-402 meeting concluded that we should add an option (defaulting to fallback) to control what to return. By default, the code will be returned as the fallback for language, region, script and currency. PR coming.

FrankYFTang commented 5 years ago

With the "fallback" option merged into the spec, I am closing this issue.
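A sketch of how such an option can be used, assuming an engine that implements `Intl.DisplayNames` with a `fallback` option accepting "code" (the default) or "none". The region code "ZL" is chosen on the assumption that the engine has no name for it; actual results depend on the engine's locale data, so no output is shown.

```javascript
// Default behaviour: fall back to the code, so the result is always a string.
const withCode = new Intl.DisplayNames(["en"], { type: "region", fallback: "code" });
// Opt-in behaviour: return undefined when no name is available.
const withNone = new Intl.DisplayNames(["en"], { type: "region", fallback: "none" });

// "ZL" is assumed to be a well-formed region code without a display name.
console.log(withCode.of("ZL"));
console.log(withNone.of("ZL"));

// Callers that opt into "none" supply their own fallback.
console.log(withNone.of("ZL") ?? "(unnamed region)");
```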