editorial: mark languageCode at risk

marcoscaceres commented 6 years ago

The following tasks have been completed:

[x] Confirmed there are no ReSpec errors/warnings.
[ ] Added Web platform tests (link)
[ ] Added MDN Docs (link)

Implementation commitment:

[ ] Safari (link to issue)
[ ] Chrome
[ ] Firefox
[ ] Edge (public signal)

Impact on Payment Handler spec?


marcoscaceres commented 6 years ago

Based on teleconference discussion and #608.

marcoscaceres commented 6 years ago

@rsolomakhin, big ask... but maybe we can write a small MDN description of how to achieve the same behavior using JS? That will at least give us a good story for anyone that might actually need this.

marcoscaceres commented 6 years ago

Filed bug on Gecko to remove the attribute: https://bugzilla.mozilla.org/show_bug.cgi?id=1485881

rsolomakhin commented 6 years ago

Filed an issue in Chromium: https://crbug.com/877521. Looking into doing this work in JavaScript for an MDN article may take a while for me personally.

stpeter commented 6 years ago

I still need to write up conclusions from the i18n WG meeting yesterday - might not get to that until the weekend.

marcoscaceres commented 6 years ago

@rsolomakhin,

Looking into doing this work in JavaScript for an MDN article may take a while for me personally.

No problem. It might not come up, and we can do that if it does. Also happy to help when needed.

marcoscaceres commented 6 years ago

Merging this, as it's just editorial. I'll do a new PR for removal and move the various browser bugs over to that.

stpeter commented 5 years ago

For the sake of traceability, here are the conclusions of my i18n review (it's possible @aphillips might have more or better suggestions)...

First, the i18n WG guidelines for spec authors [0] don't say much about how to handle web forms. There are a few suggestions about character encoding [1] and text direction [2] in the guidelines for content authors and developers, but those are rather minimal, too. Because the Payment Request API is a strange beast (in essence it moves ecommerce checkout forms out of web content and into a browser dialog), it's likely an outlier for advice to spec developers. I've raised the issue of web forms guidance for discussion in the i18n WG.

Second, there are a few topics we might want to broach in the Payment Request API spec, such as:

(a) Recommend that the browser set a language tag for user input in the payment dialog. For instance, it could inherit the language tag from the html lang attribute [3] on the merchant site.

(b) Recommend that the browser be able to handle a locale value that is distinct from the language tag. As noted in [4] and relevant for our use cases, "the region code is also sometimes used to indicate the physical location, market, legal, or other governing policies for the user."

(c) Require the browser to treat all input from the payment dialog as UTF-8, consistent with [1].

(d) Mention that the user can set a base direction for textual input, as described at [2].

Third, there are probably easier ways to determine the script of user-inputted text than the algorithm Rouslan provided [5] (which I take it describes what libaddressinput [6] uses). For instance, the browser could simply inspect the characters themselves to see if they are in Latin script, Japanese script, etc. (I'll grant you that mixed-script input could be a challenge, though.)
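A minimal sketch of the character-inspection idea, assuming a runtime that supports Unicode property escapes in regular expressions. The names `SCRIPTS`, `countScripts`, and `dominantScript` are hypothetical and the script list is deliberately short; this is not the libaddressinput algorithm.

```javascript
// Hypothetical helpers for illustration only; not part of any spec or library.
// Each pattern matches a single character that belongs to the named Unicode script.
const SCRIPTS = {
  Latin: /\p{Script=Latin}/u,
  Han: /\p{Script=Han}/u,
  Hiragana: /\p{Script=Hiragana}/u,
  Katakana: /\p{Script=Katakana}/u,
  Hangul: /\p{Script=Hangul}/u,
  Cyrillic: /\p{Script=Cyrillic}/u,
  Arabic: /\p{Script=Arabic}/u,
};

// Count characters per script. Digits, spaces, and punctuation match none of
// the patterns above, so they are skipped rather than counted.
function countScripts(text) {
  const counts = {};
  for (const ch of text) { // for...of iterates by code point, not code unit
    for (const [name, pattern] of Object.entries(SCRIPTS)) {
      if (pattern.test(ch)) {
        counts[name] = (counts[name] || 0) + 1;
        break;
      }
    }
  }
  return counts;
}

// Return the script with the most characters, or null if none was recognized.
function dominantScript(text) {
  const entries = Object.entries(countScripts(text));
  if (entries.length === 0) return null;
  entries.sort((a, b) => b[1] - a[1]);
  return entries[0][0];
}
```

Mixed-script input shows up as more than one key in the result of `countScripts`, which leaves the decision about how to treat it to the caller.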

Fourth, a scenario Addison mentioned on an i18n WG call is the need for the same address in multiple forms (e.g., an English-language version for delivery from the U.S. to an import handling location in China and a Chinese-language version for final delivery to the customer). We have not designed for this yet, but might want to open a tracking bug for multiple representations of the same address.

Fifth, a related scenario might be billing address in one script and shipping address in another script. This is simpler than multiple representations of the same address, but still requires support for two different scripts in the same set of input forms.

We might uncover additional issues in the future, but these are the ones we've discussed so far.

[0] https://w3c.github.io/bp-i18n-specdev/#loc_forms
[1] https://www.w3.org/International/questions/qa-forms-utf-8
[2] https://www.w3.org/International/questions/qa-html-dir#userexplicit
[3] https://www.w3.org/International/articles/language-tags/
[4] https://www.w3.org/TR/ltli/
[5] https://github.com/w3c/payment-request/issues/608#issuecomment-414546200
[6] https://github.com/googlei18n/libaddressinput/blob/3cefac503f6321f7f84a790939dc7cb022bce169/cpp/src/language.cc#L58

marcoscaceres commented 5 years ago

Thanks so much for this input, Peter and I18n folks. Just noting for (a), you’ll be happy to hear that’s literally what we recommend in the spec:

It is RECOMMENDED that the language of the user interface match the language of the body element.

I’ll write a full response for the other points, but the tl;dr is that at a glance we get most of what’s mentioned for free from the IDL layer (DOMStrings are already UTF-16, irrespective of payment dialog input fields). And we can defer to the merchant for script detection when they need it (hence removal of this attribute).
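To illustrate the encoding point, here is a small sketch of what script code sees: string length is counted in UTF-16 code units, and a UTF-8 byte sequence only appears when the page encodes the value explicitly, for example with `TextEncoder`. The variable names and the sample value are assumptions made for the example.

```javascript
// Sample value standing in for a field read from a payment response (assumption).
const city = "東京";

// String length counts UTF-16 code units; both characters are in the BMP.
const utf16Units = city.length;

// TextEncoder always produces UTF-8, regardless of how the text was entered.
const utf8Bytes = new TextEncoder().encode(city);

// Decoding the bytes as UTF-8 gives back the original string.
const roundTripped = new TextDecoder().decode(utf8Bytes);
```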

If there is to be a script/lang detection mechanism, we should add that as native functionality to ECMAScript via the Intl API, rather than as a one-off for this API. Then it would be globally useful, not just for addresses, but for any kind of input.

aphillips commented 5 years ago

@stpeter, @marcoscaceres Thanks for the summary. I don't necessarily see that removing languageTag is a good thing: we recommend [0] otherwise and for good reason.

One thing about the language tag is that it should not be used to indicate region/country or jurisdiction. That should be a separate bit of data, such as an ISO-3166 code or such. The region subtag in a language tag can indicate defaults for market, legal, or other locale-affected API usages. But it is a separate thing and it is a best practice not to use it as a proxy. That is, the language of an address has nothing to do with where the address is in the world. LTLI says something about this, but the quote @stpeter cites needs more context and explanation.
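A sketch of that separation, assuming a runtime with `Intl.Locale`. The object shape and the `describeAddress` function are hypothetical; the point is only that the country travels as its own field and the region subtag of the language tag is not consulted to locate the address.

```javascript
// Hypothetical shape for illustration; not the PaymentAddress interface.
function describeAddress(address) {
  const locale = new Intl.Locale(address.languageTag);
  return {
    country: address.country,  // ISO 3166-1 alpha-2 code, carried as separate data
    language: locale.language, // primary language subtag
    script: locale.script,     // undefined unless the tag carries a script subtag
    tagRegion: locale.region,  // region subtag of the tag, NOT the address location
  };
}

// A Japanese-language address in Latin script, for a destination in the U.S.
const result = describeAddress({ country: "US", languageTag: "ja-Latn-JP" });
```

Here `result.country` and `result.tagRegion` differ, which is exactly the case where using the language tag as a proxy for the country would go wrong.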

When it comes to text analysis, there are a number of APIs for determining the script of content. The key thing to recall here is there is what we call the "common" script, consisting of characters shared between many different writing systems. Punctuation, for example. Understanding this reduces (but does not eliminate) cases where there are truly mixed script usages. Script is defined by Unicode and there are APIs that could be exposed in e.g. Intl, although I caution that script isn't necessarily always useful in the way that this spec's usage seems to suggest.

There is a need for more general I18N documentation for things such as field handling and definition, defining locale-neutral data structures, cultural awareness, etc. The LTLI document that you mention is actually one of the items that the I18N WG prioritized just below our current work and which I hope that we can get back to once Charmod/String-Meta are out of our systems. In the meantime, happy to help.

[0] https://w3c.github.io/string-meta/#

marcoscaceres commented 5 years ago

The challenge here is having a clear algorithm to identify the language of content - in this case, an address. Does [0] provide the algorithm? Apologies if it does and I missed it. Without such an algorithm, none of us can implement languageCode interoperably (that's the current situation, and why it’s now marked at risk).

At the risk of getting circular, if there is such an algorithm, whereby a string is given and out comes one or more language tags, then IMO it should be part of Intl, because it would be universally useful to the web platform.

stpeter commented 5 years ago

I see the point that @aphillips makes: in the Payment Request API, the end user is a producer (as defined in the string-meta spec [0]) of a string or set of strings, and this is our chance to attach metadata about the language and base direction of the string. If we don't do that when the string is created, some other consumer of the string will need to figure it out later on, and they won't have as much context as we do at string creation time...

stpeter commented 5 years ago

Thinking about this further, I have a question for @aphillips - the answer to which might clear up some of the confusion around the current languageCode attribute in the Payment Request API. Right now, languageCode defines "the language in which the address is provided", which is "used to determine the field separators and the order of fields when formatting the address for display". But the PaymentAddress is a composite of multiple strings (country, region, city, organization, etc.). Should each of those strings be flagged with a langtag and base direction? What if the organization name is in kanji or hiragana and the other address fields are in romaji/Latin? It seems that we might be talking about two different things here: (1) the langtag/direction of each string (or potentially a set of strings), and (2) the layout of the set of strings representing an address in the input form of the payment dialog that the browser displays (remember that the Payment Request API essentially defines a set of forms, because it moves the ecommerce checkout flow from web content to an in-browser dialog). The layout property might actually be locale-based, not language-based or script-based.
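To make the two notions concrete, here is a purely hypothetical data shape in which each string carries its own language tag and base direction, while a separate locale value drives layout. None of these property names exist in the Payment Request API; they are assumptions for the sake of the sketch.

```javascript
// Hypothetical structure for discussion; not the PaymentAddress interface.
const address = {
  // (2) Layout of the address as a whole: field order and separators.
  layoutLocale: "ja-JP",
  // (1) Per-string metadata: language tag and base direction of each value.
  fields: {
    organization: { value: "株式会社サンプル", lang: "ja", dir: "ltr" },
    city: { value: "Shibuya", lang: "ja-Latn", dir: "ltr" },
    country: { value: "JP" }, // a code, so no language metadata
  },
};

// Collect the distinct language tags used across the fields.
function fieldLanguages(addr) {
  const langs = new Set();
  for (const field of Object.values(addr.fields)) {
    if (field.lang) langs.add(field.lang);
  }
  return [...langs];
}
```

With this shape, an organization name in kanji and a romanized city can coexist in one address without forcing a single language value onto the whole record.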

marcoscaceres commented 5 years ago

Oh, can we please move this discussion to https://github.com/w3c/payment-request/issues/608 ? The issue we are currently in was for the pull request to add “at risk” to the spec, and it’s been merged and closed.

w3c / payment-request

editorial: mark languageCode at risk #764