Describe how to implement PaymentAddress.languageCode

marcoscaceres commented 7 years ago

Implementers are currently figuring out how to derive the languageCode's value in an interoperable manner.

During the CR phase of standardization, implementers will report back on their implementation experience, possibly using Unicode CLDR to derive the languageCode. Please see the comments on this GitHub issue for follow-up, or join the discussion.

rsolomakhin commented 7 years ago

Yep, we use the BuildComponents() function in our implementation.

// Returns the UI components for the CLDR |region_code|. Uses the strings from
// |localization|. The components can be in default or Latin order, depending on
// the BCP 47 |ui_language_tag|.
//
// Sets the |best_address_language_tag| to the BCP 47 language tag that should
// be saved with this address. This language will be used to get drop-downs to
// help users fill in their address, and to format the address that the user
// entered. The parameter should not be NULL.
//
// Returns an empty vector on error.
std::vector<AddressUiComponent> BuildComponents(
    const std::string& region_code,
    const Localization& localization,
    const std::string& ui_language_tag,
    std::string* best_address_language_tag);

This language tag is especially useful for countries that present their addresses in two different ways depending on the language tag. For example, the ja-JP language tag results in the country being displayed at the top, whereas the ja-Latn language tag results in the country being displayed at the bottom of the address.
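As a rough illustration of the ordering difference described above, here is a hypothetical JavaScript sketch. It is not libaddressinput's real format data: it assumes an illustrative Latin-script field order and produces the native-script layout by simply reversing it (which puts the country first). It reads the script subtag with the standard Intl.Locale API. The function name and the sample address values are made up.

```javascript
// Hypothetical sketch, NOT libaddressinput's real format rules.
// ASSUMPTION: the Latin-script field order below is illustrative; the
// native-script layout is modeled as its reverse, so the country comes first.
function formatAddressLines(address, languageTag) {
  const latinOrder = [
    address.recipient,
    ...address.addressLine,
    address.city,
    address.region,
    address.postalCode,
    address.country,
  ].filter((line) => line); // skip empty fields
  // Intl.Locale#script is undefined when the tag has no script subtag.
  const isLatinScript = new Intl.Locale(languageTag).script === "Latn";
  return isLatinScript ? latinOrder : latinOrder.reverse();
}

// Made-up sample values.
const address = {
  recipient: "Jon Doe",
  addressLine: ["123 Main St"],
  city: "Sapporo",
  region: "Hokkaido",
  postalCode: "111111",
  country: "JP",
};

console.log(formatAddressLines(address, "ja-Latn").join(" / "));
// Jon Doe / 123 Main St / Sapporo / Hokkaido / 111111 / JP
console.log(formatAddressLines(address, "ja-JP").join(" / "));
// JP / 111111 / Hokkaido / Sapporo / 123 Main St / Jon Doe
```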

marcoscaceres commented 7 years ago

@rsolomakhin thanks for that info. That's super helpful. I now need to look at the implications of this for the spec, as it demystifies some of the magic.

marcoscaceres commented 6 years ago

@rsolomakhin, could you provide me with an address that triggers a non-empty-string value for languageCode? I tried using fake Japanese addresses, but couldn't get it to return "ja-Latn" or even "ja-JP".

rsolomakhin commented 6 years ago

It's already implemented on Android, and desktop will implement it soon as well.

[Screenshot: screenshot_20170918-094539]

marcoscaceres commented 6 years ago

Awesome. Can you copy and paste that JSON structure here? I can then use it in a real test over in WPT.

rsolomakhin commented 6 years ago

{
  "requestId": "df8d3adb-4db5-4dbc-903b-52cf2c8c21ff",
  "methodName": "basic-card",
  "details": {
    "cardholderName": "Jon Doe",
    "cardNumber": "4111111111111111",
    "expiryMonth": "01",
    "expiryYear": "2021",
    "cardSecurityCode": "123",
    "billingAddress": {
      "country": "JP",
      "region": "沖縄県",
      "city": "Saporro",
      "dependentLocality": "",
      "addressLine": [
        "123 Main At"
      ],
      "postalCode": "111111",
      "sortingCode": "",
      "languageCode": "ja-Latn",
      "organization": "",
      "recipient": "Jon Doe",
      "phone": "+8113103106000"
    }
  },
  "shippingAddress": {
    "country": "JP",
    "addressLine": [
      "123 Main At"
    ],
    "region": "沖縄県",
    "city": "Saporro",
    "dependentLocality": "",
    "postalCode": "111111",
    "sortingCode": "",
    "languageCode": "ja-Latn",
    "organization": "",
    "recipient": "Jon Doe",
    "phone": "+8113103106000"
  },
  "shippingOption": "freeShippingOption",
  "payerName": null,
  "payerEmail": null,
  "payerPhone": null
}

jmacwhyte commented 6 years ago

I lived in Japan for a long time and have a fair bit of experience with handling Japanese addresses in front-end web development. I ran into the common headaches of displaying addresses nicely to users, so I know very well why this was proposed!

That said, from my experience I think the languageCode item is not needed in this spec. If merchants want to make the extra effort to display addresses in the correct order, they should be able to simply look at the country code of the address to know how the address components should be arranged. The language of the address shouldn't matter, as (in theory) the formatting of addresses for a particular region should be the same regardless of which language it is written in.

I personally favor reducing feature bloat, and I feel this item wouldn't see much use and should therefore be left out.

rsolomakhin commented 6 years ago

> The language of the address shouldn't matter, as (in theory) the formatting of addresses for a particular region should be the same regardless of which language it is written in.

At Google, we prepend the 〒 character to the postal code only when formatting ja-JP addresses, which are also laid out in the opposite order from ja-Latn addresses. The following regions all have different formatting rules when the Latn script subtag is used:

CN - China
HK - Hong Kong
JP - Japan
KR - South Korea
MO - Macau
TW - Taiwan
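The region list above and the 〒 rule for ja-JP postal codes could be captured in a small lookup like the following. This is a hedged sketch: the function names are hypothetical, and treating any Japanese tag without a Latn script subtag as "ja-JP formatting" is a simplifying assumption.

```javascript
// Regions named above as having different formatting rules for Latn.
const LATIN_FORMAT_REGIONS = new Set(["CN", "HK", "JP", "KR", "MO", "TW"]);

function hasLatinScriptAddressFormat(regionCode) {
  return LATIN_FORMAT_REGIONS.has(regionCode.toUpperCase());
}

// ASSUMPTION: "ja-JP formatting" is approximated as "Japanese language
// without an explicit Latn script subtag".
function formatPostalCode(postalCode, languageTag) {
  const locale = new Intl.Locale(languageTag);
  const nativeJapanese = locale.language === "ja" && locale.script !== "Latn";
  return nativeJapanese ? `〒${postalCode}` : postalCode;
}

console.log(hasLatinScriptAddressFormat("jp")); // true
console.log(formatPostalCode("111111", "ja-JP")); // 〒111111
console.log(formatPostalCode("111111", "ja-Latn")); // 111111
```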

marcoscaceres commented 6 years ago

It's still not super clear algorithmically what needs to happen here to get interop :( We can be hand-wavy in the spec about this, but I don't know what the interop impact will be (if any).

riking commented 6 years ago

> At Google we prepend the 〒 character to the postal code only when formatting ja-JP addresses, which is also in the opposite order from ja-Latn addresses.

To clarify, can you post formatted versions of the above example address in the two different language codes? That should probably help determine what the algorithm needs to be.

I can certainly imagine that some merchants would prefer that the billing/shipping address be available as simple Unicode text with newlines, especially not-fully-automated small merchants that might be handwriting a letter or similar, as well as merchants with inadequate automation that can, e.g., only handle US addresses.

Being able to write if (addr.languageCode != 'en-US') { postData.shippingAddrString = addr.asString } else { /* post decomposed form to server which only expects 5-digit postalCode */ } would help avoid requiring the payment-requesting server to know about every single country's addressing systems (the user agent is in a much better position to "know about every single country's addressing systems").
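A hypothetical merchant-side version of the branching proposed above might look like this. Note that asString is the commenter's proposal, not an existing PaymentAddress attribute, so flattenAddress below is an assumed, naive stand-in that joins the non-empty fields with newlines; its field order is illustrative only.

```javascript
// Hypothetical merchant-side sketch of the proposed branching.
// ASSUMPTION: PaymentAddress has no asString; flattenAddress is a naive
// stand-in that joins the non-empty fields with newlines.
function flattenAddress(addr) {
  return [
    addr.recipient,
    addr.organization,
    ...addr.addressLine,
    addr.dependentLocality,
    addr.city,
    addr.region,
    addr.postalCode,
    addr.country,
  ]
    .filter((part) => part)
    .join("\n");
}

function buildShippingPayload(addr) {
  if (addr.languageCode !== "en-US") {
    // Free-form text for a server that cannot parse non-US addresses.
    return { shippingAddrString: flattenAddress(addr) };
  }
  // Decomposed form for a server that only expects US fields.
  const { recipient, addressLine, city, region, postalCode } = addr;
  return { recipient, addressLine, city, region, postalCode };
}
```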

stpeter commented 6 years ago

I tend to agree with the feedback that @jmacwhyte provided: this is a matter of locale, not language.

rsolomakhin commented 6 years ago

Chrome uses the libaddressinput logic to calculate the languageTag. Would you prefer to see pseudocode instead of the code?

marcoscaceres commented 6 years ago

We need something we can abstract in such a way that someone not using libaddressinput could still implement in an interoperable manner.

Is that possible?

rsolomakhin commented 6 years ago

Here's the abstracted algorithm:

1. Let address be the given address.
2. Let addressCountry be the 2-letter country code of the address.
3. Let countryLanguages be an unordered set of 2-letter language codes for languages known to be used in addressCountry.
4. Let defaultCountryLanguage be the primary country language if known, or otherwise the first language of countryLanguages in alphabetical order.
5. Let countryHasLatinScriptAddressFormat be a boolean that is true if addressCountry has address formatting rules specifically for languages written in the Latin script that differ from the address format for countryLanguages.
6. If countryHasLatinScriptAddressFormat is true, let defaultCountryLatinScriptLanguageTag be the BCP 47 language tag constructed from defaultCountryLanguage as the "language subtag" and "Latn" as the "script subtag", in BCP 47 terminology.
7. Let userAgentLocale be the locale of the user agent.
8. Let userAgentLanguage be the language part of userAgentLocale.
9. Let userAgentScript be the script part of userAgentLocale, if any.
10. If countryLanguages is empty, return userAgentLocale.
11. Else, if userAgentScript is "Latn" and countryHasLatinScriptAddressFormat is true, return defaultCountryLatinScriptLanguageTag.
12. Else, if userAgentLanguage is in countryLanguages, return the BCP 47 language tag constructed from userAgentLanguage as the "language subtag" and addressCountry as the "region subtag".
13. Else, if countryHasLatinScriptAddressFormat is true and userAgentLanguage is written in the Latin script, return defaultCountryLatinScriptLanguageTag.
14. Else, return the BCP 47 language tag constructed from defaultCountryLanguage as the "language subtag" and addressCountry as the "region subtag".

(BCP47: https://tools.ietf.org/html/bcp47)

Examples:

country="US"
countryLanguages=["en"]
defaultCountryLanguage="en"
countryHasLatinScriptAddressFormat=false
userAgentLocale="en-US"
userAgentLanguage="en"
userAgentScript=undefined
return "en-US"

country="US"
countryLanguages=["en"]
defaultCountryLanguage="en"
countryHasLatinScriptAddressFormat=false
userAgentLocale="ja-JP"
userAgentLanguage="ja"
userAgentScript=undefined
return "en-US"

country="JP"
countryLanguages=["ja"]
defaultCountryLanguage="ja"
countryHasLatinScriptAddressFormat=true
userAgentLocale="ja-JP"
userAgentLanguage="ja"
userAgentScript=undefined
return "ja-JP"

country="JP"
countryLanguages=["ja"]
defaultCountryLanguage="ja"
countryHasLatinScriptAddressFormat=true
userAgentLocale="en-US"
userAgentLanguage="en"
userAgentScript=undefined
return "ja-Latn"

country="JP"
countryLanguages=["ja"]
defaultCountryLanguage="ja"
countryHasLatinScriptAddressFormat=true
userAgentLocale="ru-Latn"
userAgentLanguage="ru"
userAgentScript="Latn"
return "ja-Latn"
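The algorithm and examples above can be sketched in JavaScript as follows. This is a minimal, non-authoritative sketch: COUNTRY_DATA, LATIN_FORMAT_COUNTRIES and LATIN_SCRIPT_LANGUAGES are hypothetical stand-ins for the CLDR/libaddressinput data a real implementation would use (only US and JP are filled in), and the standard Intl.Locale API splits the user agent locale into its language and script subtags.

```javascript
// Minimal sketch of the languageCode derivation described above.
// ASSUMPTION: these tables are tiny hypothetical stand-ins for CLDR data.
const COUNTRY_DATA = {
  US: { languages: ["en"], primary: "en" },
  JP: { languages: ["ja"], primary: "ja" },
};
// Regions listed earlier in this thread as having Latin-script formats.
const LATIN_FORMAT_COUNTRIES = new Set(["CN", "HK", "JP", "KR", "MO", "TW"]);
// ASSUMPTION: a hypothetical, incomplete list of Latin-script languages.
const LATIN_SCRIPT_LANGUAGES = new Set(["en", "de", "es", "fr", "it", "pt"]);

function deriveLanguageCode(addressCountry, userAgentLocale) {
  const data = COUNTRY_DATA[addressCountry];
  const countryLanguages = data ? data.languages : [];
  // No known country languages: fall back to the user agent locale.
  if (countryLanguages.length === 0) return userAgentLocale;

  // Primary language if known, else the first one alphabetically.
  const defaultCountryLanguage = data.primary ?? [...countryLanguages].sort()[0];
  const countryHasLatinScriptAddressFormat = LATIN_FORMAT_COUNTRIES.has(addressCountry);
  const latinTag = `${defaultCountryLanguage}-Latn`;
  // Intl.Locale exposes the subtags; script is undefined when absent.
  const { language: userAgentLanguage, script: userAgentScript } =
    new Intl.Locale(userAgentLocale);

  if (userAgentScript === "Latn" && countryHasLatinScriptAddressFormat) {
    return latinTag;
  }
  if (countryLanguages.includes(userAgentLanguage)) {
    return `${userAgentLanguage}-${addressCountry}`;
  }
  if (countryHasLatinScriptAddressFormat && LATIN_SCRIPT_LANGUAGES.has(userAgentLanguage)) {
    return latinTag;
  }
  return `${defaultCountryLanguage}-${addressCountry}`;
}

console.log(deriveLanguageCode("US", "en-US")); // en-US
console.log(deriveLanguageCode("US", "ja-JP")); // en-US
console.log(deriveLanguageCode("JP", "ja-JP")); // ja-JP
console.log(deriveLanguageCode("JP", "en-US")); // ja-Latn
console.log(deriveLanguageCode("JP", "ru-Latn")); // ja-Latn
```

Each console.log line reproduces one of the five examples above.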

marcoscaceres commented 6 years ago

YAY! thanks @rsolomakhin! I'll draft that up and add those tests.

stpeter commented 6 years ago

LGTM (thanks, Rouslan!), but I'm curious what @aphillips thinks, as co-author of BCP 47 and chair of the i18n WG.

aphillips commented 6 years ago

This seems fairly bizarre (and probably over-complicated) to me. I'm going to make some general comments here, but I need to go look at what you're doing with PaymentAddress.languageCode before proposing alternatives. This smells extremely suspect to me at the outset, though.

General comment: the term "language code" is not specific enough. While it is the name of the final field, the word "code" is used in a few other places. You should say "language tag" or "language subtag" (or the specific subtag, such as "script subtag" or "region subtag" where it's possible to be specific). The term "locale" is also used extremely loosely here. The "locale" is identified by a language tag: there is no difference between a language tag and a locale identifier in this context.

For step 3, language codes should be language tags (and allow for full tags, since script/region/variant/extlang play a role in describing the language used). You appear to really mean the primary language subtags, which can get you into trouble if scripts or extlangs are in play.

Note well that primary language subtags are not always 2-letter subtags! There are many 3-letter subtags as well. Please don't exclude them.
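To make the terminology concrete: the standard ECMAScript Intl.Locale API parses a full language tag into its individual subtags, canonicalizes their case, and accepts 3-letter primary language subtags. A minimal sketch (the subtags helper name is hypothetical):

```javascript
// Read the individual subtags of a BCP 47 language tag with Intl.Locale.
// Missing subtags come back as undefined.
function subtags(tag) {
  const locale = new Intl.Locale(tag);
  return {
    tag: locale.toString(), // case-canonicalized form of the tag
    language: locale.language,
    script: locale.script,
    region: locale.region,
  };
}

console.log(subtags("ja-latn").tag); // ja-Latn
console.log(subtags("yue-Hant-HK").language); // yue (a 3-letter primary subtag)
```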

What does userAgentLocale mean? The runtime environment ("RTE") locale? Or the locale (language) of the page where a form is being filled in? Or the language of the keyboard (usually a better hint than the RTE locale)? And why would this be interesting?

userAgentScript should allow for CLDR/ICU "addLikelySubtags" generation of the script. Most tags don't carry the script around.
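For reference, ECMAScript's Intl.Locale.prototype.maximize() applies CLDR's "Add Likely Subtags" algorithm, so filling in the script of a tag that doesn't carry one could be sketched as follows. The helper names are hypothetical; usesLatinScript is one possible way to implement the "userAgentLanguage is written in the Latin script" check in the algorithm above.

```javascript
// Fill in a missing script subtag using CLDR likely-subtags data.
function likelyScript(tag) {
  return new Intl.Locale(tag).maximize().script;
}

function usesLatinScript(tag) {
  return likelyScript(tag) === "Latn";
}

console.log(likelyScript("en-US")); // Latn
console.log(likelyScript("ja-JP")); // Jpan
console.log(likelyScript("ru")); // Cyrl
console.log(usesLatinScript("ru-Latn")); // true (an explicit script is kept)
```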

Is there a reason you don't actually look at the content of the address (particularly for the script)? Applying Japanese order to a Latin-script Japanese address usually looks odd, and the Latin-script nature of the address has nothing to do with the RTE. A really common case for Amazon is export from JP to CN: the userAgentLocale is zh-Hans-CN and the country is CN, but our vendors in JP only accept Latn for foreign addresses.

rsolomakhin commented 6 years ago

Thank you for the feedback, @aphillips. userAgentLocale is the RTE locale. This determines the language of the user agent UI. We are assuming that users with an "en-US" user agent locale will type addresses in the Latin script most of the time, which is important because some addresses have different formatting rules when the Latin script is used. Is there a way to determine the script of the content of the address? If so, I was unaware.
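One possible way to look at the content itself is to classify the letters in the address fields by Unicode script using regular-expression property escapes (ES2018). This is a hedged sketch: the dominantScript name, the candidate script list and the "script with the most letters wins" rule are illustrative assumptions, not anything from the spec.

```javascript
// ASSUMPTION: an illustrative list of candidate scripts.
const SCRIPTS = ["Latin", "Han", "Hiragana", "Katakana", "Hangul", "Cyrillic"];

// Return the script with the most letters across the given fields,
// or null if none of the listed scripts occurs.
function dominantScript(fields) {
  const text = fields.join(" ");
  let best = null;
  let bestCount = 0;
  for (const script of SCRIPTS) {
    const matches = text.match(new RegExp(`\\p{Script=${script}}`, "gu"));
    const count = matches ? matches.length : 0;
    if (count > bestCount) {
      best = script;
      bestCount = count;
    }
  }
  return best;
}

console.log(dominantScript(["沖縄県"])); // Han
console.log(dominantScript(["沖縄県", "Saporro", "123 Main At"])); // Latin
```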

stpeter commented 6 years ago

We discussed this on the i18n WG call today:

https://www.w3.org/2018/08/23-i18n-minutes.html

I have an action item to look into this more deeply and provide recommendations.

stpeter commented 6 years ago

NOTE: See https://github.com/w3c/payment-request/pull/764 for related discussion (mistakenly posted there instead of here).

ianbjacobs commented 5 years ago

The Chairs have recorded a decision to remove languageCode following a call for consensus: https://lists.w3.org/Archives/Public/public-payments-wg/2018Sep/0021.html

w3c / payment-request

Describe how to implement PaymentAddress.languageCode #608