marcoscaceres closed 5 years ago
Yep, we use the BuildComponents() function in our implementation.
// Returns the UI components for the CLDR |region_code|. Uses the strings from
// |localization|. The components can be in default or Latin order, depending on
// the BCP 47 |ui_language_tag|.
//
// Sets the |best_address_language_tag| to the BCP 47 language tag that should
// be saved with this address. This language will be used to get drop-downs to
// help users fill in their address, and to format the address that the user
// entered. The parameter should not be NULL.
//
// Returns an empty vector on error.
std::vector<AddressUiComponent> BuildComponents(
    const std::string& region_code,
    const Localization& localization,
    const std::string& ui_language_tag,
    std::string* best_address_language_tag);
This language tag is especially useful for countries that present their addresses in two different ways depending on the language tag. For example, the ja-JP language tag results in the country being displayed on top, whereas the ja-Latn language tag results in the country being displayed at the bottom of the address.
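To make the ja-JP versus ja-Latn difference concrete, here is a minimal sketch. The helper name `placeCountry` is hypothetical, and it assumes the only thing that changes is where the country line goes, which is all that is stated above; real formatting rules cover more than this.

```javascript
// Hypothetical helper, not part of libaddressinput or any spec.
// Assumption: for a country with two formats, a "Latn" script subtag means
// the country line goes last; otherwise the country line goes first.
function placeCountry(addressLines, countryName, languageTag) {
  const usesLatinFormat = languageTag.split("-").includes("Latn");
  return usesLatinFormat
    ? [...addressLines, countryName]
    : [countryName, ...addressLines];
}

// Usage: the same lines, laid out for the two tags.
console.log(placeCountry(["line A", "line B"], "Japan", "ja-JP"));
console.log(placeCountry(["line A", "line B"], "Japan", "ja-Latn"));
```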
@rsolomakhin thanks for that info. That's super helpful. I need to now look at the implications of this for the spec, as it demystifies some of the magic.
@rsolomakhin, could you provide me with an address that triggers a non-empty-string value for languageCode? I tried using fake Japanese addresses, but couldn't get it to return "ja-latn" or even "jp-jp".
It's already implemented on Android, but desktop will implement it soon, as well.
awesome. Can you copy/paste that JSON structure here? I can then use it in a real test over in WPT.
{
  "requestId": "df8d3adb-4db5-4dbc-903b-52cf2c8c21ff",
  "methodName": "basic-card",
  "details": {
    "cardholderName": "Jon Doe",
    "cardNumber": "4111111111111111",
    "expiryMonth": "01",
    "expiryYear": "2021",
    "cardSecurityCode": "123",
    "billingAddress": {
      "country": "JP",
      "region": "沖縄県",
      "city": "Saporro",
      "dependentLocality": "",
      "addressLine": [
        "123 Main At"
      ],
      "postalCode": "111111",
      "sortingCode": "",
      "languageCode": "ja-Latn",
      "organization": "",
      "recipient": "Jon Doe",
      "phone": "+8113103106000"
    }
  },
  "shippingAddress": {
    "country": "JP",
    "addressLine": [
      "123 Main At"
    ],
    "region": "沖縄県",
    "city": "Saporro",
    "dependentLocality": "",
    "postalCode": "111111",
    "sortingCode": "",
    "languageCode": "ja-Latn",
    "organization": "",
    "recipient": "Jon Doe",
    "phone": "+8113103106000"
  },
  "shippingOption": "freeShippingOption",
  "payerName": null,
  "payerEmail": null,
  "payerPhone": null
}
I lived in Japan for a long time and have a fair bit of experience with handling Japanese addresses in front end web development. I ran into the common headaches of displaying addresses nicely to users, so I know very well why this was proposed!
That said, from my experience I think the languageCode item is not needed in this spec. If merchants want to make the extra effort to display addresses in the correct order, they should be able to simply look at the country code of the address to know how the address components should be arranged. The language of the address shouldn't matter, as (in theory) the formatting of addresses for a particular region should be the same regardless of which language it is written in.
I personally am in favor of reducing feature bloat, and feel this item wouldn't really have much of a use and therefore should be left out.
The language of the address shouldn't matter, as (in theory) the formatting of addresses for a particular region should be the same regardless of which language it is written in.
At Google we prepend the 〒 character to the postal code only when formatting ja-JP addresses, which is also in the opposite order from ja-Latn addresses. The following locations all have different formatting rules when the Latn script code is used:
CN - China
HK - Hong Kong
JP - Japan
KR - South Korea
MO - Macau
TW - Taiwan
It's still not super clear algorithmically what needs to happen here to get interop :( We can be hand-wavy in the spec about this, but I don't know what the interop impact will be (if any).
At Google we prepend the 〒 character to the postal code only when formatting ja-JP addresses, which is also in the opposite order from ja-Latn addresses.
To clarify, can you post formatted versions of the above example address in the two different language codes? That should probably help determine what the algorithm needs to be.
I can certainly imagine that some merchants would prefer that the billing/shipping address be available as simple Unicode text with newlines, especially not-fully-automated small merchants that might be handwriting a letter or similar, as well as merchants with inadequate automation that can e.g. only handle US addresses.
Being able to write if (addr.languageCode != 'en-US') { postData.shippingAddrString = addr.asString } else { /* post decomposed form to server which only expects 5-digit postalCode */ }
would help with not requiring the payment-requesting server to know about every single country's addressing systems (the user agent is in a much better position to "know about every single country's addressing systems").
I tend to agree with the feedback that @jmacwhyte provided - this is a matter of locale, not language.
Chrome uses the libaddressinput logic to calculate the languageTag. Would you prefer to see pseudocode instead of the code?
We need something we can abstract in such a way that someone not using libaddressinput could still implement in an interoperable manner.
Is that possible?
Here's the abstracted algorithm:
1. Let address be the given address.
2. Let addressCountry be the 2-letter country code of the address.
3. Let countryLanguages be an unordered set of 2-letter language codes for languages known to be used in the addressCountry.
4. Let defaultCountryLanguage be the primary country language if known, or the first in alphabetical ordering of countryLanguages otherwise.
5. Let countryHasLatinScriptAddressFormat be a boolean that is true if addressCountry has address formatting rules specifically for languages that use Latin script that is different from the address format for countryLanguages.
6. If countryHasLatinScriptAddressFormat is true, then let defaultCountryLatinScriptLanguageTag be the BCP47 language tag constructed from defaultCountryLanguage as the "language subtag" and "Latn" as the "script subtag" in BCP47 terminology.
7. Let userAgentLocale be the locale of the user agent.
8. Let userAgentLanguage be the language part of the userAgentLocale.
9. Let userAgentScript be the script part of the userAgentLocale, if any.
10. If countryLanguages is empty, return userAgentLocale.
11. If userAgentScript is "Latn" and countryHasLatinScriptAddressFormat is true, return defaultCountryLatinScriptLanguageTag.
12. If userAgentLanguage is in the countryLanguages set, then return the BCP47 language tag constructed from userAgentLanguage as the "language subtag" and addressCountry as the "region subtag" in BCP47 terminology.
13. If countryHasLatinScriptAddressFormat is true and userAgentLanguage uses Latin script alphabet, return defaultCountryLatinScriptLanguageTag.
14. Return the BCP47 language tag constructed from defaultCountryLanguage as the BCP47 "language subtag" and addressCountry as the "region subtag" in BCP47 terminology.
(BCP47: https://tools.ietf.org/html/bcp47)
Examples:
country="US"
countryLanguages=["en"]
defaultCountryLanguage="en"
countryHasLatinScriptAddressFormat=false
userAgentLocale="en-US"
userAgentLanguage="en"
userAgentScript=undefined
return "en-US"

country="US"
countryLanguages=["en"]
defaultCountryLanguage="en"
countryHasLatinScriptAddressFormat=false
userAgentLocale="ja-JP"
userAgentLanguage="ja"
userAgentScript=undefined
return "en-US"

country="JP"
countryLanguages=["ja"]
defaultCountryLanguage="ja"
countryHasLatinScriptAddressFormat=true
userAgentLocale="ja-JP"
userAgentLanguage="ja"
userAgentScript=undefined
return "ja-JP"

country="JP"
countryLanguages=["ja"]
defaultCountryLanguage="ja"
countryHasLatinScriptAddressFormat=true
userAgentLocale="en-US"
userAgentLanguage="en"
userAgentScript=undefined
return "ja-Latn"

country="JP"
countryLanguages=["ja"]
defaultCountryLanguage="ja"
countryHasLatinScriptAddressFormat=true
userAgentLocale="ru-Latn"
userAgentLanguage="ru"
userAgentScript="Latn"
return "ja-Latn"
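For anyone who wants to poke at the abstracted algorithm, here is a rough JavaScript sketch of it. This is not the libaddressinput code. `COUNTRY_DATA`, `LATIN_SCRIPT_LANGUAGES`, `parseLocale` and `deriveLanguageTag` are hypothetical names, the two-country table is made up to match the examples, and the locale parsing is deliberately simplified.

```javascript
// Hypothetical per-country data; real values would come from CLDR or similar.
const COUNTRY_DATA = {
  US: { languages: ["en"], primaryLanguage: "en", hasLatinFormat: false },
  JP: { languages: ["ja"], primaryLanguage: "ja", hasLatinFormat: true },
};

// Assumption: a hand-picked stand-in for "language uses Latin script alphabet".
const LATIN_SCRIPT_LANGUAGES = new Set(["en", "fr", "de", "es", "it"]);

// Simplified parse: the first subtag is the language, a 4-letter subtag is the script.
function parseLocale(tag) {
  const [language, ...rest] = tag.split("-");
  const script = rest.find((subtag) => /^[A-Za-z]{4}$/.test(subtag));
  return { language: language.toLowerCase(), script };
}

function deriveLanguageTag(addressCountry, userAgentLocale) {
  const data = COUNTRY_DATA[addressCountry];
  const countryLanguages = new Set(data ? data.languages : []);
  // No known languages for the country: fall back to the user agent locale.
  if (countryLanguages.size === 0) return userAgentLocale;

  // Primary language if known, else the first language alphabetically.
  const defaultCountryLanguage =
    data.primaryLanguage || [...countryLanguages].sort()[0];
  // Latin-script tag, only defined when the country has a Latin address format.
  const latinTag = data.hasLatinFormat
    ? `${defaultCountryLanguage}-Latn`
    : undefined;
  const { language, script } = parseLocale(userAgentLocale);

  // Explicit Latn script in the user agent locale wins.
  if (script === "Latn" && data.hasLatinFormat) return latinTag;
  // User agent language is one of the country's languages.
  if (countryLanguages.has(language)) return `${language}-${addressCountry}`;
  // User agent language is written in Latin script.
  if (data.hasLatinFormat && LATIN_SCRIPT_LANGUAGES.has(language)) return latinTag;
  // Otherwise use the country's default language and the country as region.
  return `${defaultCountryLanguage}-${addressCountry}`;
}

console.log(deriveLanguageTag("JP", "en-US"));
```

With the made-up table above, the calls `deriveLanguageTag("US", "ja-JP")`, `deriveLanguageTag("JP", "ja-JP")` and `deriveLanguageTag("JP", "ru-Latn")` line up with the examples in the thread.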
YAY! thanks @rsolomakhin! I'll draft that up and add those tests.
LGTM (thanks, Rouslan!) but I'm curious what @aphillips thinks as co-author of BCP 47 and i18n WG chair.
This seems fairly bizarre (and probably over-complicated) to me. I'm going to make some general comments here, but I need to go look at what you're doing with PaymentAddress.languageCode before proposing alternatives. This smells extremely suspect to me at the outset, though.
General comment: the term "language code" is not specific enough. While it is the name of the final field, the word code is used in a few other places. You should say "language tag" or "language subtag" (or the specific subtag, such as "script subtag" or "region subtag" where it's possible to be specific). The term "locale" is also used extremely loosely here. The "locale" is identified by a language tag: there is no difference between a language tag and a locale identifier in this context.
For step 3, language codes should be language tags (and allow for full tags, since script/region/variant/extlang play a role in describing the language used). You appear to really mean the primary language subtags, which can get you into trouble if scripts or extlangs are in play.
Note well that primary language subtags are not always 2-letter subtags! There are many 3-letter subtags as well. Please don't exclude them.
What does userAgentLocale mean? The runtime environment ("RTE") locale? Or the locale (language) of the page where a form is being filled in? Or the language of the keyboard (usually a better hint than the RTE locale)? And why would this be interesting?
userAgentScript should allow for CLDR/ICU "addLikelySubtags" generation of the script. Most tags don't carry the script around.
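As a rough illustration of the likely-subtags point: in JavaScript, `Intl.Locale.prototype.maximize()` applies the CLDR "Add Likely Subtags" algorithm, so a script can be obtained even when the tag does not carry one. The helper name `likelyScript` is hypothetical, and this assumes a runtime with ICU data available (for example a recent Node.js).

```javascript
// Hypothetical helper: returns the explicit script subtag if present,
// otherwise the likely script that maximize() fills in.
function likelyScript(languageTag) {
  return new Intl.Locale(languageTag).maximize().script;
}

console.log(likelyScript("en-US"));   // script filled in from likely subtags
console.log(likelyScript("ru-Latn")); // explicit script subtag is kept
```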
Is there a reason you don't actually look at the content of the address (particularly for the script)? Applying Japanese order to a Latin script Japanese address usually looks odd and the Latin-script nature of the address has nothing to do with RTE. A really common thing for Amazon is export from JP to CN. The userAgentLocale is zh-Hans-CN and the country is CN but our vendors in JP only accept Latn for foreign addresses.
Thank you for the feedback, @aphillips. userAgentLocale is the RTE locale. This determines the language of the user agent UI. We are making an assumption that users with "en-US" user agent locale will type addresses using Latin script alphabet most of the time, which is important because some addresses have different formatting rules when a Latin script alphabet is being used. Is there a way to determine the script of the content of the address? If so, I was unaware.
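On inspecting the content of the address for its script, one possible heuristic (a sketch only, not what any implementation in this thread does) is to use Unicode script property escapes in a regular expression. The function name `addressLooksLatin` is hypothetical, and the rule "every letter is Latin" is an assumption chosen for simplicity.

```javascript
// Hypothetical heuristic: true only if the address contains letters and
// every one of them is in the Latin script. Digits and punctuation are ignored.
function addressLooksLatin(addressFields) {
  const letters = addressFields.join("").match(/\p{L}/gu) || [];
  if (letters.length === 0) return false;
  return letters.every((ch) => /\p{Script=Latin}/u.test(ch));
}

console.log(addressLooksLatin(["123 Main At", "Saporro"]));
console.log(addressLooksLatin(["123 Main At", "沖縄県"]));
```

Note that the sample address posted earlier in this thread mixes a Latin-script city and street with a region written in kanji, so any real heuristic would need a policy for mixed-script input.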
We discussed this on the i18n WG call today:
https://www.w3.org/2018/08/23-i18n-minutes.html
I have an action item to look into this more deeply and provide recommendations.
NOTE: See https://github.com/w3c/payment-request/pull/764 for related discussion (mistakenly posted there instead of here).
The Chairs have recorded a decision to remove languageCode following a call for consensus: https://lists.w3.org/Archives/Public/public-payments-wg/2018Sep/0021.html
Implementers are currently figuring out how to derive the languageCode's value in an interoperable manner. During the CR phase of standardization, implementers will report back on implementation experience, possibly using Unicode CLDR to derive the languageCode. Please see comments of this GitHub issue for followup, or join the discussion.