Add IDL for `LanguageMap`

aphillips commented 3 weeks ago

Called out by Webauthn in https://github.com/w3c/webauthn/issues/2151

emlun commented 3 weeks ago

Commenting on: https://github.com/aphillips/string-meta/commit/9624fbe632a6293c272299eb5f4d8a9f49cb4b43

I'm quite confused, because it seems like the language map examples are not consistent with each other or the LanguageMap definition, nor the LanguageMap definition with the prose descriptions:

§2.1.4 Language Maps lists Example 4:
```
"field-name-goes-here": {
  "en":    {"value": "This is English"},
  "en-GB": {"value": "This is UK English", "dir": "ltr"},
  "fr":    {"value": "C'est français", "lang": "fr-CA", "dir": "ltr"},
  "ar":    {"value": "هذه عربية", "dir": "rtl"}
}
```
Here, it looks like a language map is a simple map with language tags as keys and Localizable-like (but where lang and dir are optional) values. In particular, this example is valid JSON, so presumably this is what a JSON representation of a language map should look like.

§6. Localization Considerations defines language indexing and lists Example 14:

> One approach a specification might provide for returning multiple languages of a given field is called language indexing. In language indexing, a given field's value is an array of key-value pairs. [...]

Example 14
```
"title": [ "en": { "value": "Learning Web Design", "lang": "en" },
         "ar": { "value": "التعلم على شبكة الإنترنت التصميم", "lang": "ar",  "dir": "rtl"}, 
         "ja": { "value": "Webデザインを学ぶ", "lang": "ja" },
         "zh-Hans": { "value": "学习网页设计", "lang": "zh-Hans", "dir": "ltr"} ],
```
This example is not valid JSON, so one would assume this is rather an abstract example of a sequence of key-value pairs. This is arguably compatible with Example 4 on that abstract level, but it's unclear how any particular serialization of this should look. For JSON the example suggests an array, but not how pairs are represented (Pair arrays: `["en", {"value": ...}]`? Two-attribute objects: `{"key": "en", "value": { "value": ... }}`?).
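
To make the two candidate serializations concrete, here is a minimal TypeScript sketch. The `Entry` shape is inferred from Examples 4 and 14, and every type and function name below is hypothetical; nothing here comes from the specification itself.

```typescript
// Entry shape inferred from Examples 4 and 14; an assumption, not spec text.
interface Entry {
  value: string;
  lang?: string;
  dir?: "ltr" | "rtl";
}

// Candidate A: each pair is a two-element array, ["en", {...}].
type PairArrays = Array<[string, Entry]>;

// Candidate B: each pair is an object, {"key": "en", "value": {...}}.
type KeyValueObjects = Array<{ key: string; value: Entry }>;

const asPairArrays: PairArrays = [
  ["en", { value: "Learning Web Design", lang: "en" }],
  ["ja", { value: "Webデザインを学ぶ", lang: "ja" }],
];

const asKeyValueObjects: KeyValueObjects = [
  { key: "en", value: { value: "Learning Web Design", lang: "en" } },
  { key: "ja", value: { value: "Webデザインを学ぶ", lang: "ja" } },
];

// Either form reduces to the plain map shape of Example 4.
function fromPairArrays(pairs: PairArrays): Record<string, Entry> {
  const map: Record<string, Entry> = {};
  pairs.forEach((pair) => {
    map[pair[0]] = pair[1];
  });
  return map;
}

function fromKeyValueObjects(items: KeyValueObjects): Record<string, Entry> {
  const map: Record<string, Entry> = {};
  items.forEach((item) => {
    map[item.key] = item.value;
  });
  return map;
}
```

Both candidates carry the same information as the map form, which is part of why the extra array layer is hard to motivate.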

Examples 15 and 16 are (almost) valid JSON, but clearly incompatible with Example 4:

Example 15
```
"title": [ {
 "de": {"value": "HTML und CSS verstehen", "language": "de-DE" },
 ...
],
```
Still, it's ambiguous whether each object in the array should have exactly one key or may contain more than one key. Either way I don't understand what would be the benefit of wrapping these objects in an array rather than merging the objects, assuming each key is unique among all objects in the array (and if it's not, what would multiple occurrences of a language tag key mean? How should an application use them?). Is the definition order significant in some way?
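
The merging question can be sketched in TypeScript as follows. This is only an illustration: `mergeLanguageObjects` is a hypothetical helper, the "first occurrence wins" policy is an assumption (the text does not say what a repeated key would mean), and the second `de` item and the `en` item are made up for the demonstration.

```typescript
// Merge an array of language-keyed objects (Example 15 style) into one map.
// Assumption: the first occurrence of a tag wins; repeats are only reported.
function mergeLanguageObjects<T>(
  items: Array<Record<string, T>>
): { merged: Record<string, T>; duplicates: string[] } {
  const merged: Record<string, T> = {};
  const duplicates: string[] = [];
  items.forEach((item) => {
    Object.keys(item).forEach((tag) => {
      if (Object.prototype.hasOwnProperty.call(merged, tag)) {
        duplicates.push(tag); // same language tag seen in an earlier object
      } else {
        merged[tag] = item[tag] as T;
      }
    });
  });
  return { merged, duplicates };
}

// The first item is taken from Example 15; the others are hypothetical.
const titleArray = [
  { de: { value: "HTML und CSS verstehen", language: "de-DE" } },
  { en: { value: "Learning Web Design", language: "en" } },
  { de: { value: "Hypothetical second German title", language: "de-AT" } },
];

const mergeResult = mergeLanguageObjects(titleArray);
```

If keys are unique across the array, `mergeResult.duplicates` is empty and the merged object is exactly the map form of Example 4, so the array wrapper adds nothing except a definition order.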

Finally, A.2 LanguageMap dictionary (not yet published) defines LanguageMap as:
```
dictionary LanguageMap {
      DOMString field;
      sequence<LanguageRecord> languageRecord;
};
```
This is unambiguous, but it doesn't agree with the structure in Example 4, and adds an additional object layer around the sequence described in Example 14.

I also don't understand what is meant by the field member:

> `field` member: Identifier for the field containing the Language Map

Does this mean that `field` should be set to `"languageRecord"`? Or that the `languageRecord` member can be renamed, and `field` identifies its new name? Or is it the name of the field being localized, i.e., `{ "some-localizable-string": { "field": "some-localizable-string", "languageRecord": [...] } }`? I don't understand the purpose of any of those options, so is it something else entirely?

Could you help me understand how these definitions and examples are meant to relate?

As I read the prose descriptions, Example 4 matches what I would expect. I think the map syntax best expresses the intent of a collection of key-value pairs, and is the easiest to work with as a developer. It doesn't seem useful to use an explicit sequence structure for implementation efficiency, if that is the concern: maps and sequences most likely take about the same time to parse or search anyway. In JSON, CBOR and XML a plain linear search is needed since the items don't describe their serialization length, while in ASN.1 DER the parser can "skip ahead" in a map just as easily as in a sequence. So without knowledge of any other concerns that went into this design, my expectation for a LanguageMap IDL definition would simply be a record type with language tags as keys and LanguageEntry values:

typedef record<DOMString, LanguageEntry> LanguageMap;

// Or alternatively:
typedef DOMString LanguageTag;
typedef record<LanguageTag, LanguageEntry> LanguageMap;

Either of these would neatly match Example 4 and be easy to work with as a developer.
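
As a rough illustration of what "easy to work with" could mean, here is a TypeScript sketch of the record form holding the Example 4 data. The `LanguageEntry` members are assumed from Example 4, and `findEntry` is a hypothetical helper that truncates the requested tag at hyphens, loosely modeled on BCP 47 lookup; it compares tags case-sensitively and skips the finer rules, so it is a simplification rather than a conforming matcher.

```typescript
// Assumed members, based on Example 4 (lang and dir optional).
interface LanguageEntry {
  value: string;
  lang?: string;
  dir?: "ltr" | "rtl";
}

type LanguageTag = string;
type LanguageMap = Record<LanguageTag, LanguageEntry>;

// The data from Example 4, typed as a LanguageMap.
const fieldNameGoesHere: LanguageMap = {
  "en": { value: "This is English" },
  "en-GB": { value: "This is UK English", dir: "ltr" },
  "fr": { value: "C'est français", lang: "fr-CA", dir: "ltr" },
  "ar": { value: "هذه عربية", dir: "rtl" },
};

// Try the exact tag first, then drop trailing subtags one at a time:
// "en-US" is tried as "en-US", then as "en".
function findEntry(map: LanguageMap, requested: LanguageTag): LanguageEntry | undefined {
  let tag = requested;
  while (tag.length > 0) {
    if (Object.prototype.hasOwnProperty.call(map, tag)) {
      return map[tag];
    }
    const cut = tag.lastIndexOf("-");
    if (cut < 0) {
      break; // no more subtags to drop
    }
    tag = tag.substring(0, cut);
  }
  return undefined;
}
```

With this shape, a direct hit is a single property access, and a fallback needs no scan over a sequence of pairs.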

aphillips commented 3 weeks ago

@emlun Thanks for the comments. The IDL for LanguageMap is incorrect. It should be a record as noted.

Example 14 (and nearby friends) is definitely broken, wrt being valid JSON. I'll fix that also while making the necessary changes.

emlun commented 3 weeks ago

Thanks @aphillips! The current design looks good to me.

I spotted a few more minor issues:

https://github.com/aphillips/string-meta/blob/437f9a5964e5a2145d4f5155c6d522c890efd740/index.html#L1607

This still uses DOMString as the key type in the record definition, but references LanguageTag in the prose description.

https://github.com/aphillips/string-meta/blob/437f9a5964e5a2145d4f5155c6d522c890efd740/index.html#L1626

LanguageRecord no longer exists.

Typo: `<kdb>` instead of `<kbd>` in `<kdb>LanguageMap</kdb>`

https://github.com/aphillips/string-meta/blob/437f9a5964e5a2145d4f5155c6d522c890efd740/index.html#L1448

Typo: "value is map"

https://github.com/aphillips/string-meta/blob/437f9a5964e5a2145d4f5155c6d522c890efd740/index.html#L1461

This still refers to "key to the value array" rather than "map". Also, the `<a>` in `<a>language ranges</a>` 2 lines down doesn't appear to do anything as far as I can tell.

aphillips commented 3 weeks ago

> This still uses DOMString as the key type in the record definition, but references LanguageTag in the prose description.

It's a bug in ReSpec. ReSpec reports an error because LanguageTag is not DOMString. I am considering ignoring the error.

> LanguageRecord no longer exists. Typo: `<kdb>` instead of `<kbd>` in `<kdb>LanguageMap</kdb>`

Fixed the first, and replaced the kbd tags with ReSpec IDL markup `{{LanguageMap}}`.

@emlun Do you not have review permission on the PR?

emlun commented 2 weeks ago

Oh! I didn't realize there was a PR. All I'd seen was the link to this issue from https://github.com/w3c/webauthn/issues/2151, and #89 hasn't shown up in the activity feed in this thread. I'll post in the PR if I find anything else, but I think I'm done with my review for now.

aphillips commented 2 weeks ago

P.S. I added you to the acknowledgements. Appreciate the help!

w3c / string-meta

Add IDL for `LanguageMap` #88