ietf-wg-httpapi / linkset

Media Types and a Link Relation Type for Link Sets
https://datatracker.ietf.org/doc/draft-ietf-httpapi-linkset/

Address feedback from I18N review #66

Closed dret closed 2 years ago

dret commented 2 years ago

https://github.com/w3c/i18n-activity/issues/1505 (thanks for the review, @aphillips!)

aphillips commented 2 years ago

Copying the W3C I18N WG comments here for your convenience

https://datatracker.ietf.org/doc/draft-ietf-httpapi-linkset/

4.1: The use of non-ASCII characters in the field value of the HTTP "Link" Header field is not allowed.

This is an accurate statement about HTTP headers in general (and link in specific), but seems out of place here. This document's format can have non-ASCII characters where an HTTP header must resort to encodings/escapes.
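To make the 4.1 point concrete, here is a minimal Python sketch. The helper `to_ext_value` is hypothetical and not from the draft. A non-ASCII title has to be percent-encoded as an RFC 8187 ext-value to fit in a `Link` header parameter, whereas a JSON string can hold the same text directly. The sketch assumes the charset label is always UTF-8 and leaves only RFC 8187 `attr-char` characters unescaped.

```python
import json
from urllib.parse import quote

# RFC 8187 attr-char = ALPHA / DIGIT / "!#$&+-.^_`|~". quote() never escapes
# letters, digits or "_.-~", so only the remaining punctuation is listed here.
_ATTR_CHAR_PUNCT = "!#$&+^`|"

def to_ext_value(text: str, language: str = "") -> str:
    """Encode text as an RFC 8187 ext-value with a UTF-8 charset label."""
    return "UTF-8'" + language + "'" + quote(text, safe=_ATTR_CHAR_PUNCT, encoding="utf-8")

title = "nächstes Kapitel"
# In a Link header, the non-ASCII title needs the percent-encoded title* form ...
header_param = "title*=" + to_ext_value(title, "de")
# ... while a JSON string (here a fragment of a link target object) carries it as-is.
json_member = json.dumps({"title": title}, ensure_ascii=False)

print(header_param)  # title*=UTF-8'de'n%C3%A4chstes%20Kapitel
print(json_member)   # {"title": "nächstes Kapitel"}
```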

4.2.4.1: "hreflang": The "hreflang" target attribute, defined as optional
      and repeatable by [RFC8288], MUST be represented by an "hreflang"
      member, and its value MUST be an array (even if there only is one
      value to be represented), and each value in that array MUST be a
      string - representing one value of the "hreflang" target attribute
      for a link - which follows the same model as in the [RFC8288]
      syntax.

RFC8288 references the Language-Tag ABNF in RFC5646 (i.e. BCP47), but not BCP47's other requirements. It would also serve legibility/usefulness of this spec if BCP47 were called out directly. That is, I'd say:

... and its value MUST be an array (even if there is only one value 
to be represented). Each value in that array MUST be a string containing 
a well-formed language tag [BCP47] following the same model as in [RFC8288].
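A minimal Python sketch of the array rule for "hreflang". It assumes link parameters arrive as (name, value) pairs from some hypothetical header parser; `hreflang_member` is likewise a hypothetical name. The regular expression is only a rough approximation of the Language-Tag grammar, not a full BCP 47 well-formedness check, and values are kept in the order they appear without assuming any priority.

```python
import re
from typing import Iterable, Tuple

# Rough shape of a language tag: 1-8 letters, then zero or more "-"-separated
# subtags of 1-8 letters/digits. This approximates the RFC 5646 Language-Tag
# grammar; it is NOT a complete BCP 47 well-formedness check.
_TAG_SHAPE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

def hreflang_member(link_params: Iterable[Tuple[str, str]]) -> dict:
    """Collect all hreflang parameters of one link into an "hreflang" array member."""
    # Parameter names are compared case-insensitively here; values keep their order.
    tags = [value for name, value in link_params if name.lower() == "hreflang"]
    for tag in tags:
        if not _TAG_SHAPE.fullmatch(tag):
            raise ValueError(f"not a plausible language tag: {tag!r}")
    # An array even for a single value; no member at all if the attribute is absent.
    return {"hreflang": tags} if tags else {}

params = [("rel", "alternate"), ("hreflang", "de"), ("hreflang", "fr-CA")]
print(hreflang_member(params))  # {'hreflang': ['de', 'fr-CA']}
```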

ibid:    *  "title": The "title" target attribute, defined as optional and not
      repeatable by [RFC8288], MUST be represented by a "title" member
      in the link target object, and its value MUST be a string that
      follows the same model as in the [RFC8288] syntax.

RFC8288 uses %-encoded text and allows for non-UTF-8 encodings. But there is no reason for this in a JSON file. JSON itself requires UTF-8 as the file encoding, so why not just allow the title to be UTF-8?

ibid:   *  "title*": The "title*" target attribute, defined as optional and
      not repeatable by [RFC8288], is motivated by character encoding
      and language issues and follows the model defined in [RFC8187].
      The details of the JSON representation that applies to title* are
      described in Section 4.2.4.2.

This uses %-encoded text and allows any encoding (with UTF-8 as the default), but 4.2.4.2 says:

   *  The character encoding information as prescribed by [RFC8187] is
      not preserved; instead, the content of the internationalized
      attribute is represented in the character encoding used for the
      JSON set of links.

I think this should be explicit and just say UTF-8 in place of "the character encoding used for the JSON set of links"

As a meta-point, there is no discussion either here or in RFCs 8288 or 8187 about matching of language tags/ranges. In addition, the language tags for title/title* are not allowed to repeat, but it should be noted that language tags are not case-sensitive when matching. Also, there is no discussion of whether the array of hreflang values is in priority order or should be considered unordered.

dret commented 2 years ago

I think this should be explicit and just say UTF-8 in place of "the character encoding used for the JSON set of links"

JSON is not necessarily UTF-8, meaning that the character encoding will be whatever the JSON document is encoded in.

dret commented 2 years ago

As a meta-point, there is no discussion either here or in RFCs 8288 or 8187 about matching of language tags/ranges. In addition, the language tags for title/title* are not allowed to repeat, but it should be noted that language tags are not case-sensitive when matching. Also, there is no discussion of whether the array of hreflang values is in priority order or should be considered unordered.

this seems like an issue that could be added as an erratum to RFC 8288, but not like something where we should make any changes to how RFC 8288 defines things.

dret commented 2 years ago

@aphillips, do you think it would have helped to have the start of 4.2 ("JSON Document Format: application/linkset+json", https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-linkset-09#section-4.2) explicitly talk about character encoding? what about adding a simple intro such as

For those values using RFC 8187 syntax, the JSON representation consists of encoding these values in the character encoding of the JSON linkset.

we would still mention this in the individual places that you also mentioned in your review, just to make sure that it's mentioned everywhere where it's relevant. but maybe adding that intro statement would help when reading through the entire document/section?

richsalz commented 2 years ago

As a general rule, I am against repetition. This is a short document. For what it's worth, in my opinion, if you added a sentence like the above to the "terminology" section of the document, that is enough to address the feedback.

aphillips commented 2 years ago

@dret

I think this should be explicit and just say UTF-8 in place of "the character encoding used for the JSON set of links"

JSON is not necessarily UTF-8, meaning that the character encoding will be whatever the JSON document is encoded in.

JSON files are always encoded into a Unicode encoding. The other encodings permitted are UTF-16 and UTF-32. See RFC4627 sections 3 and 6. In practice, no one uses these other encodings, but you needn't address that. However, it is important to notice that JSON does not use just any encoding. You might, therefore, say something like:

The character encoding information as prescribed by [RFC8187] is not preserved; instead, the content of the attribute is represented as a JSON string.

aphillips commented 2 years ago

this seems like an issue that could be added as an erratum to RFC 8288, but not like something where we should make any changes to how RFC 8288 defines things.

I think it is a gap in 8288. Case insensitivity of language tags might merit a mention in your own document, though.

aphillips commented 2 years ago

(actually, my previous comment was sent too soon :-) )

RFC 8259 prohibited the use of any encoding other than UTF-8 for JSON and this also applies to the application/json mimetype.

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].

Previous specifications of JSON have not required the use of UTF-8 when transmitting JSON text. However, the vast majority of JSON-based software implementations have chosen to use the UTF-8 encoding, to the extent that it is the only encoding that achieves interoperability.

I think that trying to keep your spec "encoding agnostic" doesn't make sense. RFC 8187 and RFC 8288 are in the header space and thus have to allow for the use of arbitrary encodings. This spec is in the JSON space and thus is a Unicode beast through-and-through. So I'd probably say at the start of 4.2 (and wherever necessary elsewhere) that "values encoded according to the RFC 8187 syntax are converted to a sequence of Unicode code points"
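A minimal Python sketch of that conversion; `ext_value_to_json` is a hypothetical name. It assumes the `value`/`language` object shape that the published RFC 9264 uses for internationalized attributes such as "title*". The percent-encoded octets are decoded with the declared charset, so only Unicode text (not the charset label) reaches the JSON.

```python
import json
from urllib.parse import unquote

def ext_value_to_json(ext_value: str) -> dict:
    """Turn an RFC 8187 ext-value (charset'language'pct-encoded) into a
    {"value": ..., "language": ...} object holding plain Unicode text."""
    charset, language, encoded = ext_value.split("'", 2)
    # Decode the octets using the declared charset; the result is a Python str,
    # i.e. a sequence of Unicode code points. The charset label is dropped.
    member = {"value": unquote(encoded, encoding=charset, errors="strict")}
    if language:  # the language part of an ext-value may be empty
        member["language"] = language
    return member

print(json.dumps(ext_value_to_json("UTF-8'de'n%c3%a4chstes%20Kapitel"), ensure_ascii=False))
# {"value": "nächstes Kapitel", "language": "de"}
```

Because the output is an ordinary JSON string, the document's own encoding (UTF-8, per the discussion) is the only encoding that matters once the value has been converted.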

klensin commented 2 years ago

FWIW, I agree with Addison but let me say nearly the same thing from what may be more of an IETF perspective (and, fwiw, as the designated I18N directorate reviewer before I decided that more topic-specific expertise was needed). If this is to be published as an IETF specification, interoperability is a significant concern. Experience has repeatedly shown that the use of any Unicode encoding other than UTF-8 "on the wire" or in files that are created by one system but interpreted by another just creates problems waiting to happen, problems that spill over directly into interoperability difficulties.

As Addison points out, RFC 8259 seems to require Unicode and UTF-8 in this context. Exceptions to the former would make things even worse than allowing alternative encodings.

So my recommendation, like his, is to allow only Unicode encoded in UTF-8 and be done with it. If there is some reason to do something else, the document needs to explain why and under what circumstances their use would be appropriate, not just mention alternate encodings and start handwaving.
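Following that recommendation, a producer and consumer might look like this minimal Python sketch (function names hypothetical). The producer always emits UTF-8; the consumer decodes explicitly as UTF-8 rather than handing raw bytes to `json.loads`, which would also auto-detect UTF-16 and UTF-32.

```python
import json

def serialize_linkset(linkset: dict) -> bytes:
    """Serialize a linkset document as UTF-8 JSON text."""
    # ensure_ascii=False writes non-ASCII characters directly instead of as
    # \uXXXX escapes; either way, the bytes produced are UTF-8.
    return json.dumps(linkset, ensure_ascii=False).encode("utf-8")

def parse_linkset(data: bytes) -> dict:
    """Parse linkset JSON, accepting only UTF-8 input."""
    # Explicit UTF-8 decoding means other encodings fail with a ValueError
    # (UnicodeDecodeError, or json.JSONDecodeError if the bytes happen to be
    # valid UTF-8 but not valid JSON) instead of being silently accepted.
    return json.loads(data.decode("utf-8"))
```

For instance, `parse_linkset(serialize_linkset(doc))` round-trips a document whose titles contain "nächstes Kapitel", while the same document encoded as UTF-16 is rejected.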

hvdsomp commented 2 years ago

Addressed the UTF-8 issue in https://github.com/ietf-wg-httpapi/linkset/commit/d59f2f5f24d76aee26dc5630d2712340ceb7fb35

hvdsomp commented 2 years ago

Addressed feedback re non-ASCII characters in application/linkset encoding in https://github.com/ietf-wg-httpapi/linkset/commit/3ca70d401f454220b7a525cc7c25ae06a5770aa4

hvdsomp commented 2 years ago

We really think that addressing the meta-comment about matching of language tags/ranges is beyond the scope of this specification and that it should be addressed in an erratum to RFC8288. Is it OK to close this issue, @aphillips ?

aphillips commented 2 years ago

@hvdsomp I'm cool with the meta-comment resolution and the other fixes look good to me, so I am satisfied and you can close this issue.

hvdsomp commented 2 years ago

Thanks for feedback @aphillips . Closing now.

klensin commented 2 years ago

--On Tuesday, April 26, 2022 10:19 -0700 Herbert Van de Sompel wrote:

Closed #66.

You wrote earlier...

We really think that addressing the meta-comment about matching of language tags/ranges is beyond the scope of this specification and that it should be addressed in an erratum to RFC8288. [...]

As a procedural matter, errata, especially to standards track specifications, don't count. They can identify editorial problems for the convenience of future readers but fixing a substantive omission or providing an important (but not obvious) clarification requires IETF consensus, not (as with errata) private agreements among authors, ADs (or even the whole IESG), and a few interested parties.

If this document needs modifications to 8288 to be clear or to avoid inconsistency with what 8288 actually says, there are, AFAIK, only two ways to do that:

Sorry about that. john

mnot commented 2 years ago

John,

If an I-D is prepared that updates 8288, there is no reason for this document to normatively reference it (or be blocked on it); the reference to 8288 along with the updates relationship should be adequate.

mnot commented 2 years ago

Also, presuming the point is about this aspect:

As a meta-point, there is no discussion either here or in RFCs 8288 or 8187 about matching of language tags/ranges. In addition, the language tags for title/title* are not allowed to repeat, but it should be noted that language tags are not case-sensitive when matching. Also, there is no discussion of whether the array of hreflang values is in priority order or should be considered unordered.

it's not yet clear that something needs to be said, nor what should be said. I am wary of creating substantial new requirements in post-last call processes; IME the text is often hurried, doesn't enjoy broad input, and more prone to errors.

klensin commented 2 years ago

--On Tuesday, April 26, 2022 15:17 -0700 Mark Nottingham wrote:

John,

If an I-D is prepared that updates 8288, there is no reason for this document to normatively reference it (or be blocked on it); the reference to 8288 along with the updates relationship should be adequate.

Mark,

That would certainly be the case if the only thing this discussion has done vis-à-vis 8288 is to point out a deficiency that is irrelevant to the I-D. My impression from the recent discussion, however, is that this I-D depends on 8288 having these additional provisions or being interpreted consistent with them. If that is actually the case, then the existing reference to 8288 is incomplete without the update and this document needs to be blocked on its becoming final (and, specifically, not on handwaving about errata).

john

mnot commented 2 years ago

John,

You're making suppositions about what the issue might be, and it hasn't even been well-described yet, much less brought in front of the WG or the IESG. Given the extreme delays that this document has already seen, I'd appreciate it if you avoid suggesting such disruptive measures based upon mere impressions.

aphillips commented 2 years ago

@mnot

Also, presuming the point is about this aspect:

As a meta-point, there is no discussion either here or in RFCs 8288 or 8187 about matching of language tags/ranges. In addition, the language tags for title/title* are not allowed to repeat, but it should be noted that language tags are not case-sensitive when matching. Also, there is no discussion of whether the array of hreflang values is in priority order or should be considered unordered.

it's not yet clear that something needs to be said, nor what should be said. I am wary of creating substantial new requirements in post-last call processes; IME the text is often hurried, doesn't enjoy broad input, and more prone to errors.

Matching of the language tags is completely up to consumers across all three documents (this I-D plus 8187/8288). A case can be made that this is a Good Thing? Implementations are provided with metadata which they can use however they like to do language negotiation, selection, or whatnot. In any case, the matching algorithm shouldn't be defined here if it is not defined in 8187 or 8288.

The case-insensitivity part is dealt with in 8187:

Note that both character encoding names and language tags are restricted to the US-ASCII coded character set and are matched case-insensitively (see Section 2.3 of [RFC2978] and Section 2.1.1 of [RFC5646]).

It wouldn't be amiss to call out the case-insensitivity in this document, I think, because it's not obvious, but it's not a requirement by any means and no errata are needed since it's already in the document(s) in question.
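If the document does call out case-insensitivity, a consumer-side comparison could be as simple as this Python sketch (helper names hypothetical). It only tests two tags for equality ignoring ASCII case; it is not an RFC 4647 lookup or filtering algorithm, which, as noted above, is left to consumers.

```python
def same_language_tag(a: str, b: str) -> bool:
    """True if two language tags are equal, ignoring case."""
    # Language tags are restricted to US-ASCII and matched case-insensitively
    # (RFC 5646, Section 2.1.1), so an ASCII lowercase comparison suffices.
    # Non-ASCII input cannot be a valid tag, so it is treated as a non-match.
    if not (a.isascii() and b.isascii()):
        return False
    return a.lower() == b.lower()

def filter_by_hreflang(targets: list, wanted: str) -> list:
    """Keep link target objects whose "hreflang" array contains `wanted` (any case)."""
    return [t for t in targets
            if any(same_language_tag(tag, wanted) for tag in t.get("hreflang", []))]

print(same_language_tag("en-US", "EN-us"))  # True
```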

hvdsomp commented 2 years ago

Addressed "matching language tag" issue in https://github.com/ietf-wg-httpapi/linkset/commit/092bc61ea834da58e5792062902471541a2dde31

richsalz commented 2 years ago

Let's assume that everyone is working with the best intentions, and let's recognize that email is a poor communication scheme, especially when multiple parties are involved who don't all share the same one-on-one history.

Addison, an I18N reviewer, said there might be an issue and suggested adding some text. The thread meandered a bit, diverging into whether the base RFCs are wrong and, if so, whether to fix them with an erratum.

John, also an I18N reviewer, dragged into this later, says that if the base RFCs are wrong, we can't fix that with an erratum, and that trying to do so is something that could be appealed. John has a long history with IETF processes, including fighting the initial uphill battles to get IETF protocols to move beyond US-ASCII.

Addison clarified that the base RFCs aren't wrong, and in fact it might be a good thing.

Through no fault of John, Addison, or the draft authors (those directly affected), these I18N expert reviews came really late into the process. There was some discussion, some give-and-take, and some wording changes were made.

The document authors have been jerked around by reviews. The Chairs know this, our AD knows this, but John and Addison do not. The authors are so battle-scarred by the process that they are seriously reconsidering their involvement in the IETF. That would be a loss to everyone, as the APIs-over-HTTP community recognizes the need for standardizing some behavior, and the IETF is a good place to do it. If the authors, and others in the HTTP API community, feel the IETF is just not worth it, everyone loses.

The final change is editorial, not substantive, and needs no WG review. @hvdsomp please merge it. Thank you for waiting.

The ball is now in the hands of Francesca, our AD. I believe she appreciates how close to the end of their limits many of the people mentioned here (including me) are. I certainly hope John doesn't appeal, as I firmly believe all it will do is consume IESG time that could be better spent on things like thinking about the review process.

I am taking the unusual step of locking this thread so that there can be no more comments.