Closed SergeyKanzhelev closed 4 months ago
What does "if somebody will read it into the Unicode" mean?
There are several words which may be used.
Limits have been re-written in #89 which removes this ambiguity. @SergeyKanzhelev, can you please check if that addresses your concern above?
Re-assigning per the discussion in the WG meeting.
Note that the text in #89 uses the term "character" in one location when it probably should stick with saying bytes:
greater than 8192 bytes, some list-members MAY be dropped until the resulting baggage-string is 8192 characters or less.
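To make the bytes-versus-characters difference concrete, here is a minimal Python sketch of the trimming rule quoted above, measured strictly in bytes. The function name `trim_baggage` and the choice to drop members from the end are assumptions for illustration only; the quoted text does not say which list-members are dropped.

```python
MAX_BAGGAGE_BYTES = 8192  # limit from the quoted text, measured in bytes


def trim_baggage(baggage_string: str, limit: int = MAX_BAGGAGE_BYTES) -> str:
    """Drop list-members until the UTF-8 encoded string fits in `limit` bytes.

    Hypothetical helper: dropping from the end is an assumption, not
    something the quoted wording requires.
    """
    # Split on commas and ignore empty members and surrounding whitespace.
    members = [m.strip() for m in baggage_string.split(",") if m.strip()]
    # Measure the encoded length in bytes, not len() of the str (code points).
    while members and len(",".join(members).encode("utf-8")) > limit:
        members.pop()
    return ",".join(members)


# "a=1,k=é" is 7 code points but 8 bytes in UTF-8, so with a 7-byte
# limit the last member is dropped even though a character count would fit.
print(trim_baggage("a=1,k=é", limit=7))
```

If the limit were read as "characters", the second member in the usage line would survive, which is exactly the ambiguity raised in this thread.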
I agree with @dyladan's comment. Care is needed because the term "character" is overloaded. We (I18N) generally use the specialized term "code point" to refer to Unicode characters.
The limit here is given in bytes (octets). While UTF-8 is a variable width encoding, the relationship of ASCII to UTF-8 is that any 7-bit ASCII byte is itself in UTF-8. This means that the length of an ASCII string in bytes is its length in UTF-8 bytes (and its length in Unicode code points).
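As a quick check of the relationship described above, the following Python sketch compares code point counts with UTF-8 byte counts. The helper name `lengths` and the sample strings are made up for illustration.

```python
def lengths(text: str):
    """Return (code_points, utf8_bytes) for `text`."""
    # len() on a Python str counts Unicode code points;
    # len() on the encoded bytes counts octets.
    return len(text), len(text.encode("utf-8"))


# Pure ASCII: one byte per code point, so both counts agree.
print(lengths("userId=alice"))

# One non-ASCII letter ("å" is two bytes in UTF-8): the counts diverge.
print(lengths("userId=ålice"))
```

For the ASCII string both numbers are 12; for the second string the code point count stays at 12 while the byte count becomes 13.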
Decoding a sequence of non-ASCII bytes using UTF-8 would not fail, but might generate replacement characters (U+FFFD) for non-UTF-8 (and therefore non-ASCII) bytes.
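A small Python sketch of that decoding behavior follows. Whether decoding fails or substitutes U+FFFD depends on the decoder's error handling; Python's default mode is strict and raises, so the lenient behavior has to be requested with `errors="replace"`. The helper names and the sample bytes are hypothetical.

```python
def decode_lenient(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for invalid bytes."""
    return raw.decode("utf-8", errors="replace")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes cleanly in strict mode."""
    try:
        raw.decode("utf-8")  # strict by default in Python
        return True
    except UnicodeDecodeError:
        return False


raw = b"key=\xff\xfe"  # 0xFF and 0xFE never appear in valid UTF-8

print(is_valid_utf8(raw))          # strict decoding rejects the input
print(repr(decode_lenient(raw)))   # lenient decoding yields U+FFFD twice
```

Note that the decoded string has 6 code points, matching the 6 input bytes here, but each U+FFFD re-encodes to 3 bytes, so byte lengths are not preserved across a lenient decode and re-encode.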
In any case, it's possible to over-clarify the limit here. Measuring the limit in bytes is specific.
Reference in the limits section to the word "character" was removed in #113. @aphillips @SergeyKanzhelev do you believe the current wording is sufficient to close this discussion?
The PR #113 looks good to me.
I think this can be closed
See comment: https://github.com/w3c/baggage/pull/52#pullrequestreview-630477059
We declare limits in bytes and say ASCII when we say what symbols are allowed, but if somebody will read it into the Unicode, we may want to make sure the limits will be treated as character limits. Similar to cookie spec note: