w3c / baggage

Propagation format for distributed context: Baggage
https://w3c.github.io/baggage/

UTF-8 encoding of value should be more clearly defined #100

Closed aphillips closed 2 years ago

aphillips commented 2 years ago

3.2.1.3 value https://www.w3.org/TR/baggage/#value

A value contains UTF-8 encoded string. Any characters outside of the baggage-octet ranges of characters MUST be percent-encoded. Characters which are not required to be percent-encoded MAY be percent-encoded.

The value appears to require UTF-8 as its character encoding, but this is not explicitly stated. There is also no discussion of what happens when percent-encoded octet sequences are not valid UTF-8. I would suggest the following edits:

A value contains a string whose character encoding MUST be UTF-8 [Encoding]. Any characters outside of the baggage-octet range of characters MUST be percent-encoded. Characters which are not required to be percent-encoded MAY be percent-encoded.

When decoding the value, percent-encoded octet sequences that do not match the UTF-8 encoding scheme MUST be replaced with the replacement character (U+FFFD).
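
To make the proposed wording concrete, here is a minimal Python sketch of the encode and decode steps. It is illustrative only, not a reference implementation: the function names `encode_value` and `decode_value` are hypothetical, the allowed set is taken from the baggage-octet ranges in the spec's ABNF, and the sketch additionally assumes that "%" itself is always percent-encoded so that values round-trip. Decoding relies on `urllib.parse.unquote` with `errors="replace"`, which substitutes U+FFFD for octet sequences that are not valid UTF-8.

```python
from urllib.parse import unquote

# baggage-octet ranges from the Baggage ABNF:
#   %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
# Assumption of this sketch: "%" (0x25) is removed from the allowed set so
# that a literal percent sign is always escaped and decoding is unambiguous.
_ALLOWED = (
    {0x21}
    | set(range(0x23, 0x2C))
    | set(range(0x2D, 0x3B))
    | set(range(0x3C, 0x5C))
    | set(range(0x5D, 0x7F))
) - {0x25}


def encode_value(text: str) -> str:
    """Percent-encode every UTF-8 octet of `text` outside the allowed set."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in _ALLOWED:
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


def decode_value(raw: str) -> str:
    """Percent-decode `raw` as UTF-8; invalid sequences become U+FFFD."""
    return unquote(raw, encoding="utf-8", errors="replace")


if __name__ == "__main__":
    print(encode_value("héllo, wörld"))   # non-ASCII, comma and space are escaped
    print(repr(decode_value("%FF")))      # 0xFF is never valid in UTF-8
```

With this sketch, a well-formed value survives a round trip unchanged, while a stray octet such as `%FF` decodes to the replacement character instead of raising an error.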