Open aphillips opened 2 years ago
Hi @aphillips,
Should there be a health warning about using non-UTF-8 encodings?
We can probably add a note or something. My reading is that the "utf-8 decode" will just add replacement characters but will always succeed (even with garbage).
Should we add a note just saying something about replacement characters? Or do you mean something else by "health warning about using non-UTF-8 encodings"?
If you have an example from another spec, that would be really helpful!
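To make the "always succeeds" behaviour concrete, here is a minimal sketch. It assumes that `TextDecoder` in its default, non-fatal mode behaves like the spec's "utf-8 decode"; the sample bytes and variable names are made up for illustration.

```typescript
// Hypothetical payload: "café" encoded as windows-1252, so "é" is the single byte 0xE9.
const legacyBytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

// Default (non-fatal) UTF-8 decoding never throws; invalid sequences become U+FFFD.
const lenient = new TextDecoder("utf-8").decode(legacyBytes);

// A fatal decoder is one way to surface the problem instead of hiding it.
let strictFailed = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(legacyBytes);
} catch {
  strictFailed = true; // the bytes are not valid UTF-8
}

console.log(JSON.stringify(lenient), strictFailed);
```

The lenient result ends in a replacement character rather than "é", which is the silent corruption a note in the spec could warn about.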
The problem here is that there is no actual mention of character encoding besides the `utf-8 decode`. Yes, the decode will succeed regardless of the encoding of the bytes, but this interface can also be used for sending bytes. I would at least mention that failing to use UTF-8 will produce replacement characters or mojibake garbage. Perhaps:
Note that textual content is expected to use the UTF-8 character encoding. Content using a different character encoding needs to be decoded from an `arrayBuffer()` or `blob()`.
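A rough sketch of what such a note would mean for a consumer, assuming the sender's encoding is known out of band. The helper name `decodePayload` and the sample bytes are hypothetical; the buffer stands in for what `arrayBuffer()` would return.

```typescript
// Hypothetical helper: decode a push payload whose encoding is known out of band.
function decodePayload(buffer: ArrayBuffer, encoding: string): string {
  // TextDecoder throws a RangeError if the label is not a supported encoding.
  return new TextDecoder(encoding).decode(buffer);
}

// Stand-in for the result of arrayBuffer(): "hé" encoded as UTF-16LE.
const payload = new ArrayBuffer(4);
new Uint8Array(payload).set([0x68, 0x00, 0xe9, 0x00]);

const decoded = decodePayload(payload, "utf-16le");
console.log(decoded);
```

In a push event handler, the buffer would presumably come from `event.data.arrayBuffer()` instead of being built by hand.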
PushMessageData interface https://www.w3.org/TR/push-api/#pushmessagedata-interface
In w3c/push-api#276 we asked about the inherent UTF-8 requirement for the `text` (and to a far lesser extent `json`) methods. These methods' default implementation assumes that the encoding of the message's bytes is, in fact, UTF-8 if the message is to be treated as text. The I18N WG is happy that UTF-8 is the default encoding and that it is the only supported encoding. But we note that there is no mention of UTF-8 or Unicode outside of the message data interface. Other data can be sent down the wire and retrieved using `arrayBuffer` or `blob`, but there is no mention of character encodings aside from the references to `utf-8 decode` and `utf-8 encode` in this section. So our ask is: Should there be a health warning about using non-UTF-8 encodings?
[Note: this came out of I18N WG reviewing our previous comments in our periodic review cycle]