w3c / push-api

Push API
https://w3c.github.io/push-api/

Should there be a UTF-8 health warning? #335

Open aphillips opened 2 years ago

aphillips commented 2 years ago

PushMessageData interface https://www.w3.org/TR/push-api/#pushmessagedata-interface

In w3c/push-api#276 we asked about the inherent UTF-8 requirement for the text (and to a far lesser extent json) methods. The default implementation of these methods assumes that the message's bytes are, in fact, encoded as UTF-8 if the message is to be treated as text. The I18N WG is happy that UTF-8 is the default encoding and that it is the only supported encoding. But we note that UTF-8 and Unicode are not mentioned anywhere outside the message data interface. Other data can be sent down the wire and retrieved using arrayBuffer or blob, but character encodings are not mentioned at all apart from the references to utf-8 decode and utf-8 encode in this section. So our ask is:

Should there be a health warning about using non-UTF-8 encodings?

[Note: this came out of I18N WG reviewing our previous comments in our periodic review cycle]

marcoscaceres commented 2 years ago

Hi @aphillips,

Should there be a health warning about using non-UTF-8 encodings?

We can probably add a note or something. My reading is that the "utf-8 decode" will just add replacement characters but will always succeed (even with garbage).
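To illustrate the point about replacement characters, here is a minimal sketch. It assumes that a non-fatal `TextDecoder("utf-8")` behaves like the spec's "utf-8 decode" step for this purpose; the byte values and variable names are made up for the example and do not come from the spec.

```javascript
// Hypothetical payload: "café" encoded as windows-1252, where "é" is the
// single byte 0xE9. This is not valid UTF-8.
const legacyBytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

// Non-fatal decoding (the TextDecoder default) never throws. The invalid
// byte is replaced with U+FFFD REPLACEMENT CHARACTER.
const lenient = new TextDecoder("utf-8").decode(legacyBytes);
const hasReplacement = lenient.includes("\uFFFD");

// For contrast, fatal decoding surfaces the problem as a TypeError.
let strictError = null;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(legacyBytes);
} catch (err) {
  strictError = err;
}

console.log(JSON.stringify(lenient), hasReplacement, strictError !== null);
```

So a caller of text() gets a string back either way and has no direct signal that the sender used a different encoding.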

Should we add a note just saying something about replacement characters? Or do you mean something else by "health warning about using non-UTF-8 encodings"?

If you have an example from another spec, that would be really helpful!

aphillips commented 2 years ago

The problem here is that there is no actual mention of character encoding besides the utf-8 decode. Yes, the decode will succeed regardless of how the bytes are encoded, but this interface can also be used for sending bytes. I would at least mention that failing to use UTF-8 will produce replacement characters or mojibake garbage. Perhaps:

Note that textual content is expected to use the UTF-8 character encoding. Content using a different character encoding needs to be decoded from an arrayBuffer() or blob().
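A rough sketch of what the proposed note implies for a receiver: take the raw bytes and decode them with an explicit encoding label. The helper name `decodePayload` and the sample UTF-16LE bytes are assumptions for illustration only; in a service worker the buffer would come from `event.data.arrayBuffer()`.

```javascript
// Hypothetical helper: decode raw push payload bytes with an explicit
// encoding label instead of relying on text(), which always uses UTF-8.
function decodePayload(buffer, label) {
  return new TextDecoder(label).decode(buffer);
}

// Sample payload: "café" encoded as UTF-16LE (two bytes per character).
const utf16Payload = new Uint8Array([
  0x63, 0x00, 0x61, 0x00, 0x66, 0x00, 0xe9, 0x00,
]).buffer;

const decoded = decodePayload(utf16Payload, "utf-16le");

// Decoding the same bytes as UTF-8 yields NUL characters and a replacement
// character rather than the intended text.
const misdecoded = decodePayload(utf16Payload, "utf-8");

console.log(decoded, JSON.stringify(misdecoded));

// Sketch of use inside a service worker (not executed here):
// self.addEventListener("push", (event) => {
//   const text = decodePayload(event.data.arrayBuffer(), "utf-16le");
// });
```

The sender and receiver still need to agree on the encoding out of band, since the payload itself carries no encoding label.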