w3ctag / design-principles

A small-but-growing set of design principles collected by the TAG while reviewing specifications
https://w3ctag.github.io/design-principles

Encourage UTF-8 for new formats and APIs #322

Open domenic opened 3 years ago

domenic commented 3 years ago

A lot of the new formats and APIs we've been designing (and some not-so-new) assume UTF-8 unconditionally. These include:

We also made it non-conforming for HTML documents to use any other encoding. And the Encoding Standard tries to be clear that everything else is legacy.

It'd be good if this were captured in the design principles doc. https://w3ctag.github.io/design-principles/#new-data-formats is one place that captures several of the above examples. There might be room for some separate guidance on APIs (not just formats), to capture the text() and responseText examples: basically, any time an API is interpreting some unknown bytes as a string, it should just assume it's always UTF-8.
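
As a rough illustration of that last point, here is a minimal sketch of an API that interprets unknown bytes as a string and always assumes UTF-8. The helper name `bytesToString` and the sample byte arrays are made up for this example; the only platform feature relied on is `TextDecoder`, whose default encoding is UTF-8 and which substitutes U+FFFD for malformed input unless told otherwise.

```javascript
// Hypothetical helper (not from any spec): decode bytes of unknown origin,
// unconditionally treating them as UTF-8.
function bytesToString(bytes) {
  // With no arguments, TextDecoder uses "utf-8" and replaces malformed
  // sequences with U+FFFD instead of throwing.
  return new TextDecoder().decode(bytes);
}

// "hi " followed by U+2713 (CHECK MARK), encoded as E2 9C 93 in UTF-8.
const wellFormed = new Uint8Array([0x68, 0x69, 0x20, 0xe2, 0x9c, 0x93]);

// The byte 0xFF can never appear in well-formed UTF-8.
const malformed = new Uint8Array([0x68, 0x69, 0xff]);

console.log(bytesToString(wellFormed) === "hi \u2713"); // true
console.log(bytesToString(malformed) === "hi\uFFFD"); // true
```

There is no encoding parameter and no sniffing in this sketch: the caller cannot ask for a legacy encoding, which is the behavior the comment above argues for.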

annevk commented 3 years ago

XMLHttpRequest & fetch()'s JSON utilities and WebVTT come to mind as well.

dcodeIO commented 3 years ago

That this is used as a justification for what Wasm does in between function calls is just surprising. I mean, networking, file formats, fine, but breaking with JS is worrisome at best.

annevk commented 3 years ago

Can you explain how any of this breaks with JavaScript?

dcodeIO commented 3 years ago

I am specifically referring to the "new APIs" part. This design principle has been biting Wasm since December 2017, as it is regularly used as an argument to disallow surrogates on boundaries. Trapping, or replacing, on code that intentionally does not produce errors for backwards-compatibility reasons, in languages that make it so very easy to substring pairs with constants, goes way too far and is a huge mistake.

annevk commented 3 years ago

Why is it a huge mistake? If it was a huge mistake wouldn't we have noticed it being a problem in the past decade or so?

dcodeIO commented 3 years ago

Are you saying that since you didn't notice, exactly because it is currently intentionally allowed so it does not lead to errors, that the opposite of a legitimate argument must be true?

annevk commented 3 years ago

I don't think so. It's still not entirely clear to me what exactly you find problematic and for what reason. Examples would help.

dcodeIO commented 3 years ago

// Snippet 1: a substring cut that can split a surrogate pair
let myString = inputString.substring(0, 10); // user finds it funny to place an emoji at 9
map.set(myString, 42);

let alsoMyString = roundtripStringOverInterfaceTypesBoundaryButWhoKnows(myString);

map.get(alsoMyString) // undefined
if (myString == alsoMyString) {
  // not reached: the comparison is false
}

// Snippet 2 (separate from the above): a string read from a database
let myString = getStringFromDatabaseOverInterfaceTypesBoundaryButWhoKnows();
queryWhere("stringInDatabase = ", myString); // no results
updateWhere("stringInDatabase = ", myString); // no update, update wrong row or error
deleteWhere("stringInDatabase = ", myString); // no delete, delete wrong row or error

annevk commented 3 years ago

That seems rather contrived and will already fail the moment you involve the network or URLs.
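
Both positions in this exchange can be reproduced with standard JavaScript, as in the sketch below. It assumes a boundary that carries strings as UTF-8 and replaces unpaired surrogates. `roundtripAsUtf8` is a hypothetical stand-in built from `TextEncoder` and `TextDecoder`, not the actual Interface Types behavior, and `encodeURIComponent` is used as just one example of a URL-related API.

```javascript
const emoji = "\u{1F600}"; // one code point, two UTF-16 code units (D83D DE00)
const input = "012345678" + emoji; // the emoji starts at index 9
const cut = input.substring(0, 10); // ends in the lone high surrogate D83D

// Hypothetical stand-in for a boundary that transports strings as UTF-8.
// TextEncoder replaces unpaired surrogates with U+FFFD before encoding.
function roundtripAsUtf8(str) {
  return new TextDecoder().decode(new TextEncoder().encode(str));
}

const back = roundtripAsUtf8(cut);
const map = new Map([[cut, 42]]);

console.log(cut.charCodeAt(9).toString(16)); // d83d
console.log(back === cut); // false: the lone surrogate became U+FFFD
console.log(map.get(back)); // undefined

// The same string already cannot pass through this URL helper unchanged:
// encodeURIComponent throws a URIError on an unpaired surrogate.
let threwUriError = false;
try {
  encodeURIComponent(cut);
} catch (e) {
  threwUriError = e instanceof URIError;
}
console.log(threwUriError); // true
```

The first half shows the failed map lookup after the roundtrip; the second half shows an existing API that rejects the same string outright.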

dcodeIO commented 3 years ago

This has to stop! cc @cynthia @darobin @dbaron @domenic @hadleybeeman @LeaVerou @lknik @mnot @plinss @rubys @slightlyoff @timbl @torgo @twirl @wycats @ylafon @BrendanEich

dcodeIO commented 3 years ago

Applying this design principle to Wasm is literally breaking the Web Platform, wrecking WebAssembly, damaging the reputation of the W3C and everyone involved, and I do not think that a bunch of thumbs down and marking my comments as spam is very helpful in this regard. This is not only highly security relevant as it may eventually affect any arbitrary amount of code ever written in JavaScript, JavaScript-likes, C# and Java, but in the worst case puts people's lives in serious danger when they rely on correctly functioning software that doesn't kill them just because someone put an emoji in the wrong place.

plinss commented 3 years ago

@dcodeIO your comment wasn't marked as spam because people disagree with you, it was marked as spam because it was not constructive and you needlessly tagged a large number of people.

Engaging in hyperbole isn't constructive either.

Please constrain your comments to technical discussion of the matter at hand. @annevk has been engaging with you in good faith and you are not doing him, or anyone else, the courtesy of the same. At this point I also suggest you read the W3C Code of Ethics and Professional Conduct.

dcodeIO commented 3 years ago

I do not see how I am violating the CEPC. I disagree technically of course, strongly even, but that doesn't imply hostility on my end. I just think this is extremely important, and I'd rather question the reactions I have received on such an important matter.