whatwg / url

URL Standard
https://url.spec.whatwg.org/

ContextJ (RFC 5892) is Security Theater #776

Open adraffy opened 1 year ago

adraffy commented 1 year ago

Is CheckJoiners/ContextJ set in stone, or can it be debated? If it can, I'd like to present some arguments.

annevk commented 1 year ago

It depends? 😊 I haven't looked into it, probably depends a lot on what it ends up meaning for ToASCII and what the arguments are.

adraffy commented 1 year ago

For a concrete example: U+1F468 U+200D U+1F4BB (👨‍💻, man technologist)

The simplest solution is to set CheckJoiners to false.
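To illustrate why this sequence is rejected, here is a minimal sketch of the RFC 5892 ContextJ rule for U+200D ZERO WIDTH JOINER: a ZWJ is valid only when the code point before it has Canonical_Combining_Class = Virama (value 9). It assumes Python's `unicodedata.combining` as the source of combining classes. The function name `zwj_context_ok` is hypothetical, and the separate ZWNJ (U+200C) rule, which also accepts a Joining_Type context, is deliberately not modeled.

```python
import unicodedata

ZWJ = "\u200d"      # ZERO WIDTH JOINER
VIRAMA_CCC = 9      # Canonical_Combining_Class value named "Virama"


def zwj_context_ok(label: str) -> bool:
    """Return True if every ZWJ in `label` satisfies the RFC 5892 ContextJ rule.

    Only the U+200D rule is modeled: the preceding code point must have
    Canonical_Combining_Class == Virama. The ZWNJ rule is not implemented.
    """
    for i, ch in enumerate(label):
        if ch != ZWJ:
            continue
        # A ZWJ at the start of a label has no preceding code point at all.
        if i == 0 or unicodedata.combining(label[i - 1]) != VIRAMA_CCC:
            return False
    return True


man_technologist = "\U0001F468\u200d\U0001F4BB"  # U+1F468 U+200D U+1F4BB
devanagari_kssa = "\u0915\u094d\u200d\u0937"     # KA, VIRAMA, ZWJ, SSA

print(zwj_context_ok(man_technologist))  # False: U+1F468 is not a virama
print(zwj_context_ok(devanagari_kssa))   # True: U+094D has ccc = 9
```

Because Python strings are indexed by code point, `label[i - 1]` is exactly the "Before(cp)" code point the rule refers to, so the emoji fails purely because U+1F468 carries combining class 0.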


For reference, I recently implemented a normalization standard for the Ethereum Name Service (ENS) ecosystem. It combines UTS-51, UTS-46, a significantly safer character set (banning punctuation, parentheses, brackets, vocalizations, obsolete, deprecated, ancient, reversed, turned, and flipped characters, many ligatures, etc.), and an intelligent confusable system that does more than issue warnings (e.g. "rn" is a footgun confusable for "m"). Demo | GitHub

In my experience with the Unicode and RFC documentation, the primary source of confusion and bugs is the documentation itself. Many of these rules should be deprecated, and the remaining ones clarified and modernized.

I think WHATWG made the correct decision in setting CheckHyphens to false, finally breaking away from archaic DNS rules.
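For context, the two UTS-46 CheckHyphens conditions that WHATWG skips can be sketched as below: a label must not have U+002D HYPHEN-MINUS in both the third and fourth positions, and must neither begin nor end with one. This is a sketch assuming the label has already been Punycode-decoded; `passes_check_hyphens` is a hypothetical name, not a function from any library.

```python
HYPHEN = "-"  # U+002D HYPHEN-MINUS


def passes_check_hyphens(label: str) -> bool:
    """Sketch of the UTS-46 CheckHyphens conditions for a single decoded label."""
    # Hyphens in both the 3rd and 4th positions (1-based) are reserved,
    # historically for ACE prefixes such as "xn--".
    if len(label) >= 4 and label[2] == HYPHEN and label[3] == HYPHEN:
        return False
    # A label may neither begin nor end with a hyphen.
    if label.startswith(HYPHEN) or label.endswith(HYPHEN):
        return False
    return True


for label in ["ab--cd", "-abc", "abc-", "a-b-c"]:
    print(label, passes_check_hyphens(label))
```

With CheckHyphens set to false, all four labels above would pass this stage of validation.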

I think they should do the same with CheckJoiners. If the WHATWG really wants to protect end users, it should recommend UTS-51 RGI pre-processing and outright disallow ZWJ/ZWNJ outside of emoji.
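The "disallow joiners outside of emoji" half of this proposal might look roughly like the sketch below. It assumes a set of RGI emoji ZWJ sequences is available. `RGI_ZWJ_SEQUENCES` here is a hypothetical two-entry stand-in; a real implementation would load the full list from Unicode's emoji-zwj-sequences.txt data file. The RGI pre-processing step itself is not modeled. The scanner greedily skips any known emoji sequence and rejects every ZWJ or ZWNJ left over.

```python
ZWJ, ZWNJ = "\u200d", "\u200c"

# Hypothetical stand-in for the full RGI emoji ZWJ sequence list.
RGI_ZWJ_SEQUENCES = frozenset({
    "\U0001F468\u200d\U0001F4BB",  # man technologist
    "\U0001F469\u200d\U0001F4BB",  # woman technologist
})

# Try longer sequences first so the longest match wins.
_BY_LENGTH = sorted(RGI_ZWJ_SEQUENCES, key=len, reverse=True)


def joiners_only_in_emoji(label: str) -> bool:
    """Return True if every ZWJ/ZWNJ in `label` is inside a known RGI ZWJ sequence."""
    i = 0
    while i < len(label):
        match = next((s for s in _BY_LENGTH if label.startswith(s, i)), None)
        if match is not None:
            i += len(match)        # skip the whole emoji sequence
        elif label[i] in (ZWJ, ZWNJ):
            return False           # a joiner outside any emoji sequence
        else:
            i += 1
    return True


print(joiners_only_in_emoji("\U0001F468\u200d\U0001F4BB"))  # True
print(joiners_only_in_emoji("a\u200db"))                     # False
```

As written, this also rejects joiners that ContextJ currently accepts, such as ZWJ or ZWNJ after a Devanagari virama, which is the trade-off the proposal implies.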