[css-syntax] Missing emoji in `non-ascii` identifier codepoints

w3c / csswg-drafts

CSS Working Group Editor Drafts

https://drafts.csswg.org/

[css-syntax] Missing emoji in `non-ascii` identifier codepoints #11005

Open Conaclos opened 3 days ago

Conaclos commented 3 days ago

Type of proposal: enhancement

The non-ASCII ident code point definition doesn't include the Miscellaneous Symbols and Dingbats Unicode blocks. However, browsers accept the emoji in these blocks.

Note: should we also accept some of the emoji of the Miscellaneous Technical Unicode block? Or should we even accept all non-ASCII characters?

These code points (U+2600 to U+27BF inclusive) could be added to the non-ASCII ident code point definition.
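
To make the gap concrete, here is a minimal sketch of the check being discussed. The range table is recalled from the CSS Syntax definition of a non-ASCII ident code point and should be treated as an assumption to verify against the current draft; the names `NON_ASCII_IDENT_RANGES`, `PROPOSED_RANGE`, and `is_non_ascii_ident_code_point` are hypothetical and exist only for this illustration.

```python
# Ranges recalled from the CSS Syntax "non-ASCII ident code point" definition.
# Assumption: verify against the current draft before relying on this table.
NON_ASCII_IDENT_RANGES = [
    (0x00B7, 0x00B7),
    (0x00C0, 0x00D6),
    (0x00D8, 0x00F6),
    (0x00F8, 0x037D),
    (0x037F, 0x1FFF),
    (0x200C, 0x200D),
    (0x203F, 0x2040),
    (0x2070, 0x218F),
    (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF),
    (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD),
    (0x10000, 0x10FFFF),
]

# Range proposed in this issue: Miscellaneous Symbols (U+2600 to U+26FF)
# followed by Dingbats (U+2700 to U+27BF).
PROPOSED_RANGE = (0x2600, 0x27BF)


def in_ranges(code_point, ranges):
    """Return True if code_point falls inside any inclusive (lo, hi) pair."""
    return any(lo <= code_point <= hi for lo, hi in ranges)


def is_non_ascii_ident_code_point(char, extra_ranges=()):
    """Check one character against the table above plus optional extra ranges."""
    ranges = list(NON_ASCII_IDENT_RANGES) + list(extra_ranges)
    return in_ranges(ord(char), ranges)


# U+2728 (sparkles) and U+2602 (umbrella) sit in the proposed range;
# U+1F600 is a supplementary-plane emoji that the table already covers.
for char in ("\u2728", "\u2602", "\U0001F600"):
    current = is_non_ascii_ident_code_point(char)
    proposed = is_non_ascii_ident_code_point(char, [PROPOSED_RANGE])
    print(f"U+{ord(char):04X} current={current} proposed={proposed}")
```

Under these assumed ranges, everything between U+218F and U+2C00 is excluded, which is why emoji in the Basic Multilingual Plane are rejected while emoji at U+10000 and above pass.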

tabatkins commented 2 days ago

As stated in the spec's note, I just matched HTML's set of valid custom element name characters. We want CSS's idents to at least cover that set, so authors don't have to use escapes when writing selectors to target their custom elements, but we could be a superset.

I'm checking with the HTML editors to see if they remember why these specific ranges were chosen, and if there's a good reason to avoid allowing those emojis. (cc @annevk @domenic )

It does indeed seem a little silly that `--🥔` is a valid custom property name, but `--✨` isn't.

tabatkins commented 2 days ago

> Or should we even accept all non-ASCII characters?

Not all; again, as stated in the note, there are good reasons to exclude some characters from idents, and Unicode itself even recommends disallowing some characters that are allowed by the current spec. But the emojis seem probably safe.

annevk commented 2 days ago

We actually want to turn it into a blocklist of sorts: https://github.com/whatwg/dom/pull/1079. The current restrictions follow from XML (which the DOM APIs build on and with which we wanted to be compatible): https://www.w3.org/TR/REC-xml/#NT-NameStartChar

I think it would be okay for CSS to essentially have one or more ASCII alpha or U+0080 through U+10FFFF (and maybe some other ASCII code points?), as long as it starts with two hyphens. No need for HTML parity.
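
One literal reading of that suggestion can be sketched as a pattern: two hyphens followed by one or more ASCII letters or code points from U+0080 through U+10FFFF. This is only an illustration of the comment's wording, not spec text; `RELAXED_DASHED_NAME` and `is_relaxed_dashed_name` are hypothetical names.

```python
import re

# Hypothetical pattern for the relaxed rule described above:
# "--" followed by one or more ASCII letters or code points
# in the range U+0080 through U+10FFFF.
RELAXED_DASHED_NAME = re.compile(r"--[A-Za-z\u0080-\U0010FFFF]+")


def is_relaxed_dashed_name(name):
    """Return True if the whole string matches the relaxed pattern."""
    return RELAXED_DASHED_NAME.fullmatch(name) is not None


# Sparkles (U+2728) and potato (U+1F954) are both accepted here.
print(is_relaxed_dashed_name("--\u2728"))
print(is_relaxed_dashed_name("--\U0001F954"))
# Digits and interior hyphens are rejected under this literal reading.
print(is_relaxed_dashed_name("--foo-bar"))
```

Read this strictly, a name such as `--foo-bar` fails because hyphens and digits are not ASCII alpha, which is presumably what the "maybe some other ASCII code points?" aside is about.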