Open brianjlacy opened 4 months ago
The attached .zip contains an html file that demonstrates the issue in greater detail, using two real emoji fonts. Here's the result:
EDIT: Turns out I made some mistakes in my highlights on the screenshot; but the demo is still illustrative.
There might be some overlap with this issue: https://github.com/w3c/csswg-drafts/issues/7921
While the ability to potentially use actual characters instead of code points is applicable here, as you could include a sequence emoji AS the emoji itself --
unicode-range: "😶🌫️"
-- I still think we need a more flexible, generic syntax that allows for both ranges and multi-codepoint sequence matching.
This is my primary concern regarding many of the current proposals I've seen. For example, I think the keywords idea is excellent:
unicode-range: emoji;
But with that case in particular, I'm not sure how it can really work without being customizable (since emojis are always being added, and implementations don't all include the same symbols) which means we'd need an adaptable syntax to define such a keyword in the first place.
Note: I'm just an observer around here, so I beg pardon for any apparent ignorance.
Request
I propose that the unicode-range descriptor in the @font-face rule should support matching specific Unicode character sequences. This would allow more precise control over which characters get rendered by a particular font, especially useful for cases where emoji sequences and text characters need to be handled differently.
Background and Rationale
Emoji fonts are becoming increasingly common as a way for designers to customize how emojis appear on their sites. Currently, however, the unicode-range descriptor only supports specifying individual code points or ranges, which fails to deliver the level of control designers may require.
Emoji fonts must support characters like digits in sequences (e.g., keycap emojis like 1️⃣), but not as standalone glyphs. This can cause issues when these fonts are used alongside text fonts, as digits might not render correctly if the emoji font doesn't include visible standalone versions.
This is especially important in my view because an extremely common use case for emojis, in general -- arguably the most common scenario? 🤔 -- is that they are used in the flow of ordinary text, to emphasize or add emotional context. On the one hand, designers need to be able to control which characters are displayed as standard text and which are displayed using custom emojis; on the other hand, it should be possible to apply this customization to an arbitrary mix of text and emojis within a single element.
Example:
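A minimal sketch of the kind of rule being described. Only the "Nifty Emoji" name and the U+0030-0039 range come from the text; the file name, the other ranges, and the body rule are assumptions for illustration.

```css
/* Sketch only: file name and the non-digit ranges are assumed. */
@font-face {
  font-family: "Nifty Emoji";
  src: url("nifty-emoji.woff2") format("woff2");
  /* Digits are listed because keycap sequences begin with them. */
  unicode-range: U+0030-0039, U+FE0F, U+20E3, U+1F600-1F64F;
}

/* The emoji font is placed first so it takes every code point in its range. */
body {
  font-family: "Nifty Emoji", sans-serif;
}
```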
In this example, the unicode-range includes U+0030-0039, which covers digits zero through nine. If "Nifty Emoji" doesn't have visible glyphs for these digits, they might not render properly, causing display issues.
"But wait..."
Yes, we could simply exclude these characters from the unicode-range. But now sequences that depend on them -- such as, in this case, the "keycap" emojis -- fall back as well. There is no way to treat SEQUENCES differently from single characters.
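That workaround might look like the following sketch, which assumes an illustrative file name and set of ranges; only the font name and the digit range are taken from the text.

```css
/* Sketch only: file name and the listed ranges are assumed. */
@font-face {
  font-family: "Nifty Emoji";
  src: url("nifty-emoji.woff2") format("woff2");
  /* U+0030-0039 is left out so plain digits use the text font. */
  unicode-range: U+FE0F, U+20E3, U+1F600-1F64F;
}
```

With the digits removed, the base character of a keycap such as 1️⃣ is no longer in range for the emoji font, which is the fallback problem this issue describes.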
Proposed Syntax and Examples
I propose a modification to the unicode-range syntax in which the + may be used to match Unicode characters only when they appear within a specified sequence:
Individual Sequences:
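The following is a sketch of how a single sequence might be written under this proposal. The + syntax is hypothetical and is not valid in any current browser; the sequence is the one named in this issue, and the file name is an assumption.

```css
/* HYPOTHETICAL syntax from this proposal; browsers do not parse it today. */
@font-face {
  font-family: "Nifty Emoji";
  src: url("nifty-emoji.woff2") format("woff2"); /* assumed file name */
  /* Match U+0030 only when it is followed by U+FE0F (emoji-style zero). */
  unicode-range: U+0030+FE0F;
}
```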
Sequence Ranges:
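A sketch of sequence ranges under the same proposal, using the sequences named in this issue. The + syntax is hypothetical, the file name is an assumption, and "1F3FB-F" is read here as shorthand for U+1F3FB through U+1F3FF (the skin tone modifiers).

```css
/* HYPOTHETICAL syntax from this proposal; browsers do not parse it today. */
@font-face {
  font-family: "Nifty Emoji";
  src: url("nifty-emoji.woff2") format("woff2"); /* assumed file name */
  unicode-range:
    /* Any digit 0-9 followed by U+FE0F. Full keycap sequences also end in
       U+20E3; whether a prefix is enough is not specified in the proposal. */
    U+0030-0039+FE0F,
    /* Handshake halves joined by ZWJ (U+200D), each with a skin tone modifier. */
    U+1FAF1+1F3FB-F+200D+1FAF2+1F3FB-F;
}
```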
Keycap Emojis: The sequence U+0030-0039+FE0F covers the keycap emojis from 0️⃣ to 9️⃣, ensuring these sequences are displayed using the "Nifty Emoji" font, while standalone digits (U+0030-0039 without FE0F) can fall back to a standard text font.
Emoji Variants: Using +FE0F and +FE0E allows specifying emoji or text presentation styles, respectively. For example, U+0030+FE0F for emoji-style zero and U+0030+FE0E for text-style zero.
Complex Sequences: The sequence U+1FAF1+1F3FB-F+200D+1FAF2+1F3FB-F covers various skin tone combinations for the handshake emoji. This ensures that specific combinations are rendered correctly according to the emoji font's design. (Note the use here of the ZWJ.)
Considerations
I believe this approach is intuitive; using the + symbol to indicate sequences matches a notation commonly seen elsewhere on the web when describing such sequences, including in official Unicode documents. It naturally extends to complex sequences, such as those involving ZWJ, and accommodates ranges within sequences.
It should be noted that the font-variant-emoji property does not address the issue I'm hoping to address, as that must be applied at the element level. The approach I'm proposing allows for fine-grained control of how individual characters are rendered using a particular font within a single element.
I noticed a recommendation for the use of keywords in unicode-range, with an emoji keyword being a motivating concern. I am uncertain whether it would account for the use case I've described.
Conclusion
Supporting Unicode sequences in unicode-range offers a precise, flexible way to manage font rendering, particularly for mixed content scenarios involving text and emojis. This enhancement would improve control over font usage, prevent rendering issues, and enrich the developer experience by allowing for finer typography control.
As I am, again, not an expert here, I welcome critique and alternate viewpoints; or, if there are other, more apt proposals that account for the use case(s) described, then so much the better.