w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-fonts-4] Suggestion: Support Unicode Character Sequences in unicode-range #10651

Open brianjlacy opened 4 months ago

brianjlacy commented 4 months ago

Note: I'm just an observer around here, so I beg pardon for any apparent ignorance.

Request

I propose that the unicode-range descriptor in the @font-face rule should support matching specific Unicode character sequences. This would allow more precise control over which characters get rendered by a particular font, especially useful for cases where emoji sequences and text characters need to be handled differently.

Background and Rationale

Emoji fonts are becoming increasingly common as a way for designers to customize how emojis appear on their sites. Currently, however, the unicode-range descriptor only supports specifying individual code points or ranges, which fails to deliver the level of control designers may require.

Emoji fonts must support characters such as digits within sequences (e.g., keycap emojis like 1️⃣), but they need not provide usable standalone glyphs for them. This can cause issues when these fonts are used alongside text fonts, as digits might not render correctly if the emoji font doesn't include visible standalone versions.

This is especially important in my view because an extremely common use case for emojis in general -- arguably the most common scenario? 🤔 -- is that they are used in the flow of ordinary text, to emphasize or add emotional context. On the one hand, designers need to be able to control which characters are displayed as standard text and which are displayed using custom emojis; on the other hand, it should be possible to apply this customization to an arbitrary mix of text and emojis within a single element.

Example:

<div id="container">
  <p class="emojified">
    This paragraph should use a custom font, "Nifty Emoji," to render standard emojis like 😁 and also emoji sequences like 1️⃣, while letting numbers (1 2 3) and punctuation (*, #, etc.) use a fallback font.
  </p>
</div>

@font-face {
  font-family: "Nifty Emoji";
  src: url("path/to/nifty-emoji.woff2") format("woff2");
  unicode-range: U+0023, U+002A, U+0030-0039, U+FE0F /* , ...etc. */;
}

p.emojified {
  font-family: 'Nifty Emoji', sans-serif;
}

In this example, the unicode-range includes U+0030-0039, which covers digits zero through nine. If "Nifty Emoji" doesn't have visible glyphs for these digits, they might not render properly, causing display issues.

"But wait..."

Yes, we could simply exclude these characters from the unicode-range. But now sequences that depend on them -- such as, in this case, the "keycap" emojis -- fall back as well. There is no way to treat SEQUENCES differently from single characters.
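To make the dilemma concrete, here is a minimal Python sketch of matching at the code point level, which is the only granularity unicode-range offers today. It is a simplified model and not how any browser implements font matching. The names `NIFTY_RANGES` and `covered` are made up for illustration, and the ranges mirror only the listed part of the example descriptor above.

```python
# Illustrative model only: treats unicode-range as a per-code-point test.
# NIFTY_RANGES is a hypothetical name; it mirrors the listed entries of the
# example descriptor (the elided "...etc." entries are left out).
NIFTY_RANGES = [(0x0023, 0x0023), (0x002A, 0x002A), (0x0030, 0x0039), (0xFE0F, 0xFE0F)]


def covered(code_point, ranges):
    """Return True if code_point falls inside any inclusive (start, end) range."""
    return any(start <= code_point <= end for start, end in ranges)


plain_one = "1"                # U+0031 on its own
keycap_one = "1\uFE0F\u20E3"   # U+0031 U+FE0F U+20E3, the keycap sequence

# Both strings begin with the same code point, so a per-code-point test
# cannot answer differently for the digit depending on what follows it.
same_answer = (
    covered(ord(plain_one[0]), NIFTY_RANGES)
    == covered(ord(keycap_one[0]), NIFTY_RANGES)
)

# Dropping the digits lets a plain "1" fall back, but the digit that starts
# the keycap sequence is then no longer covered either.
without_digits = [r for r in NIFTY_RANGES if r != (0x0030, 0x0039)]
keycap_digit_covered = covered(ord(keycap_one[0]), without_digits)
```

In this model `same_answer` is true and `keycap_digit_covered` is false: the digit is either claimed in both contexts or in neither.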

Proposed Syntax and Examples

I propose a modification to the unicode-range syntax in which + joins code points (or code point ranges) into a sequence, so that the listed Unicode characters match only when they appear within that sequence:

  1. Individual Sequences:

    @font-face {
      font-family: "Nifty Emoji";
      src: url("path/to/nifty-emoji.woff2") format("woff2");
      /*
        Supports "text" and "emoji" style "keycap" symbols such as '#';
        but ordinary digits (U+0030-0039) are allowed to fall back!
      */
      unicode-range: U+0023+FE0E, U+0023+FE0F, /* ... */;
    }

  2. Sequence Ranges:

    @font-face {
      font-family: "Nifty Emoji";
      src: url("path/to/nifty-emoji.woff2") format("woff2");
      /*
        Supports:
        - "emoji" style "keycap" digits
        - "text" and "emoji" style "keycap" '#' symbol
        - the handshake emojis in various skin tones
    
        Does NOT Support (allows to fall back to another font):
        - ordinary digits '0' through '9'
        - ordinary '#' symbol
      */
      unicode-range: U+0030-0039+FE0F, U+0023+FE0E-F, U+1FAF1+1F3FB-F+200D+1FAF2+1F3FB-F, /* ... */;
    }

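Keycap Emojis: The sequence U+0030-0039+FE0F targets the keycap emojis from 0️⃣ to 9️⃣, ensuring these sequences are displayed using the "Nifty Emoji" font, while standalone digits (U+0030-0039 without FE0F) can fall back to a standard text font. (A complete keycap sequence also ends with U+20E3 COMBINING ENCLOSING KEYCAP, so a full entry would presumably read U+0030-0039+FE0F+20E3.)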

Emoji Variants: Using +FE0F and +FE0E allows specifying emoji or text presentation styles, respectively. For example, U+0030+FE0F matches an emoji-style zero and U+0030+FE0E matches a text-style zero.

Complex Sequences: The sequence U+1FAF1+1F3FB-F+200D+1FAF2+1F3FB-F covers the skin tone combinations for the handshake emoji, where 1F3FB-F abbreviates the range U+1F3FB-1F3FF (the five skin tone modifiers). This ensures that those combinations are rendered according to the emoji font's design. (Note the ZWJ, U+200D, which joins the two hands.)
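The matching behavior described above can be sketched in Python. This is only one possible reading of the proposal, not a specification or any browser's behavior. The function names (`parse_element`, `parse_sequence`, `matches_at`) are hypothetical, and the handling of the abbreviated range end (FE0E-F meaning FE0E-FE0F) is an assumption taken from the examples.

```python
def parse_element(text):
    """Parse 'XXXX' or 'XXXX-YYYY' (hex) into an inclusive (start, end) pair."""
    start_hex, _, end_hex = text.partition("-")
    if end_hex and len(end_hex) < len(start_hex):
        # Assumed shorthand from the examples: 'FE0E-F' means FE0E-FE0F,
        # i.e. the short end replaces the trailing digits of the start.
        end_hex = start_hex[: len(start_hex) - len(end_hex)] + end_hex
    start = int(start_hex, 16)
    end = int(end_hex, 16) if end_hex else start
    return start, end


def parse_sequence(entry):
    """Parse one entry such as 'U+0030-0039+FE0F' into a list of ranges."""
    entry = entry.strip()
    if not entry.upper().startswith("U+"):
        raise ValueError(f"not a unicode-range entry: {entry!r}")
    return [parse_element(part) for part in entry[2:].split("+")]


def matches_at(text, index, sequence):
    """Return True if text[index:] begins with code points matching sequence."""
    if index + len(sequence) > len(text):
        return False
    return all(
        start <= ord(text[index + offset]) <= end
        for offset, (start, end) in enumerate(sequence)
    )


keycap_digits = parse_sequence("U+0030-0039+FE0F")
handshake = parse_sequence("U+1FAF1+1F3FB-F+200D+1FAF2+1F3FB-F")

keycap_hit = matches_at("1\uFE0F\u20E3", 0, keycap_digits)  # digit + U+FE0F
plain_hit = matches_at("1 2 3", 0, keycap_digits)           # digit + space
handshake_hit = matches_at(
    "\U0001FAF1\U0001F3FB\u200D\U0001FAF2\U0001F3FF", 0, handshake
)
```

Here `keycap_hit` and `handshake_hit` are true while `plain_hit` is false, so a plain digit would be left to the fallback font. The sketch does not address how a trailing U+20E3 or grapheme cluster boundaries should be handled; a real design would need to decide that.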

Considerations

Conclusion

Supporting Unicode sequences in unicode-range would offer a precise, flexible way to manage font rendering, particularly for mixed content scenarios involving text and emojis. This enhancement would improve control over font usage, prevent rendering issues, and enrich the developer experience by allowing for finer typography control.

As I am, again, not an expert here, I welcome critique and alternate viewpoints; or, if there are other, more apt proposals that account for the use case(s) described, then so much the better.

brianjlacy commented 3 months ago

The attached .zip contains an html file that demonstrates the issue in greater detail, using two real emoji fonts. Here's the result:

Screenshot: emoji-font-face-demo
Attachment: emoji-font-face-demo.zip

EDIT: Turns out I made some mistakes in my highlights on the screenshot; but the demo is still illustrative.

romainmenke commented 3 months ago

There might be some overlap with this issue: https://github.com/w3c/csswg-drafts/issues/7921

brianjlacy commented 3 months ago

> There might be some overlap with this issue: https://github.com/w3c/csswg-drafts/issues/7921

While the ability to potentially use actual characters instead of code points is applicable here, as you could include a sequence emoji AS the emoji itself --

unicode-range: "😶‍🌫️"

-- I still think we need a more flexible, generic syntax that allows for both ranges and multi-codepoint sequence matching.
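For comparison, a literal-character value and a code point sequence describe the same data. The short Python sketch below spells out a string in the sequence notation proposed in this issue. The helper name `to_sequence_notation` is made up for illustration, and the emoji is written with escapes so the code points are visible.

```python
def to_sequence_notation(text):
    """Spell out a string as 'U+XXXX+YYYY...' in the notation proposed here."""
    return "U+" + "+".join(f"{ord(ch):04X}" for ch in text)


# The "face in clouds" emoji from the example above, written with escapes:
# U+1F636, ZWJ (U+200D), U+1F32B, then the emoji variation selector U+FE0F.
face_in_clouds = "\U0001F636\u200D\U0001F32B\uFE0F"
notation = to_sequence_notation(face_in_clouds)
# notation == "U+1F636+200D+1F32B+FE0F"
```

A quoted-string syntax could therefore express any single sequence, but it has no obvious way to express a range inside a sequence, which is what the + and - combination is meant to allow.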

This is my primary concern regarding many of the current proposals I've seen. For example, I think the keywords idea is excellent:

unicode-range: emoji;

But with that case in particular, I'm not sure how it can really work without being customizable (since emojis are always being added, and implementations don't all include the same symbols), which means we'd need an adaptable syntax to define such a keyword in the first place.