Closed ferdnyc closed 3 weeks ago
Interestingly, "Fallback" appears to function... well, ''differently'', if I select a non-emoji font, like Symbola, which does contain some Emoji characters. Then, activating Fallback does replace only the missing glyphs with their emoji presentations:
But that's actually kind of super weird, and I'm not sure it's how fallback is really supposed to work.
(Unicode TR51 defines ''fallback'' presentations for Emoji, but they're something different than font-fallback. One of the definitions they give for an emoji fallback presentation involves displaying a composed emoji as the individual emoji that make up the sequence, instead of the product of their composition.
For example, the :rainbow_flag: emoji is formed by composing the :white_flag: emoji and the :rainbow: emoji together using a Zero-Width Joiner. In implementations where :rainbow_flag: is unavailable, the fallback presentation would be to display :white_flag::rainbow:.)
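As a rough illustration of that TR51-style fallback presentation (not of font fallback), here is a minimal Python sketch that splits a ZWJ sequence into the emoji it was composed from. The helper name `fallback_presentation` is made up for this example.

```python
from typing import List

ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER

# Rainbow flag: WHITE FLAG + VARIATION SELECTOR-16 + ZWJ + RAINBOW
RAINBOW_FLAG = "\U0001F3F3\uFE0F\u200D\U0001F308"


def fallback_presentation(sequence: str) -> List[str]:
    """Split a ZWJ sequence into its component emoji (hypothetical helper)."""
    return [part for part in sequence.split(ZWJ) if part]


parts = fallback_presentation(RAINBOW_FLAG)
# Show the code points of each component.
print([[f"U+{ord(char):04X}" for char in part] for part in parts])
```

An implementation without a glyph for the whole sequence would then draw the two components next to each other.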
@ferdnyc
Test file to reproduce the problem only using pango:
$ cat ~/pango-test.txt
<span font="Twemoji 48" fallback="false">🤯😀🫨</span> Twemoji, fallback=false
<span font="Twemoji 48" fallback="true">🤯😀🫨</span> Twemoji, fallback=true
<span font="Symbola 48" fallback="false">😇︎︎</span> Symbola, fallback=false
<span font="Symbola 48" fallback="true">😇︎︎</span> Symbola, fallback=true
$
Running pango-view --markup ~/pango-test.txt gives the following result:
When fallback is true, even the emoji for which glyphs are available in the Twemoji font are replaced by glyphs from "Noto Color Emoji". But if the main font is Symbola, then the fallback to "Noto Color Emoji" happens only for the glyphs which are really lacking in the Symbola font.
@ferdnyc
This used to work just as you expected, a few years ago. I also agree that this would be the correct behaviour.
I had already noticed a while ago that it didn't work anymore as it used to but had no time to investigate and then forgot about it again.
I didn't change the code in emoji-picker at all so I suspect that something either in Pango or fontconfig has changed.
emoji-picker does the same as shown in the test file above:
<span font="fontname size" fallback="true">some emoji</span>
And this does not work anymore as it used to work.
Now I need to find out whether this is because of a change in Pango or in fontconfig ...
Indeed, it does sound like Pango is the culprit.
I wonder if this is somehow related to your old bug https://gitlab.gnome.org/GNOME/pango/-/issues/289, or the related https://gitlab.gnome.org/GNOME/pango/-/issues/298 — both of which are still open, though the first one is at least somewhat addressed since your original report was:
With pango 1.40.12 and fontconfig from git master, it is not possible to choose the font used for emoji from pango anymore.
And obviously that's no longer the case. (Except when fallback is activated. Sometimes.)
Hmf. Interestingly, on my Fedora 40 system I get this, when running the test file through pango-view --markup:
Hmm... probably because activating Fallback in Emoji Picker with Symbola selected doesn't fill in that glyph, either. I guess my default emoji font is missing it.
But this one works:
Ah, Fedora 40 is still on Noto Color Emoji 20231130. I updated to the 20241008 version from Rawhide, and all is well:
Both bugs don't seem to be resolved in a way which I would need for emoji picker. Especially the second one does not seem to have been addressed at all.
Of course I want the old behaviour back from the time when I first implemented the font selection and the fallback option: I want to show as many glyphs as possible with the selected fonts and use fallback only for the glyphs which the selected font lacks. That would give the most useful information to the user, one would see easily which emoji are available in a font and which other font(s) one could use for the missing emoji.
So I would like to have that behaviour back.
I wonder whether there is any way to force that behaviour with the current pango and fontconfig ...
U+1FAE8 🫨 shaking face
U+1F92F 🤯 shocked face with exploding head
$ fc-match "Twemoji:lang=und-zsye:charset=1fae8"
NotoColorEmoji.ttf: "Noto Color Emoji" "Regular"
$ fc-match "Twemoji:lang=und-zsye:charset=1f92f"
Twemoji.ttf: "Twemoji" "Regular"
So fontconfig’s fc-match does give us Twemoji when we request Twemoji and a code point for which Twemoji has a glyph, and it falls back to Noto Color Emoji only when Twemoji lacks a glyph for the code point.
And then Pango overrides that result?
If I request the generic emoji family or no family at all, I always get Noto Color Emoji:
$ fc-match "emoji:lang=und-zsye:charset=1f92f"
NotoColorEmoji.ttf: "Noto Color Emoji" "Regular"
$ fc-match "emoji:lang=und-zsye:charset=1fae8"
NotoColorEmoji.ttf: "Noto Color Emoji" "Regular"
$ fc-match ":lang=und-zsye:charset=1fae8"
NotoColorEmoji.ttf: "Noto Color Emoji" "Regular"
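The same queries could be scripted, for example like this. It is only a sketch: the helper names are invented here, it assumes the fc-match binary from fontconfig is on the PATH, and it only tells us what fontconfig would pick, not what Pango finally renders with.

```python
import subprocess


def emoji_pattern(family: str, code_point: int) -> str:
    """Build a fontconfig pattern such as 'Twemoji:lang=und-zsye:charset=1fae8'."""
    return f"{family}:lang=und-zsye:charset={code_point:x}"


def best_match_family(family: str, code_point: int) -> str:
    """Ask fc-match which family it would choose (needs fontconfig installed)."""
    result = subprocess.run(
        ["fc-match", "-f", "%{family}", emoji_pattern(family, code_point)],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Only the pattern is built here; best_match_family() is not called
# because its result depends on the fonts installed on the system.
print(emoji_pattern("Twemoji", 0x1FAE8))
print(emoji_pattern("", 0x1FAE8))
```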
@ferdnyc I think theoretically I could implement a workaround as follows (only for emoji which are single code points!):
For each emoji to be displayed:
    if the requested font has the emoji:
        use <span font="font size" fallback="false">emoji</span>
        (i.e. always use fallback="false", no matter what the checkbox says)
    else:
        use fallback as chosen by the checkbox
Whether a font has an emoji could be checked with fontconfig:
$ fc-list "Twemoji:lang=und-zsye:charset=1fae8"
$
No result, that means Twemoji does not have U+1FAE8.
$ fc-list "Twemoji:lang=und-zsye:charset=1f92f"
/usr/share/fonts/twemoji/Twemoji.ttf: Twemoji:style=Regular
$
Here we have a result, that means Twemoji does have U+1F92F.
This way I could avoid getting a fallback when it is not necessary because the requested font does have that glyph.
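A minimal sketch of that decision in Python, assuming the set of code points covered by the requested font is already known. The function name and the sample set are hypothetical, and real code would probably use GLib's markup escaping instead of xml.sax.saxutils.

```python
from typing import Set
from xml.sax.saxutils import escape, quoteattr


def span_for_emoji(emoji: str, font: str, size: int,
                   supported: Set[int], checkbox_fallback: bool) -> str:
    """Build the Pango markup span for one emoji (hypothetical helper)."""
    use_fallback = checkbox_fallback
    # Single code point covered by the requested font: never fall back,
    # no matter what the checkbox says.
    if len(emoji) == 1 and ord(emoji) in supported:
        use_fallback = False
    font_attr = quoteattr(f"{font} {size}")
    fallback_attr = "true" if use_fallback else "false"
    return f'<span font={font_attr} fallback="{fallback_attr}">{escape(emoji)}</span>'


# Assumed sample coverage for illustration, not the real font data.
twemoji_code_points = {0x1F92F}
print(span_for_emoji("\U0001F92F", "Twemoji", 48, twemoji_code_points, True))
print(span_for_emoji("\U0001FAE8", "Twemoji", 48, twemoji_code_points, True))
```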
But there are several problems with that idea:
I would probably need to write my own Python interface to fontconfig (it looks like https://pypi.org/project/Python-fontconfig/ does not do what I would need).
It might cause a significant slowdown of emoji-picker when displaying a page like the “people” category, which has 606 emoji at the moment, if I need to do this extra check for every emoji.
It still doesn't solve the problem for emoji which are not single code points but sequences; with fontconfig I can only check whether a font has a glyph for a code point.
For example consider this emoji sequence:
🙂↕️ U+1F642 U+200D U+2195 U+FE0F “head shaking vertically”
Checking with fontconfig
$ fc-list "Twemoji:charset=1f642"
/usr/share/fonts/twemoji/Twemoji.ttf: Twemoji:style=Regular
$ fc-list "Twemoji:charset=200d"
/usr/share/fonts/twemoji/Twemoji.ttf: Twemoji:style=Regular
$ fc-list "Twemoji:charset=2195"
/usr/share/fonts/twemoji/Twemoji.ttf: Twemoji:style=Regular
$ fc-list "Twemoji:charset=fe0f"
$
So Twemoji has all the code points of the emoji (I think I could ignore whether a font has U+200D ZERO WIDTH JOINER or U+FE0F VARIATION SELECTOR-16, a font does not need to have glyphs for unprintable characters like these).
But even if I know that Twemoji does have glyphs for all emoji such a sequence is composed of, I still don’t know whether the font has a glyph for the whole sequence. fontconfig cannot answer this.
So I still don’t know what I could do here, I have no good idea for a workaround.
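For illustration, the per-code-point check described above could look like the following sketch (the names are invented for this example). It skips U+200D and U+FE0F, and it only gives a necessary condition: a True result does not prove that the font has a glyph for the whole sequence.

```python
from typing import Set

# Code points a font does not need a visible glyph for.
IGNORABLE = {0x200D, 0xFE0F}  # ZERO WIDTH JOINER, VARIATION SELECTOR-16


def code_points_covered(sequence: str, supported: Set[int]) -> bool:
    """Return True if every printable code point of the sequence is covered."""
    return all(ord(char) in supported
               for char in sequence
               if ord(char) not in IGNORABLE)


# “head shaking vertically”: U+1F642 U+200D U+2195 U+FE0F
head_shaking = "\U0001F642\u200D\u2195\uFE0F"
# Assumed sample coverage, taken from the fc-list results above.
sample_coverage = {0x1F642, 0x200D, 0x2195}
print(code_points_covered(head_shaking, sample_coverage))
```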
@mike-fabian
Well, I do have a workaround for part of that issue.
The same way fontconfig can be used to match fonts based on parameters like charset, it can also be used to query a font's available charsets.
If you use fc-list to get the path to a given font, you can fc-query that file to extract its data. And with a custom format (courtesy of the FcPatternFormat(3) syntax), properties can be expanded.
The list of all code points covered by Twemoji, for example, is:
$ fc-query $(fc-list -f '%{file}' Twemoji) -f '%{[]charset{%{charset}}}'
20 23 2a 30-39 a9 ae 200d 203c 2049 20e3 2122 2139 2194-2199 21a9-21aa 231a-231b 2328 23cf 23e9-23f3 23f8-23fa 24c2 25aa-25ab 25b6 25c0 25fb-25fe 2600-2604 260e 2611 2614-2615 2618 261d 2620 2622-2623 2626 262a 262e-262f 2638-263a 2640 2642 2648-2653 265f-2660 2663 2665-2666 2668 267b 267e-267f 2692-2697 2699 269b-269c 26a0-26a1 26a7 26aa-26ab 26b0-26b1 26bd-26be 26c4-26c5 26c8 26ce-26cf 26d1 26d3-26d4 26e9-26ea 26f0-26f5 26f7-26fa 26fd 2702 2705 2708-270d 270f 2712 2714 2716 271d 2721 2728 2733-2734 2744 2747 274c 274e 2753-2755 2757 2763-2764 2795-2797 27a1 27b0 27bf 2934-2935 2b05-2b07 2b1b-2b1c 2b50 2b55 3030 303d 3297 3299 e50a 1f004 1f0cf 1f170-1f171 1f17e-1f17f 1f18e 1f191-1f19a 1f1e6-1f1ff 1f201-1f202 1f21a 1f22f 1f232-1f23a 1f250-1f251 1f300-1f321 1f324-1f393 1f396-1f397 1f399-1f39b 1f39e-1f3f0 1f3f3-1f3f5 1f3f7-1f4fd 1f4ff-1f53d 1f549-1f54e 1f550-1f567 1f56f-1f570 1f573-1f57a 1f587 1f58a-1f58d 1f590 1f595-1f596 1f5a4-1f5a5 1f5a8 1f5b1-1f5b2 1f5bc 1f5c2-1f5c4 1f5d1-1f5d3 1f5dc-1f5de 1f5e1 1f5e3 1f5e8 1f5ef 1f5f3 1f5fa-1f64f 1f680-1f6c5 1f6cb-1f6d2 1f6d5-1f6d7 1f6dd-1f6e5 1f6e9 1f6eb-1f6ec 1f6f0 1f6f3-1f6fc 1f7e0-1f7eb 1f7f0 1f90c-1f93a 1f93c-1f945 1f947-1f9ff 1fa70-1fa74 1fa78-1fa7c 1fa80-1fa86 1fa90-1faac 1fab0-1faba 1fac0-1fac5 1fad0-1fad9 1fae0-1fae7 1faf0-1faf6 e0030-e0039 e0061-e007a e007f fe4e5-fe4ee fe82c fe82e-fe837
(Presented in a compact format that still forces you to parse out and expand ranges, but at least it's only a SINGLE call that will — with sufficient processing and expansion — net you the "needs fallback" state of every [single-codepoint] emoji in one fell swoop, rather than having to make 600+ separate calls.)
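The "processing and expansion" step is small. Here is a sketch, assuming the fc-query output has already been captured as a string; `parse_charset` is a hypothetical helper and the sample is just a few tokens taken from the list above.

```python
from typing import Set


def parse_charset(text: str) -> Set[int]:
    """Expand fontconfig charset output ('20 30-39 1fae0-1fae7') into a set."""
    code_points: Set[int] = set()
    for token in text.split():
        first, _, last = token.partition("-")
        start = int(first, 16)
        # A token without '-' is a single code point, not a range.
        end = int(last, 16) if last else start
        code_points.update(range(start, end + 1))
    return code_points


sample = "20 23 2a 30-39 1fae0-1fae7 1faf0-1faf6"
supported = parse_charset(sample)
# U+1FAE8 (shaking face) lies just past the 1fae0-1fae7 range.
print(0x1FAE8 in supported, 0x1FAE7 in supported)
```

After that, the "needs fallback" test for a single code point emoji is just a set membership check.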
Still doesn't even begin to address the ZWJ-combined emoji issue; it feels like that information has to be stored SOMEWHERE in the font data, but I'm at a loss for where/how it would even be stored, never mind queried.
(For that matter, how are the glyphs for those combined emoji stored and accessed, when the appropriate sequence of code points has been encountered in a string and needs to be rendered?)
Oh, actually you don't even need that complex expansion formatting — turns out it doesn't do anything.
This:
$ fc-query $(fc-list -f '%{file}' Twemoji) -f '%{[]charset{%{charset}}}' 20 23 2a 30-39 a9 ae 200d 203c 2049 20e3 2122 2139 2194-2199 21a9-21aa 231a-231b 2328 23cf 23e9-23f3 23f8-23fa 24c2 25aa-25ab 25b6 25c0 25fb-25fe 2600-2604 260e 2611 2614-2615 2618 261d 2620 2622-2623 2626 262a 262e-262f 2638-263a 2640 2642 2648-2653 265f-2660 2663 2665-2666 2668 267b 267e-267f 2692-2697 2699 269b-269c 26a0-26a1 26a7 26aa-26ab 26b0-26b1 26bd-26be 26c4-26c5 26c8 26ce-26cf 26d1 26d3-26d4 26e9-26ea 26f0-26f5 26f7-26fa 26fd 2702 2705 2708-270d 270f 2712 2714 2716 271d 2721 2728 2733-2734 2744 2747 274c 274e 2753-2755 2757 2763-2764 2795-2797 27a1 27b0 27bf 2934-2935 2b05-2b07 2b1b-2b1c 2b50 2b55 3030 303d 3297 3299 e50a 1f004 1f0cf 1f170-1f171 1f17e-1f17f 1f18e 1f191-1f19a 1f1e6-1f1ff 1f201-1f202 1f21a 1f22f 1f232-1f23a 1f250-1f251 1f300-1f321 1f324-1f393 1f396-1f397 1f399-1f39b 1f39e-1f3f0 1f3f3-1f3f5 1f3f7-1f4fd 1f4ff-1f53d 1f549-1f54e 1f550-1f567 1f56f-1f570 1f573-1f57a 1f587 1f58a-1f58d 1f590 1f595-1f596 1f5a4-1f5a5 1f5a8 1f5b1-1f5b2 1f5bc 1f5c2-1f5c4 1f5d1-1f5d3 1f5dc-1f5de 1f5e1 1f5e3 1f5e8 1f5ef 1f5f3 1f5fa-1f64f 1f680-1f6c5 1f6cb-1f6d2 1f6d5-1f6d7 1f6dd-1f6e5 1f6e9 1f6eb-1f6ec 1f6f0 1f6f3-1f6fc 1f7e0-1f7eb 1f7f0 1f90c-1f93a 1f93c-1f945 1f947-1f9ff 1fa70-1fa74 1fa78-1fa7c 1fa80-1fa86 1fa90-1faac 1fab0-1faba 1fac0-1fac5 1fad0-1fad9 1fae0-1fae7 1faf0-1faf6 e0030-e0039 e0061-e007a e007f fe4e5-fe4ee fe82c fe82e-fe837
is actually identical to this:
$ fc-query $(fc-list -f '%{file}' Twemoji) -f '%{charset}' |fmt
20 23 2a 30-39 a9 ae 200d 203c 2049 20e3 2122 2139 2194-2199 21a9-21aa
231a-231b 2328 23cf 23e9-23f3 23f8-23fa 24c2 25aa-25ab 25b6 25c0 25fb-25fe
2600-2604 260e 2611 2614-2615 2618 261d 2620 2622-2623 2626 262a 262e-262f
2638-263a 2640 2642 2648-2653 265f-2660 2663 2665-2666 2668 267b 267e-267f
2692-2697 2699 269b-269c 26a0-26a1 26a7 26aa-26ab 26b0-26b1 26bd-26be
26c4-26c5 26c8 26ce-26cf 26d1 26d3-26d4 26e9-26ea 26f0-26f5 26f7-26fa
26fd 2702 2705 2708-270d 270f 2712 2714 2716 271d 2721 2728 2733-2734
2744 2747 274c 274e 2753-2755 2757 2763-2764 2795-2797 27a1 27b0 27bf
2934-2935 2b05-2b07 2b1b-2b1c 2b50 2b55 3030 303d 3297 3299 e50a 1f004
1f0cf 1f170-1f171 1f17e-1f17f 1f18e 1f191-1f19a 1f1e6-1f1ff 1f201-1f202
1f21a 1f22f 1f232-1f23a 1f250-1f251 1f300-1f321 1f324-1f393 1f396-1f397
1f399-1f39b 1f39e-1f3f0 1f3f3-1f3f5 1f3f7-1f4fd 1f4ff-1f53d 1f549-1f54e
1f550-1f567 1f56f-1f570 1f573-1f57a 1f587 1f58a-1f58d 1f590 1f595-1f596
1f5a4-1f5a5 1f5a8 1f5b1-1f5b2 1f5bc 1f5c2-1f5c4 1f5d1-1f5d3 1f5dc-1f5de
1f5e1 1f5e3 1f5e8 1f5ef 1f5f3 1f5fa-1f64f 1f680-1f6c5 1f6cb-1f6d2
1f6d5-1f6d7 1f6dd-1f6e5 1f6e9 1f6eb-1f6ec 1f6f0 1f6f3-1f6fc 1f7e0-1f7eb
1f7f0 1f90c-1f93a 1f93c-1f945 1f947-1f9ff 1fa70-1fa74 1fa78-1fa7c
1fa80-1fa86 1fa90-1faac 1fab0-1faba 1fac0-1fac5 1fad0-1fad9 1fae0-1fae7
1faf0-1faf6 e0030-e0039 e0061-e007a e007f fe4e5-fe4ee fe82c fe82e-fe837
(With wrapping added, this time, to keep things readable.)
$ fc-query $(fc-list -f '%{file}' Twemoji) -f '%{charset}' |fmt
That is a good idea, thank you very much!
But I still wonder whether implementing a workaround which only works for single code point emoji makes sense.
Should I do that? Maybe it is better than nothing.
Of course I would very much prefer to fix this for the emoji sequences as well but I have no idea how I could do that at the moment.
Still doesn't even begin to address the ZWJ-combined emoji issue; it feels like that information has to be stored SOMEWHERE in the font data, but I'm at a loss for where/how it would even be stored, never mind queried.
(For that matter, how are the glyphs for those combined emoji stored and accessed, when the appropriate sequence of code points has been encountered in a string and needs to be rendered?)
I don’t know how exactly that works either at the moment.
I made some limited progress using this:
from typing import List
from typing import Tuple
from typing import Dict
from typing import Any
import sys
from gi import require_version # type: ignore
require_version('Gtk', '3.0')
from gi.repository import Gtk # type: ignore
require_version('Pango', '1.0')
from gi.repository import Pango

def get_fonts_used_for_text(
        font: str, text: str, fallback: bool = True) -> List[Tuple[str, Dict[str, Any]]]:
    '''Return a list of fonts which were really used to render a text
:param font: The font requested to render the text in
:param text: The text to render
:param fallback: Whether to enable font fallback. If disabled, then
glyphs will only be used from the closest matching
font on the system. No fallback will be done to other
fonts on the system that might contain the glyphs needed
for the text.
Examples:
(Don’t run CI checks regularly on these examples, it depends too much
on the fonts installed on the system used to do the test)
>>> get_fonts_used_for_text('DejaVu Sans Mono', '😀 ')
[('😀', {'font': 'Noto Color Emoji', 'glyphcount': 1}), (' ', {'font': 'DejaVu Sans Mono', 'glyphcount': 1})]
>>> get_fonts_used_for_text('DejaVu Sans', '日本語 नमस्ते')
[('日本語 ', {'font': 'Droid Sans Fallback', 'glyphcount': 4}), ('नमस्ते', {'font': 'FreeSans', 'glyphcount': 5})]
>>> get_fonts_used_for_text('DejaVu Sans', '日本語 🕉️')
[('日本語 ', {'font': 'Droid Sans Fallback', 'glyphcount': 4}), ('🕉️', {'font': 'Noto Color Emoji', 'glyphcount': 1})]
>>> get_fonts_used_for_text('Twemoji', '🙂↕️')
[('🙂\u200d↕️', {'font': 'Noto Color Emoji', 'glyphcount': 1})]
>>> get_fonts_used_for_text('Twemoji', '🙂↕️', fallback=False)
[('🙂\u200d↕️', {'font': 'Twemoji', 'glyphcount': 3})]
“Twemoji” has no glyph for the flag of Sark (added in Unicode 16.0) but “Noto Color Emoji” has it.
Even though “Twemoji” has no glyph for the flag of Sark, Pango renders the sequence of two
code points (U+1F1E8 U+1F1F6) as one glyph when “Twemoji” is specified and fallback is not allowed
(Visually the glyph shown appears empty, there is no “Tofu”):
>>> get_fonts_used_for_text('Twemoji', '🇨🇶', fallback=False)
[('🇨🇶', {'font': 'Twemoji', 'glyphcount': 1})]
>>> get_fonts_used_for_text('Twemoji', '🇨🇶')
[('🇨🇶', {'font': 'Noto Color Emoji', 'glyphcount': 1})]
“Twemoji” does not have the glyph for this single code point emoji but “Noto Color Emoji” has it
(visually the glyph shown when Twemoji is used is a “Tofu” block with the code point inside):
>>> get_fonts_used_for_text('Twemoji', '\U0001fae9', fallback=False)
[('\U0001fae9', {'font': 'Twemoji', 'glyphcount': 1})]
>>> get_fonts_used_for_text('Twemoji', '\U0001fae9', fallback=True)
[('\U0001fae9', {'font': 'Noto Color Emoji', 'glyphcount': 1})]
Both “Twemoji” and “Noto Color Emoji” have the glyph for this single code point emoji,
both render it well when inspected visually:
>>> get_fonts_used_for_text('Twemoji', '🤥', fallback=False)
[('🤥', {'font': 'Twemoji', 'glyphcount': 1})]
>>> get_fonts_used_for_text('Twemoji', '🤥', fallback=True)
[('🤥', {'font': 'Noto Color Emoji', 'glyphcount': 1})]
'''
    fonts_used = []
    text_utf8 = text.encode('UTF-8', errors='replace')
    label = Gtk.Label()
    pango_context = label.get_pango_context()
    pango_layout = Pango.Layout(pango_context)
    pango_font_description = Pango.font_description_from_string(font)
    pango_layout.set_font_description(pango_font_description)
    pango_attr_list = Pango.AttrList()
    pango_attr_fallback = Pango.attr_fallback_new(fallback)
    pango_attr_list.insert(pango_attr_fallback)
    pango_layout.set_attributes(pango_attr_list)
    pango_layout.set_text(text)
    pango_layout_line = pango_layout.get_line_readonly(0)
    gs_list = pango_layout_line.runs
    number_of_runs = len(gs_list)
    for glyph_item in gs_list:
        pango_item = glyph_item.item
        offset = pango_item.offset
        length = pango_item.length
        _num_chars = pango_item.num_chars
        pango_glyph_string = glyph_item.glyphs
        num_glyphs = pango_glyph_string.num_glyphs
        pango_analysis = pango_item.analysis
        pango_font = pango_analysis.font
        font_description_used = pango_font.describe()
        run_text = text_utf8[offset:offset + length].decode('UTF-8', errors='replace')
        run_family = font_description_used.get_family()
        fonts_used.append((run_text, {'font': run_family, 'glyphcount': num_glyphs}))
    return fonts_used

def _init() -> None:
    '''Initialization'''
    return

def _del() -> None:
    '''Cleanup'''
    return

class __ModuleInitializer: # pylint: disable=too-few-public-methods,invalid-name
    def __init__(self) -> None:
        _init()
    def __del__(self) -> None:
        return

if __name__ == "__main__":
    import doctest
    (FAILED, _ATTEMPTED) = doctest.testmod()
    sys.exit(FAILED)
As you can see in the comments I can now detect how many glyphs were used to render an emoji, i.e. I can detect for some sequences that they are not supported by a font if they render with more than one glyph:
>>> get_fonts_used_for_text('Twemoji', '🙂↕️')
[('🙂\u200d↕️', {'font': 'Noto Color Emoji', 'glyphcount': 1})]
>>> get_fonts_used_for_text('Twemoji', '🙂↕️', fallback=False)
[('🙂\u200d↕️', {'font': 'Twemoji', 'glyphcount': 3})]
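Put as code, the heuristic would be something like the following sketch. It works on the list returned by get_fonts_used_for_text() as shown above; `looks_supported` is a hypothetical helper, and the sample data is copied from the doctest results on my system.

```python
from typing import Any, Dict, List, Tuple

Runs = List[Tuple[str, Dict[str, Any]]]


def looks_supported(runs: Runs, requested_font: str) -> bool:
    """Heuristic: one run, rendered by the requested font, with one glyph."""
    if len(runs) != 1:
        return False
    _text, info = runs[0]
    return info['font'] == requested_font and info['glyphcount'] == 1


# Results copied from the examples above (fallback=False).
head_shaking = [('\U0001F642\u200d\u2195\ufe0f',
                 {'font': 'Twemoji', 'glyphcount': 3})]
lying_face = [('\U0001F925', {'font': 'Twemoji', 'glyphcount': 1})]
print(looks_supported(head_shaking, 'Twemoji'))
print(looks_supported(lying_face, 'Twemoji'))
```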
But there are sequences where this “trick” doesn’t work:
“Twemoji” has no glyph for the flag of Sark (added in Unicode 16.0) but “Noto Color Emoji” has it.
Even though “Twemoji” has no glyph for the flag of Sark, Pango renders the sequence of two
code points (U+1F1E8 U+1F1F6) as one glyph when “Twemoji” is specified and fallback is not allowed
(Visually the glyph shown appears empty, there is no “Tofu”):
>>> get_fonts_used_for_text('Twemoji', '🇨🇶', fallback=False)
[('🇨🇶', {'font': 'Twemoji', 'glyphcount': 1})]
>>> get_fonts_used_for_text('Twemoji', '🇨🇶')
[('🇨🇶', {'font': 'Noto Color Emoji', 'glyphcount': 1})]
So now I wonder how I can detect whether a glyph is empty.
Also, in case of a single code point, when fallback is not allowed, and a font which does not have that glyph is used, Pango still renders it using one glyph, but that glyph is a “Tofu” replacement glyph (a box with the code point inside):
“Twemoji” does not have the glyph for this single code point emoji but “Noto Color Emoji” has it
(visually the glyph shown when Twemoji is used is a “Tofu” block with the code point inside):
>>> get_fonts_used_for_text('Twemoji', '\U0001fae9', fallback=False)
[('\U0001fae9', {'font': 'Twemoji', 'glyphcount': 1})]
>>> get_fonts_used_for_text('Twemoji', '\U0001fae9', fallback=True)
[('\U0001fae9', {'font': 'Noto Color Emoji', 'glyphcount': 1})]
So I need to detect empty and Tofu glyphs somehow ...
@mike-fabian
One option to make the doctests universal/reproducible, at the cost of (admittedly) a good deal of detail, would be to only return a boolean value indicating whether the font used was the one requested, rather than its exact identity.
On my system, I hit some failures with the original code — as your own comments indicated would likely be the case — when the fallback font chosen didn't match what was selected on your system. But this version passes with flying colors, by eliminating the dependence on exact font identities:
#!/usr/bin/env python3
from typing import List
from typing import Tuple
from typing import Dict
from typing import Any
import sys
from gi import require_version # type: ignore
require_version('Gtk', '3.0')
from gi.repository import Gtk # type: ignore
require_version('Pango', '1.0')
from gi.repository import Pango

def get_fonts_used_for_text(
        font: str, text: str, fallback: bool = True) -> List[Tuple[str, Dict[str, Any]]]:
    '''Return a list of fonts which were really used to render a text
:param font: The font requested to render the text in
:param text: The text to render
:param fallback: Whether to enable font fallback. If disabled, then
glyphs will only be used from the closest matching
font on the system. No fallback will be done to other
fonts on the system that might contain the glyphs needed
for the text.
Examples:
(Don’t run CI checks regularly on these examples, it depends too much
on the fonts installed on the system used to do the test)
>>> get_fonts_used_for_text('DejaVu Sans Mono', '😀 ')
[('😀', {'requested': False, 'glyphcount': 1}), (' ', {'requested': True, 'glyphcount': 1})]
>>> get_fonts_used_for_text('DejaVu Sans', '日本語 नमस्ते')
[('日本語 ', {'requested': False, 'glyphcount': 4}), ('नमस्ते', {'requested': False, 'glyphcount': 5})]
>>> get_fonts_used_for_text('DejaVu Sans', '日本語 🕉️')
[('日本語 ', {'requested': False, 'glyphcount': 4}), ('🕉️', {'requested': False, 'glyphcount': 1})]
>>> get_fonts_used_for_text('Twemoji', '🙂↕️')
[('🙂\u200d↕️', {'requested': False, 'glyphcount': 1})]
>>> get_fonts_used_for_text('Twemoji', '🙂↕️', fallback=False)
[('🙂\u200d↕️', {'requested': True, 'glyphcount': 3})]
“Twemoji” has no glyph for the flag of Sark (added in Unicode 16.0) but “Noto Color Emoji” has it.
Even though “Twemoji” has no glyph for the flag of Sark, Pango renders the sequence of two
code points (U+1F1E8 U+1F1F6) as one glyph when “Twemoji” is specified and fallback is not allowed
(Visually the glyph shown appears empty, there is no “Tofu”):
>>> get_fonts_used_for_text('Twemoji', '🇨🇶', fallback=False)
[('🇨🇶', {'requested': True, 'glyphcount': 1})]
>>> get_fonts_used_for_text('Twemoji', '🇨🇶')
[('🇨🇶', {'requested': False, 'glyphcount': 1})]
“Twemoji” does not have the glyph for this single code point emoji but “Noto Color Emoji” has it
(visually the glyph shown when Twemoji is used is a “Tofu” block with the code point inside):
>>> get_fonts_used_for_text('Twemoji', '\U0001fae9', fallback=False)
[('\U0001fae9', {'requested': True, 'glyphcount': 1})]
>>> get_fonts_used_for_text('Twemoji', '\U0001fae9', fallback=True)
[('\U0001fae9', {'requested': False, 'glyphcount': 1})]
Both “Twemoji” and “Noto Color Emoji” have the glyph for this single code point emoji,
both render it well when inspected visually:
>>> get_fonts_used_for_text('Twemoji', '🤥', fallback=False)
[('🤥', {'requested': True, 'glyphcount': 1})]
>>> get_fonts_used_for_text('Twemoji', '🤥', fallback=True)
[('🤥', {'requested': False, 'glyphcount': 1})]
'''
    fonts_used = []
    text_utf8 = text.encode('UTF-8', errors='replace')
    label = Gtk.Label()
    pango_context = label.get_pango_context()
    pango_layout = Pango.Layout(pango_context)
    pango_font_description = Pango.font_description_from_string(font)
    pango_layout.set_font_description(pango_font_description)
    pango_attr_list = Pango.AttrList()
    pango_attr_fallback = Pango.attr_fallback_new(fallback)
    pango_attr_list.insert(pango_attr_fallback)
    pango_layout.set_attributes(pango_attr_list)
    pango_layout.set_text(text)
    pango_layout_line = pango_layout.get_line_readonly(0)
    gs_list = pango_layout_line.runs
    number_of_runs = len(gs_list)
    for glyph_item in gs_list:
        pango_item = glyph_item.item
        offset = pango_item.offset
        length = pango_item.length
        _num_chars = pango_item.num_chars
        pango_glyph_string = glyph_item.glyphs
        num_glyphs = pango_glyph_string.num_glyphs
        pango_analysis = pango_item.analysis
        pango_font = pango_analysis.font
        font_description_used = pango_font.describe()
        run_text = text_utf8[offset:offset + length].decode('UTF-8', errors='replace')
        run_family = font_description_used.get_family()
        fonts_used.append((run_text, {'requested': run_family == font, 'glyphcount': num_glyphs}))
    return fonts_used

def _init() -> None:
    '''Initialization'''
    return

def _del() -> None:
    '''Cleanup'''
    return

class __ModuleInitializer: # pylint: disable=too-few-public-methods,invalid-name
    def __init__(self) -> None:
        _init()
    def __del__(self) -> None:
        return

if __name__ == "__main__":
    import doctest
    (FAILED, _ATTEMPTED) = doctest.testmod()
    sys.exit(FAILED)
(TBH I... just can't decide whether that change makes the tests less useful, or if it doesn't actually matter.)
I am not sure, it still might fail depending on which fonts exactly are installed on the system where the test was done. A different version of Twemoji might be installed, for example. I have an enormous number of fonts installed on my personal system, and therefore results for such font tests on my machine are typically already different than on a default installation of the same Fedora version I am using. Running this successfully on every distribution out there is probably hopeless.
Also I want the function to return which font name was really used to display that in the context menu in emoji-picker.
As in this screenshot where the requested font is "Symbola" but the font actually used for U+1FAE2 face with open eyes and hand over mouth is TH-Tshyn-P1. And I want to know that...
In the meantime I have improved my code to detect whether a font seems to render an emoji sequence but the result is empty (like the flag of Sark in the Twemoji font), and whether a glyph for a single code point emoji is unavailable.
My new code is this:
from typing import List
from typing import Tuple
from typing import Dict
from typing import Any
import sys
from gi import require_version # type: ignore
require_version('Gtk', '3.0')
from gi.repository import Gtk # type: ignore
require_version('Pango', '1.0')
from gi.repository import Pango

def get_fonts_used_for_text(
        font: str, text: str, fallback: bool = True) -> List[Tuple[str, Dict[str, Any]]]:
    '''Return a list of fonts which were really used to render a text
:param font: The font requested to render the text in
:param text: The text to render
:param fallback: Whether to enable font fallback. If disabled, then
glyphs will only be used from the closest matching
font on the system. No fallback will be done to other
fonts on the system that might contain the glyphs needed
for the text.
Examples:
(Don’t run CI checks regularly on these examples, it depends too much
on the fonts installed on the system used to do the test)
>>> get_fonts_used_for_text('DejaVu Sans Mono', '😀 ')
[('😀', {'font': 'Noto Color Emoji', 'glyph-count': 1, 'visible': True, 'glyph-available': True}), (' ', {'font': 'DejaVu Sans Mono', 'glyph-count': 1, 'visible': False, 'glyph-available': True})]
>>> get_fonts_used_for_text('DejaVu Sans', '日本語 नमस्ते')
[('日本語 ', {'font': 'Droid Sans Fallback', 'glyph-count': 4, 'visible': True}), ('नमस्ते', {'font': 'FreeSans', 'glyph-count': 5, 'visible': True})]
>>> get_fonts_used_for_text('DejaVu Sans', '日本語 🕉️')
[('日本語 ', {'font': 'Droid Sans Fallback', 'glyph-count': 4, 'visible': True}), ('🕉', {'font': 'Noto Color Emoji', 'glyph-count': 1, 'visible': True, 'glyph-available': True})]
>>> get_fonts_used_for_text('DejaVu Sans', '🕉\uFE0F')
[('🕉', {'font': 'Noto Color Emoji', 'glyph-count': 1, 'visible': True, 'glyph-available': True})]
>>> get_fonts_used_for_text('DejaVu Sans', '')
[]
>>> get_fonts_used_for_text('DejaVu Sans', '\\n')
[]
>>> get_fonts_used_for_text('DejaVu Sans', '\u0008') # BACKSPACE
[('\\x08', {'font': 'DejaVu Sans', 'glyph-count': 1, 'visible': True, 'glyph-available': False})]
>>> get_fonts_used_for_text('DejaVu Sans', '\u001b') # ESCAPE
[('\\x1b', {'font': 'DejaVu Sans', 'glyph-count': 1, 'visible': True, 'glyph-available': False})]
>>> get_fonts_used_for_text('DejaVu Sans', ' ')
[(' ', {'font': 'DejaVu Sans', 'glyph-count': 1, 'visible': False, 'glyph-available': True})]
>>> get_fonts_used_for_text('', 'a')
[('a', {'font': 'DejaVu Sans', 'glyph-count': 1, 'visible': True, 'glyph-available': True})]
>>> get_fonts_used_for_text('Twemoji', '🙂↕️')
[('🙂\u200d↕️', {'font': 'Noto Color Emoji', 'glyph-count': 1, 'visible': True})]
>>> get_fonts_used_for_text('Twemoji', '🙂↕️', fallback=False)
[('🙂\u200d↕️', {'font': 'Twemoji', 'glyph-count': 3, 'visible': True})]
“Twemoji” has no glyph for the flag of Sark (added in Unicode 16.0) but “Noto Color Emoji” has it.
Even though “Twemoji” has no glyph for the flag of Sark, Pango renders the sequence of two
code points (U+1F1E8 U+1F1F6) as one glyph when “Twemoji” is specified and fallback is not allowed
(Visually the glyph shown appears empty, there is no “Tofu”):
>>> get_fonts_used_for_text('Twemoji', '🇨🇶', fallback=False)
[('🇨🇶', {'font': 'Twemoji', 'glyph-count': 1, 'visible': False})]
>>> get_fonts_used_for_text('Twemoji', '🇨🇶')
[('🇨🇶', {'font': 'Noto Color Emoji', 'glyph-count': 1, 'visible': True})]
>>> get_fonts_used_for_text('Twemoji', '🏴', fallback=False)
[('🏴\U000e0067\U000e0062\U000e0077\U000e006c\U000e0073\U000e007f', {'font': 'Twemoji', 'glyph-count': 1, 'visible': True})]
>>> get_fonts_used_for_text('Twemoji', '🏴')
[('🏴\U000e0067\U000e0062\U000e0077\U000e006c\U000e0073\U000e007f', {'font': 'Noto Color Emoji', 'glyph-count': 1, 'visible': True})]
“Twemoji” does not have the glyph for this single code point emoji but “Noto Color Emoji” has it
(visually the glyph shown when Twemoji is used is a “Tofu” block with the code point inside):
>>> get_fonts_used_for_text('Twemoji', '\U0001fae9', fallback=False)
[('\U0001fae9', {'font': 'Twemoji', 'glyph-count': 1, 'visible': True, 'glyph-available': False})]
>>> get_fonts_used_for_text('Twemoji', '\U0001fae9', fallback=True)
[('\U0001fae9', {'font': 'Noto Color Emoji', 'glyph-count': 1, 'visible': True, 'glyph-available': True})]
Both “Twemoji” and “Noto Color Emoji” have the glyph for this single code point emoji,
both render it well when inspected visually:
>>> get_fonts_used_for_text('Twemoji', '🤥', fallback=False)
[('🤥', {'font': 'Twemoji', 'glyph-count': 1, 'visible': True, 'glyph-available': True})]
>>> get_fonts_used_for_text('Twemoji', '🤥', fallback=True)
[('🤥', {'font': 'Noto Color Emoji', 'glyph-count': 1, 'visible': True, 'glyph-available': True})]
'''
    fonts_used = []
    # Pango item offsets and lengths are byte offsets into the UTF-8 text:
    text_utf8 = text.encode('UTF-8', errors='replace')
    label = Gtk.Label()
    pango_context = label.get_pango_context()
    pango_layout = Pango.Layout(pango_context)
    pango_font_description = Pango.font_description_from_string(font)
    pango_layout.set_font_description(pango_font_description)
    pango_attr_list = Pango.AttrList()
    pango_attr_fallback = Pango.attr_fallback_new(fallback)
    pango_attr_list.insert(pango_attr_fallback)
    pango_layout.set_attributes(pango_attr_list)
    pango_layout.set_text(text)
    pango_layout_line = pango_layout.get_line_readonly(0)
    gs_list = pango_layout_line.runs
    _number_of_runs = len(gs_list)
    for glyph_item in gs_list:
        pango_item = glyph_item.item
        offset = pango_item.offset
        length = pango_item.length
        _num_chars = pango_item.num_chars
        pango_glyph_string = glyph_item.glyphs
        num_glyphs = pango_glyph_string.num_glyphs
        pango_analysis = pango_item.analysis
        pango_font = pango_analysis.font
        font_description_used = pango_font.describe()
        run_text = text_utf8[offset:offset + length].decode(
            'UTF-8', errors='replace')
        run_family = font_description_used.get_family()
        # Lay out the text of this run on its own to measure its ink extents:
        pango_layout_run = Pango.Layout(pango_context)
        pango_layout_run.set_font_description(pango_font_description)
        pango_layout_run.set_attributes(pango_attr_list)
        pango_layout_run.set_text(run_text)
        pango_layout_run_line = pango_layout_run.get_line_readonly(0)
        visible = False
        ink_rect, logical_rect = pango_layout_run_line.get_pixel_extents()
        if ink_rect.width > 0 and ink_rect.height > 0:
            visible = True
        results_for_run = {
            'font': run_family,
            'glyph-count': num_glyphs,
            'visible': visible}
        # If it is only one character followed by a variation
        # selector, remove the variation selector before checking
        # whether the Pango font has that character:
        if len(run_text) == 2 and run_text[1] in ('\uFE0F', '\uFE0E'):
            run_text = run_text[0]
        if (num_glyphs == 1
            and len(run_text) == 1
            and hasattr(Pango.Font, 'has_char')):
            results_for_run['glyph-available'] = pango_font.has_char(run_text)
        fonts_used.append((run_text, results_for_run))
    return fonts_used
def emoji_font_fallback_needed(font: str, text: str) -> bool:
'''
Examples:
Twemoji does not support the emoji sequence for “head shaking vertically”
(U+1F642 U+200D U+2195, added in Unicode 15.1):
>>> emoji_font_fallback_needed('Twemoji', '🙂↕️')
True
Twemoji does not have the flag of Sark (U+1F1E8 U+1F1F6, added in Unicode 16.0):
>>> emoji_font_fallback_needed('Twemoji', '🇨🇶')
True
Twemoji does not have U+1FAE9 FACE WITH BAGS UNDER EYES (added in Unicode 16.0):
>>> emoji_font_fallback_needed('Twemoji', '\U0001fae9')
True
But Twemoji has U+1F925 LYING FACE (added in Unicode 9.0):
>>> emoji_font_fallback_needed('Twemoji', '🤥')
False
Twemoji does support the emoji sequence for the flag of Wales
(U+1F3F4 U+E0067 U+E0062 U+E0077 U+E006C U+E0073 U+E007F):
>>> emoji_font_fallback_needed('Twemoji', '🏴')
False
Twemoji does not have regular Latin characters like “A”:
>>> emoji_font_fallback_needed('Twemoji', 'A')
True
But of course any standard font has “A”:
>>> emoji_font_fallback_needed('Sans', 'A')
False
If the text given contains more than one emoji, then we don’t know and
the result is always True because a fallback might be needed:
>>> emoji_font_fallback_needed('Twemoji', '\U0001fae9🤥')
True
>>> emoji_font_fallback_needed('Twemoji', '🏴🤥')
True
'''
    fonts_used = get_fonts_used_for_text(font, text, fallback=False)
    if len(fonts_used) > 1:
        # If there is more than one run, that means the text contained more
        # than just a single emoji or a single character. Whether a fallback
        # is needed in that case is hard to tell. Just assume it is
        # needed for the moment:
        return True
    results_for_run = fonts_used[0][1]
    if results_for_run['glyph-count'] > 1:
        return True
    if not results_for_run['visible']:
        return True
    if 'glyph-available' in results_for_run and not results_for_run['glyph-available']:
        return True
    return False
def _init() -> None:
    '''Initialization'''
    return
def _del() -> None:
    '''Cleanup'''
    return
class __ModuleInitializer: # pylint: disable=too-few-public-methods,invalid-name
    def __init__(self) -> None:
        _init()
    def __del__(self) -> None:
        return
if __name__ == "__main__":
    import doctest
    (FAILED, _ATTEMPTED) = doctest.testmod()
    sys.exit(FAILED)
Now I have glyph-count, visible, and glyph-available, and together these enable me to figure out whether a single emoji (which might be a sequence) is already supported by the requested font or not.
This new convenience function emoji_font_fallback_needed(font: str, text: str) -> bool seems to do correctly what we need.
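As a rough illustration of how such a check could be used, here is a minimal sketch that decides the fallback attribute per emoji when building Pango markup like the test file earlier in this thread. The helper markup_for_emoji and the stub predicate are hypothetical names made up for this example; this is not necessarily how emoji-picker applies the result.

```python
from typing import Callable
from xml.sax.saxutils import escape, quoteattr


def markup_for_emoji(
        font: str,
        emoji: str,
        fallback_needed: Callable[[str, str], bool]) -> str:
    '''Build a Pango markup span which allows font fallback only
    when the requested font cannot render the emoji itself.

    fallback_needed is expected to behave like
    emoji_font_fallback_needed(font, text) from the listing above.
    '''
    fallback = 'true' if fallback_needed(font, emoji) else 'false'
    return (f'<span font={quoteattr(font)} fallback="{fallback}">'
            f'{escape(emoji)}</span>')


def stub_fallback_needed(_font: str, text: str) -> bool:
    '''Hypothetical stand-in so the sketch runs without Pango:
    pretend only U+1FAE9 is missing in the requested font.'''
    return text == '\U0001fae9'


if __name__ == '__main__':
    for emoji in ('🤥', '\U0001fae9'):
        print(markup_for_emoji('Twemoji', emoji, stub_fallback_needed))
```

With the real function passed in instead of the stub, emoji which the requested font supports would keep fallback="false" and therefore would not be replaced by glyphs from the default emoji font.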
If we want automatic test cases, it is probably better to add some in the tests/ subdirectory of the ibus-typing-booster source code.
There we can test only certain parts of the return values of functions if we want to and we can add conditionals depending on which fonts are installed or which distribution the test is run on.
The doctests in the itb_pango.py file above are probably more useful for documenting how that function works and how it can be used than for running automatic tests on a wide variety of systems.
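A test case that is conditional on the installed fonts could look roughly like the sketch below. It assumes that the fc-list tool from fontconfig is available and that itb_pango can be imported from the test directory; the helper names and the test class are hypothetical, not the tests that were actually added.

```python
import shutil
import subprocess
import unittest
from typing import Set


def parse_fc_list_families(output: str) -> Set[str]:
    '''Parse the output of “fc-list : family” into a set of family names.

    A line may contain several comma separated names for the same font.
    Backslash escaped characters in names are not handled in this sketch.
    '''
    families = set()
    for line in output.splitlines():
        for family in line.split(','):
            family = family.strip()
            if family:
                families.add(family)
    return families


def installed_font_families() -> Set[str]:
    '''Return the installed font families, or an empty set
    if fc-list is not available.'''
    if shutil.which('fc-list') is None:
        return set()
    result = subprocess.run(
        ['fc-list', ':', 'family'],
        capture_output=True, encoding='UTF-8', errors='replace',
        check=False)
    return parse_fc_list_families(result.stdout)


FAMILIES = installed_font_families()


class EmojiFallbackTestCase(unittest.TestCase):
    '''Skipped automatically on systems without the Twemoji font.'''

    @unittest.skipUnless('Twemoji' in FAMILIES, 'Twemoji is not installed')
    def test_lying_face_needs_no_fallback(self) -> None:
        # Assumption: itb_pango is importable from the test directory.
        import itb_pango  # pylint: disable=import-outside-toplevel
        self.assertFalse(
            itb_pango.emoji_font_fallback_needed('Twemoji', '🤥'))
```

Skipping instead of failing keeps the test suite usable on distributions which ship a different set of emoji fonts.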
@ferdnyc
As you are using Fedora, could you try one of the ibus-typing-booster-2.26.1 test builds from my copr repo please:
https://copr.fedorainfracloud.org/coprs/mfabian/ibus-typing-booster/builds/
There are builds for Fedora 39, Fedora 40, and Fedora 41.
You can install them by enabling the repo and then using dnf to update:
sudo dnf copr enable mfabian/ibus-typing-booster
sudo dnf update
With these builds, the font fallbacks in emoji-picker should finally work correctly again.
I uploaded ibus-typing-booster-2.26.2 test builds to the copr repo now.
Compared to 2.26.1, these fix some minor issues for “Symbola” and “Twitter Color Emoji”.
I also added some test cases in the tests/ subdirectory and cleaned up the function documentation in itb_pango.py.
“Twitter Color Emoji” is not the same as “Twemoji”; it is a font with SVG images in an OpenType font. That means it can scale to any size, even very large sizes, without becoming blurry.
There is no package for Fedora but one can get it from https://github.com/13rac1/twemoji-color-font
The latest release is currently for Unicode 15.1.0:
Just download and unpack the tarball in ~/.fonts/
I tested that it works well on Fedora 40 and Fedora 41.
@ferdnyc
“Twitter Color Emoji” is not the same as “Twemoji”; it is a font with SVG images in an OpenType font. That means it can scale to any size, even very large sizes, without becoming blurry.
Actually, “Noto Color Emoji” is also available as an SVG in OpenType font. The Fedora package google-noto-color-emoji-fonts-20241008-1.fc41.noarch contains the bitmap version; the SVG in OpenType font is available here:
https://github.com/googlefonts/noto-emoji/blob/main/fonts/Noto-COLRv1.ttf
I downloaded it, put it into my ~/.fonts/ directory, and made this screenshot comparing the SVG in OpenType version (left side) with the bitmap version (right side):
@ferdnyc
Weird special case when using the “Blobmoji” font (https://github.com/C1710/blobmoji, the old “blob” style Google emoji, a fork of Noto Color Emoji):
This font doesn’t really have a flag for Sark but shows its own replacement flag. Therefore, glyph-count is 1 and visible is True, and I can no longer detect that a fallback to a different font would be nice for the flag of Sark.
Compare this with the behaviour for “Twemoji”, which does not have the flag of Sark either but renders an empty glyph with zero ink extent, which I can detect as visible equal to False:
Therefore, when fallback is enabled and the “Twemoji” font is used, a fallback is used for the flag of Sark:
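The difference between the two fonts can be illustrated with a small sketch of the decision logic applied to run information of the kind get_fonts_used_for_text() returns. The function name fallback_needed_from_runs is hypothetical, and the Blobmoji dictionary is only built from the values described above (glyph-count 1, visible True), not from real output.

```python
from typing import Any, Dict, List, Tuple

RunInfo = Tuple[str, Dict[str, Any]]


def fallback_needed_from_runs(fonts_used: List[RunInfo]) -> bool:
    '''Same checks as emoji_font_fallback_needed(), but on run
    information which has already been collected.

    Assumption of this sketch: anything other than exactly one run
    (including an empty list) counts as “fallback needed”.
    '''
    if len(fonts_used) != 1:
        return True
    results_for_run = fonts_used[0][1]
    if results_for_run['glyph-count'] > 1:
        return True
    if not results_for_run['visible']:
        return True
    # 'glyph-available' is only present for single code points:
    if not results_for_run.get('glyph-available', True):
        return True
    return False


# Twemoji renders the flag of Sark as one glyph with zero ink extent:
TWEMOJI_SARK = [
    ('🇨🇶', {'font': 'Twemoji', 'glyph-count': 1, 'visible': False})]
# Blobmoji shows its own replacement flag, which has ink:
BLOBMOJI_SARK = [
    ('🇨🇶', {'font': 'Blobmoji', 'glyph-count': 1, 'visible': True})]

if __name__ == '__main__':
    print(fallback_needed_from_runs(TWEMOJI_SARK))   # True
    print(fallback_needed_from_runs(BLOBMOJI_SARK))  # False
```

Because a replacement glyph with ink looks exactly like a supported emoji to these three checks, the Blobmoji case cannot be told apart from real support without some additional information about the font.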
Describe the bug
When browsing in the emoji-picker window using a non-default emoji font, in most cases activating the "Fallback" checkbox will replace ALL emoji with the ones from the default emoji font, not only the missing glyphs.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Only the missing glyphs not present in the Twemoji font will be filled in with glyphs from the default Noto Color Emoji font.
Screenshots or videos
'food' category in Twemoji with fallback off
'food' category in Twemoji with fallback enabled
'food' category in Noto Color Emoji with fallback off
emoji-picker version? emoji-picker-2.25.3-1.fc39.noarch from Fedora repo
ibus version? Not applicable, but ibus-1.5.29-1.fc39.x86_64 from Fedora repo
Distribution and version? Fedora 39
Desktop and version? GNOME Shell 45.5
Xorg or Wayland? Wayland
Additional context
For some reason, this doesn't happen in the "regional" category — and only in the "regional" category.
'regional' category in Twemoji with fallback off
'regional' category in Twemoji with fallback on
'regional' category in Noto Color Emoji (fallback off)