sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/

Have emoji working in sphinx latexpdf ``xelatex`` generated in titles and code #12195

Open LuisBL opened 3 months ago

LuisBL commented 3 months ago

Describe the bug

With latex_elements = {"preamble": r"\setmainfont{Symbola}"} we get beautiful B&W emoji in the body text of the xelatex-generated PDF, but not in titles and not in code blocks.

How to Reproduce

Below is a small Sphinx project that illustrates the emoji problem with latexpdf:

$ sphinx-quickstart --no-batchfile --no-sep -p test_emoj -l en -a me -v 1.0 -q
$ cd test_emoj; tree
├── _build
├── conf.py
├── index.rst
├── Makefile
├── NotoColorEmoji.ttf
├── _static
└── _templates

Add some content:

$ vim index.rst
$ cat index.rst 
=========
test_emoj
=========

Hello space 🚀
==============

Space 🚀 and hammer 🔨.

.. code::

  ✔ app 8 layers [⣿⣿⣿]  Pulled
$

Set latex_engine to xelatex and latex_theme to howto:

echo "latex_engine ='xelatex'"  >> conf.py
echo "latex_theme='howto'"  >> conf.py

HTML emoji are ok

$ make html 
...
$ 

latexpdf with the xelatex engine shows no emoji:

$ make latexpdf 
...
$ evince _build/latex/test_emoj.pdf 

no emoji

With \setmainfont we get some emoji (B&W, and only in body text):

$ echo 'latex_elements = {"preamble": r"\setmainfont{Symbola}"}' >> conf.py
$ make latexpdf 
...
$ evince _build/latex/test_emoj.pdf 

with Symbola B&W emoji but not on title and code

The corresponding LaTeX code shows that the font used for sections does not know about emoji, and neither does the font defined for sphinxVerbatim:

$ vim _build/latex/test_emoj.tex
...
\section{Hello space 🚀}
\label{\detokenize{index:hello-space}}
\sphinxAtStartPar
Space 🚀, hammer 🔨 and ⣿

\begin{sphinxVerbatim}[commandchars=\\\{\}]
\(\pmb{\checkmark}\) app 8 layers [⣿⣿⣿]  Pulled
\end{sphinxVerbatim}
...

The LaTeX log file shows that for the code:: block LaTeX looks for the characters in FreeMono.otf, and for the title it looks for the emoji in FreeSansBold.otf:

$ grep '^Missi' _build/latex/test_emoj.log 
Missing character: There is no 🚀 (U+1F680) in font [FreeSansBold.otf]/OT:script
Missing character: There is no ⣿ (U+28FF) in font [FreeMono.otf]/OT:script=latn
Missing character: There is no ⣿ (U+28FF) in font [FreeMono.otf]/OT:script=latn
Missing character: There is no ⣿ (U+28FF) in font [FreeMono.otf]/OT:script=latn
$
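
The grep above can be turned into a small report of which font lacks which character. This is only a sketch: it assumes the log lines have exactly the shape shown above (font given as a file name in square brackets), and the helper name missing_by_font is made up for illustration.

```python
import re
from collections import defaultdict

# Matches XeTeX log lines of the shape shown above, e.g.
# "Missing character: There is no ⣿ (U+28FF) in font [FreeMono.otf]/OT:script=latn"
# Assumption: the font is reported as a file name inside square brackets.
MISSING = re.compile(
    r"^Missing character: There is no (.+?) \(U\+([0-9A-F]+)\) in font \[([^\]]+)\]"
)


def missing_by_font(log_text):
    """Map each font file to the sorted list of code points it lacks."""
    found = defaultdict(set)
    for line in log_text.splitlines():
        m = MISSING.match(line)
        if m:
            found[m.group(3)].add("U+" + m.group(2))
    return {font: sorted(cps) for font, cps in found.items()}


# Sample input copied from the log excerpt above.
sample = """\
Missing character: There is no 🚀 (U+1F680) in font [FreeSansBold.otf]/OT:script
Missing character: There is no ⣿ (U+28FF) in font [FreeMono.otf]/OT:script=latn
Missing character: There is no ⣿ (U+28FF) in font [FreeMono.otf]/OT:script=latn
"""

for font, code_points in missing_by_font(sample).items():
    print(font, code_points)
```

To run it on a real build, the text of _build/latex/test_emoj.log could be read with errors="replace" and passed to the same function.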

To see which of my fonts have the ⣿ character:

$ java -jar /usr/share/texlive/texmf-dist/scripts/albatross/albatross.jar --detailed ⣿

On my Ubuntu 22.04 the fonts below have the ⣿ character:

/usr/share/fonts/truetype/ancient-scripts/Symbola_hint.ttf
/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf
/usr/share/fonts/truetype/dejavu/DejaVuSans-BoldOblique.ttf
/usr/share/fonts/truetype/dejavu/DejaVuSans-Oblique.ttf
/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf
/usr/share/fonts/truetype/dejavu/DejaVuSansCondensed-Bold.ttf
/usr/share/fonts/truetype/dejavu/DejaVuSansCondensed.ttf
/usr/share/fonts/truetype/dejavu/DejaVuSerif
/usr/share/fonts/truetype/noto/NotoSansSymbols2-Regular.ttf

Beyond the "Have emoji working in sphinx latexpdf xelatex generated in titles and code" question, I wonder why \setmainfont appears with both the Symbola and FreeSerif values in the tex file. I expected to have only one main font, ideally with Symbola defined as a fallback font in case characters are not found in the FreeSerif "mainfont":

    $ grep setmainfont _build/latex/test_emoj.tex 
    \setmainfont{FreeSerif}[
    \setmainfont{Symbola}
    $
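
A possible explanation, offered as an assumption rather than a confirmed diagnosis: Sphinx writes its default font setup through the 'fontpkg' key of latex_elements and inserts 'preamble' later, so both \setmainfont lines end up in the file and the later one takes effect. The sketch below overrides 'fontpkg' so that only one \setmainfont is emitted. The font choices are illustrative, and this does not provide a fallback from FreeSerif to Symbola.

```python
# conf.py sketch (assumption: overriding 'fontpkg' replaces Sphinx's
# default FreeSerif/FreeSans/FreeMono declarations instead of adding to them).
latex_engine = "xelatex"
latex_theme = "howto"

latex_elements = {
    "fontpkg": r"""
\setmainfont{Symbola}
\setsansfont{FreeSans}
\setmonofont{FreeMono}
""",
}
```

With this, grep setmainfont on the generated tex file should show a single line, provided nothing else in conf.py sets the main font again.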

Environment Information

$ cat /etc/issue
Ubuntu 22.04.4 LTS \n \l
$

$ sphinx-build --bug-report
Please paste all output below into the bug report template

Platform:              linux; (Linux-6.5.0-26-generic-x86_64-with-glibc2.35)
Python version:        3.11.8 (main, Feb 25 2024, 21:35:23) [GCC 11.4.0])
Python implementation: CPython
Sphinx version:        7.2.6
Docutils version:      0.20.1
Jinja2 version:        3.1.3
Pygments version:      2.17.2
$

Sphinx extensions

extensions = []

Additional context

No response

LuisBL commented 3 months ago

I posted on SO the corresponding question: https://stackoverflow.com/questions/78214882/have-emoji-working-in-sphinx-latexpdf-xelatex-generated-content-titles-and-code

picnixz commented 3 months ago

Just wondering, but would a plain LaTeX file with such emojis do what you want? (That is, what would be the minimal LaTeX file that works, independently of Sphinx?)

LuisBL commented 3 months ago

> Just wondering, but would a plain LaTeX file with such emojis do what you want? (That is, what would be the minimal LaTeX file that works, independently of Sphinx?)

My LaTeX is not good enough to say for sure. I guess the minimum would be to have B&W emoji in both \section and \begin{Verbatim}.

A cascading mechanism is probably the droid we are looking for ;) For example: if the emoji is not found in FreeSansBold.otf, try to take it from Symbola.

n-peugnet commented 3 months ago

Our solution for (color) emoji in the PDF output was to use LuaLaTeX and to define TwemojiMozilla as the fallback font.

You can see the result there: https://club1.fr/docs/en/club1-en-latest.pdf

The idea mainly came from this article: https://www.overleaf.com/learn/latex/Articles/An_overview_of_technologies_supporting_the_use_of_colour_emoji_fonts_in_LaTeX
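
For readers who want to try this route, here is a rough conf.py sketch of a LuaLaTeX fallback font, based on the luaotfload fallback mechanism described in the Overleaf article. It is not the actual club1 configuration: the fallback name emojifallback, the font lookup name TwemojiMozilla and the choice of FreeSerif/FreeSans/FreeMono as base fonts are assumptions, and a TeX Live from 2020 or later is assumed.

```python
# conf.py sketch: colour emoji through a luaotfload fallback font.
# Assumptions: "emojifallback" is an arbitrary label, and the emoji font
# is installed and findable under the name "TwemojiMozilla".
latex_engine = "lualatex"

latex_elements = {
    "fontpkg": r"""
\directlua{luaotfload.add_fallback("emojifallback", {"TwemojiMozilla:mode=harf;"})}
\setmainfont{FreeSerif}[RawFeature={fallback=emojifallback}]
\setsansfont{FreeSans}[RawFeature={fallback=emojifallback}]
\setmonofont{FreeMono}[RawFeature={fallback=emojifallback}]
""",
}
```

Each font family needs the fallback declared separately, which is why the same RawFeature option appears on the main, sans and mono fonts.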

jfbu commented 2 months ago

Maybe the method of https://github.com/numpy/numpy/pull/23172 would work here. Something such as

(check the file extension for Symbola, assuming it is otf here)

latex_elements['preamble'] = r"""
    \newfontfamily\FontForEmojis{Symbola}[Extension=.otf]
    \catcode`🚀\active\protected\def🚀{{\FontForEmojis\string🚀}}
    \catcode`🔨\active\protected\def🔨{{\FontForEmojis\string🔨}}
"""

You have to do this for every character used. A LaTeX loop could generate these definitions.
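
Since conf.py is plain Python, the repetition can also be generated on the Python side instead of with a LaTeX loop. The following is a minimal sketch of that idea. The helper name emoji_preamble is hypothetical, the \newfontfamily line omits the Extension option, and it only handles characters that are a single code point (no ZWJ sequences or variation selectors).

```python
def emoji_preamble(chars, font="Symbola", macro="FontForEmojis"):
    """Build the \\newfontfamily line plus one active-character
    definition per distinct character in `chars` (hypothetical helper)."""
    lines = ["\\newfontfamily\\" + macro + "{" + font + "}"]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    for ch in dict.fromkeys(chars):
        lines.append(
            "\\catcode`" + ch + "\\active\\protected\\def" + ch
            + "{{\\" + macro + "\\string" + ch + "}}"
        )
    return "\n".join(lines) + "\n"


# In conf.py: list every emoji the document uses.
latex_elements = {"preamble": emoji_preamble("🚀🔨")}
print(latex_elements["preamble"])
```

The generated lines have the same shape as the hand-written ones above, so the same caveats about the font and engine apply.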

I am not fontspec-able, so I have no idea about your question on \setmainfont. All I know is that the LaTeX maintainers are philosophically against LaTeX having a "cascading" fallback scheme for finding characters in a font, also because 30 years ago it would have been prohibitively costly. One has to go through methods like the above. Edit: or use LuaLaTeX as explained by @n-peugnet.

Edit: in the above, perhaps use NotoColorEmoji instead, with ttf as the extension.

jfbu commented 2 months ago

Sorry for the noise here. On reading more closely the very instructive Overleaf link provided by @n-peugnet, especially the part on XeTeX and OpenType color fonts, I realized that the method I indicated would not work with xelatex and NotoColorEmoji. It does work with lualatex, as the following pure LaTeX snippet exemplifies, but then the "fallback" mechanism from @n-peugnet's comment is much better.

\documentclass{article}
% compile with lualatex on a recent TeXLive based installation (at least 2020)
\usepackage{fontspec}
\newfontfamily\FontForEmojis{NotoColorEmoji}[Renderer=Harfbuzz, Extension=.ttf]
    \catcode`🚀\active\protected\def🚀{{\FontForEmojis\string🚀}}
    \catcode`🔨\active\protected\def🔨{{\FontForEmojis\string🔨}}
\begin{document}
Test 🚀🔨
\end{document}

LuisBL commented 2 months ago

To keep this issue focused on xelatex, I created a new one dedicated to LuaLaTeX: https://github.com/sphinx-doc/sphinx/issues/12332, with all the insight gained from @n-peugnet and @jfbu.

jfbu commented 2 months ago

B&W success using this set-up:

latex_engine = 'xelatex'
latex_theme = 'howto'

latex_elements = {
    "preamble": r"""
\setmainfont{Latin Modern Roman}[SmallCapsFont={* Caps}]
\setsansfont{Latin Modern Sans}
\setmonofont{DejaVu Sans Mono}[Scale=0.8]

\newfontfamily\FontOne{Symbola}
\newfontfamily\FontTwo{DejaVuSans}[Extension=.ttf]

    \catcode`🚀\active\protected\def🚀{{\FontOne\string🚀}}
    \catcode`🔨\active\protected\def🔨{{\FontOne\string🔨}}
    \catcode`⣿\active\protected\def⣿{{\FontTwo\string⣿}}
    """,
}

output: Screenshot 2024-04-28 at 15 20 05

jfbu commented 2 months ago

I changed the type of this ticket to question because it does not seem to be related to a bug in Sphinx LaTeX support, unless one considers that support for color emojis should be built in, a topic on which I don't have any strong opinion myself!