Open JulienPalard opened 7 years ago
@jfbu could you give us comments for this please?
I can't tell whether it would be better or not. I don't know which of LuaTeX and XeTeX is the better successor to pdfLaTeX, and I also don't know whether they are stable enough for use with Sphinx.
Of course, I will agree to change the default LaTeX engine if either one is stable enough.
I will make general remarks.
LuaLaTeX is actively maintained and will probably offer more and more features via dedicated packages which achieve things currently impossible in TeX. But these advantages are probably not needed by the vast majority of Sphinx projects. Besides, it appears that LuaLaTeX opens up new security concerns in the TeX world, because the Lua scripting language does not have the restrictions on opening and writing files which exist with the pdfTeX binaries as distributed with the major installations (TeX Live and MiKTeX). It may be said that such concerns already exist from running Python scripts... nevertheless this makes one think twice before adopting it on a grand scale by default. Thus, I would not recommend using lualatex by default before experience has accumulated elsewhere.
XeLaTeX does not have this issue.
but switching to it does not solve all Unicode-related problems: in hand-written documents, authors manually switch languages according to the glyphs needed, and they set up appropriate fonts for each language (at least indirectly, via the polyglossia package)
so far the Sphinx LaTeX writer does not support multi-lingual documents. Even if it did, the author of a Sphinx project would need to manually add mark-up to the source in case of exotic Unicode characters, to signal the change of language and hence the OpenType font to use: fonts do not support all scripts, although some fonts do support a wide range of scripts.
it appears that polyglossia support for French lags behind babel+french features, so if we at Sphinx make xelatex+polyglossia the default, we may raise specific French issues -- admittedly they may be relevant only to expert LaTeX users, who will know how to switch back to babel+french.
there are issues with xelatex regarding math mode: it has some currently unfixed bugs there, but this is arguably not a very strong deterrent for Sphinx projects.
Making xelatex the default will modify the look of all Sphinx-produced PDFs, because xelatex should be used with OpenType fonts. It can be used with traditional TeX fonts, but then the hyphenation mechanism of TeX is broken in some languages. Recently the LaTeX team modified the behaviour of LaTeX so that, by default, if used with the xelatex engine it will use the OpenType version of the lmodern font.
So making xelatex the default also requires reviewing the font configuration for all Sphinx-supported languages and, as I said, it will change the default look of all Sphinx-built PDF documentation.
This looks like quite some work on the Sphinx side... I think the first step is to move Sphinx towards supporting multi-lingual documents, because making xelatex the default engine is not by itself a 100% solution to all problems related to Unicode input. It requires extra steps.
A few last pros and cons:
typically xelatex produced PDFs are smaller than pdflatex produced ones, when using traditional TeX fonts, because xelatex better compresses the font; but as explained already, xelatex should not be used with traditional TeX fonts for optimal results,
compilation times with xelatex or lualatex are often significantly increased compared to pdflatex builds.
That's a lot to consider and I'm no latex expert. I just noticed that the current default (pdflatex/platex) put me in a hard situation when building english, french, and japanese:
So I just can't have a successful build with conf.py alone; I have to use sphinx-build -D flags to pass the right latex_engine for the right language, with external logic.
It took me some time to find the "right combination", which in fact looks really simple: just replace pdflatex with xelatex as the default engine, but keep the "default to platex for Japanese if the default engine is used".
On the one hand I may be short-sighted, as I tested a single project; on the other hand the Python documentation is huge (230k lines of rst).
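The external logic mentioned above can be sketched as a small helper that builds one sphinx-build invocation per language. This is only an illustration: the names `pick_engine` and `latex_build_command` and the directory layout are assumptions, not part of Sphinx or of the CPython build scripts; only the `-b` and `-D` options of sphinx-build are taken as given.

```python
def pick_engine(language):
    """Return the LaTeX engine for a language: platex for Japanese, xelatex otherwise."""
    return "platex" if language == "ja" else "xelatex"


def latex_build_command(language, srcdir="Doc", outdir="build/latex"):
    """Build the sphinx-build argument list for one language (hypothetical helper)."""
    return [
        "sphinx-build", "-b", "latex",
        "-D", f"language={language}",
        "-D", f"latex_engine={pick_engine(language)}",
        srcdir, f"{outdir}/{language}",
    ]


if __name__ == "__main__":
    # Print the command that would be run for each language.
    for lang in ("en", "fr", "ja"):
        print(" ".join(latex_build_command(lang)))
```

The argument lists can be passed to subprocess.run as they are; keeping the engine choice in one function means the Japanese special case lives in a single place.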
+1 to xelatex
I can confidently say that most Chinese LaTeX users prefer xelatex to pdflatex nowadays, because xelatex has MUCH better support for OpenType fonts, so Chinese users find it WAY easier to display Chinese characters in the generated PDF. The same technology applies to Japanese and Korean characters too (we often refer to their fonts together as CJK fonts).
@jfbu In my understanding, sphinx-doc maintains its default PDF template, so something like a “font issue” should not be a problem (for users)?
@JulienPalard switching from pdflatex to xelatex for JP doc is not THAT trivial. At least you should set \setCJKmainfont , otherwise JP characters are not expected to be displayed correctly. Still, it’s kind of easy for simple cases, see https://tex.stackexchange.com/questions/139081/cjk-blank-output-for-japanese-characters
Some more helpful info here:
There is no notion of seamless experience in LaTeX regarding Unicode, although xelatex and lualatex have considerably improved the situation.
Already, Sphinx does the minimal right thing regarding xelatex which is not to use inputenc nor fontenc. With a recent LaTeX this means it will automatically use the Latin Modern OpenType font which has good coverage of European (in the large sense) languages.
$ otfinfo -s lmroman10-regular.otf
DFLT Default
cyrl Cyrillic
latn Latin
latn.AZE Latin/Azeri
latn.CRT Latin/Crimean Tatar
latn.MOL Latin/Moldavian
latn.NLD Latin/Dutch
latn.PLK Latin/Polish
latn.ROM Latin/Romanian
latn.TRK Latin/Turkish
It has no coverage for Chinese or Hebrew, for example. This means the Sphinx user with a project in these languages must customize the LaTeX preamble to use \setmainfont (or \setCJKmainfont, as documented by @fyears) to pick a suitable font. (Sphinx loads fontspec, which provides this macro; xelatex has its own font-loading primitives, which advanced xelatex users use directly, but normal users will use fontspec and will have had to read part of its documentation. Does this include the average Sphinx-doc user?)
The way this is done is system dependent regarding fonts which are provided with TeX itself (and on Mac OS X one must use different methods depending on whether the OpenType font is a system/user font or in the TeX tree).
Even the minimal Sphinx set-up for xelatex contains elements which are not satisfactory: the coverage of French language by polyglossia is far more restricted than what the babel-frenchb module provides: with polyglossia there is no conformity regarding footnotes and lists with the French typographical rules.
Besides, latex-babel is now (after some years of stagnation) actively maintained and being developed in direction of xelatex/lualatex support. As a result it is not clear if polyglossia will remain preferable to babel in future.
Regarding French, as I said, it is not. A French Sphinx user of xelatex is now well advised to set the 'babel' key of latex_elements to '\usepackage{babel}'. Sphinx internally uses 'polyglossia' but will obey the 'babel' key if the user has set it:
# set up multilingual module...
# 'babel' key is public and user setting must be obeyed
if self.elements['babel']:
# this branch is not taken for xelatex/lualatex if default settings
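For a French project built with xelatex, the override described above could look like this in conf.py. A minimal sketch: the `language` and `latex_engine` values are assumptions about the project, and the 'babel' value is the one quoted above.

```python
# conf.py (fragment): French project built with xelatex, asking Sphinx to
# use babel instead of its internal polyglossia default.
language = 'fr'            # assumed project language
latex_engine = 'xelatex'   # assumed engine choice

latex_elements = {
    # Raw string so the backslash reaches LaTeX unchanged.
    'babel': r'\usepackage{babel}',
}
```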
Making xelatex the default makes no sense if reasonable font defaults are not provided for all languages covered by Sphinx.
For example, similarly to our specific coverage of Japanese [1]_, we can provide specific coverage of Chinese if consensus emerges on how best to set it up with XeLaTeX, and this must be done for Windows, Mac OS X, Unixen... Contributions are most welcome!
.. [1] which, as mentioned already in this thread, currently goes via the platex engine, which does not support Unicode.
And, stressing again, this does not solve problems one may encounter with stray Unicode characters !
Here is a basic test of Hebrew with xelatex:
\documentclass[hebrew]{article}
\usepackage{polyglossia}
\setmainlanguage{hebrew}
\begin{document}
מבוא
\end{document}
Produces errors:
./testhebrew.tex:4: Package polyglossia Error: The current roman font does not
contain the Hebrew script!
(polyglossia) Please define \hebrewfont with \newfontfamily.
See the polyglossia package documentation for explanation.
Type H <return> for immediate help.
...
l.4 \begin{document}
(That was another \errmessage.)
Missing character: There is no מ in font [lmroman10-regular]:mapping=tex-text;!
Missing character: There is no ב in font [lmroman10-regular]:mapping=tex-text;!
Missing character: There is no ו in font [lmroman10-regular]:mapping=tex-text;!
Missing character: There is no א in font [lmroman10-regular]:mapping=tex-text;!
./testhebrew.tex:6: Package polyglossia Error: The current roman font does not
contain the Hebrew script!
(polyglossia) Please define \hebrewfont with \newfontfamily.
See the polyglossia package documentation for explanation.
Trying Sphinx on a minimal Hebrew document with xelatex leads to plenty of problems:
.. FOO documentation master file, created by
sphinx-quickstart on Sat Oct 21 14:57:01 2017.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
תוכן הענייני
============
רשימת הטבלאות
in conf.py:
language = 'he'
latex_engine = 'xelatex'
Package bidi Error: Oops! you have loaded package xcolor after bidi package. Please load package xcolor before bidi package, and then try to run xelatex on your document again.
Package bidi Error: Oops! you have loaded package float after bidi package. Please load package float before bidi package, and then try to run xelatex on your document again.
Package bidi Error: Oops! you have loaded package framed after bidi package. Please load package framed before bidi package, and then try to run xelatex on your document again.
Package bidi Error: Oops! you have loaded package wrapfig after bidi package. Please load package wrapfig before bidi package, and then try to run xelatex on your document again.
etc... etc...
and the one of interest to this thread:
Package polyglossia Error: The current roman font does not contain the Hebrew script!
...
(as above)
This confirms that a Sphinx-doc user will have to know a minimum of LaTeX macros (\newfontfamily) and documentation (fontspec, polyglossia) before reaching usable status for Hebrew-language documents, even with xelatex as latex_engine.
(we at Sphinx should probably take care of loading polyglossia, hence bidi, at the right place)
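As an illustration of the kind of LaTeX knowledge this requires, a conf.py fragment that defines \hebrewfont, as the polyglossia error message asks, might look like the sketch below. It is untested against the failing build above and rests on assumptions: the font name is a placeholder for any installed OpenType font that covers the Hebrew script, and the definition is placed in the 'preamble' key of latex_elements. It addresses only the missing-font error, not the bidi package loading-order errors.

```python
# conf.py (fragment): Hebrew project with xelatex.
language = 'he'
latex_engine = 'xelatex'

# Assumption: an OpenType font covering the Hebrew script is installed and
# visible to xelatex; "FreeSerif" is only a placeholder name.
HEBREW_FONT = 'FreeSerif'

latex_elements = {
    # Define \hebrewfont with \newfontfamily, as polyglossia requests.
    'preamble': r'\newfontfamily\hebrewfont[Script=Hebrew]{%s}' % HEBREW_FONT,
}
```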
jfbu, thank you for the comment.
As you said, moving to xelatex is not a silver bullet. AFAIK, there are no common settings that work well for all languages.
@fyears For Chinese docs, #3272 is proposed. It tries to move to xelatex and ctex only if the language is zh_*.
Note:
This looks like quite some work at Sphinx side... I think first step is to move Sphinx towards supporting multi-lingual documents.
I don't know whether this is really needed. I've never seen such a request. So it's okay to support only one language per project at a time.
(edit) oh, I understand #4159 requires it...
@tk0miya in the case of the CPython docs (which are big...), for example, the French translation is currently only at 27.2%.
It could make sense to have multi-lingual support (perhaps not only for PDF, but for PDF it is important because hyphenation depends on the language). Currently only portions of CPython's library.pdf (about 1800 pages) are in French, but the whole is treated as a French document. This means that hyphenation is wrong for all English text, which is the vast majority of the document.
(I am using make latex SPHINXOPTS="-D locale_dirs=locales -D language='fr' -D gettext_compact=0" to build the CPython French documentation, with Doc/locales/fr/LC_MESSAGES a symlink to the python-docs-fr cloned repo at the 3.6 branch)
Ah, I understand. Indeed, it is a mixture of English and French. I feel it is very difficult to support this in Sphinx. We would have to mark the language per sentence or per word...
@tk0miya But this is done by Docutils already. Consider this test file
Welcome to FOO's documentation!
===============================
Hello
.. class:: language-fr
Bonjour
.. class:: language-de
Guten Tag
Again English.
and then rst2latex.py index.rst test.tex constructs a LaTeX file which looks like this (non-relevant lines cut):
\documentclass[a4paper]{article}
% generated by Docutils <http://docutils.sourceforge.net/>
\usepackage{cmap} % fix search and cut-and-paste in Acrobat
\usepackage{ifthen}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[french,ngerman,english]{babel}
% Prevent side-effects if French hyphenation patterns are not loaded:
\frenchbsetup{StandardLayout}
\AtBeginDocument{\selectlanguage{english}\noextrasfrench}
[lines cut]
\begin{document}
\maketitle
Hello
\foreignlanguage{french}{Bonjour}
\foreignlanguage{ngerman}{Guten Tag}
Again English.
\end{document}
On further experiment, in the case of multiple paragraphs, each one is given as a separate argument to \foreignlanguage. It would probably be better to use \begin{otherlanguage}{french}...\end{otherlanguage} mark-up.
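The otherlanguage suggestion can be sketched as follows: consecutive paragraphs sharing one language tag are wrapped in a single environment rather than one \foreignlanguage call each. This is an illustration only, not Docutils or Sphinx code; the function name and the tiny language table are hypothetical, with the babel names taken from the rst2latex.py output above.

```python
# Hypothetical, deliberately tiny mapping from reST "language-XY" class
# suffixes to babel language names; Docutils keeps its own, larger table.
BABEL_NAMES = {"fr": "french", "de": "ngerman", "en": "english"}


def wrap_otherlanguage(lang_code, paragraphs):
    """Wrap consecutive paragraphs of one language in a single otherlanguage environment."""
    babel_name = BABEL_NAMES[lang_code]
    # A blank line between paragraphs keeps them separate in LaTeX.
    body = "\n\n".join(paragraphs)
    return "\\begin{otherlanguage}{%s}\n%s\n\\end{otherlanguage}" % (babel_name, body)


if __name__ == "__main__":
    print(wrap_otherlanguage("fr", ["Bonjour", "Un autre paragraphe"]))
```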
On the other hand, Sphinx make latex produces this kind of output:
Hello
\begin{fulllineitems}
\pysigline{\sphinxbfcode{language-fr}}
Bonjour
Un autre paragraphe
\end{fulllineitems}
\begin{fulllineitems}
\pysigline{\sphinxbfcode{language-de}}
Guten Tag
\end{fulllineitems}
Again English.
Possibly related to #4010
HTML output from rst2html.py looks like this:
<p>Hello</p>
<p lang="fr">Bonjour</p>
<p lang="fr">Un autre paragraphe</p>
<p lang="de">Guten Tag</p>
<p>Again English.</p>
@jfbu: In Sphinx, the .. class:: directive has a different meaning. Use .. rst-class:: language-XY if you want to insert the original Docutils directive. It should work then.
@mitya57: thanks for the tip, which does indeed work for the html target, producing the same lang attributes as rst2html.py. But it fails for the latex target (as expected from the actual writers/latex.py code...); the fulllineitems environments are gone, however, and the output simply loses all traces of the language tags in the reST sources.
Sphinx 2.0 will use GNU FreeFont with xelatex, providing good coverage of Latin, Cyrillic and Greek scripts (as well as Arabic and Hebrew). This adds a new requirement: fonts-freefont-otf on Ubuntu xenial or, e.g., texlive-gnu-freefont in Fedora 29. Perhaps Sphinx 3.0 can then have 'xelatex' as the default latex_engine for non-Japanese projects.
(edit: and make a suitable choice of fonts for Chinese with 'xelatex')
french and japanese - @ JulienPalard at https://github.com/sphinx-doc/sphinx/issues/4159#issue-266095659
I use Hindi, and the same problem will be faced with any Indian language script (Hindi, Nepali, Tamil, Telugu, Punjabi, Marathi, Gujarati, ...).
compilation times with xelatex or lualatex are often significantly increased compared to pdflatex builds. - @ jfbu at https://github.com/sphinx-doc/sphinx/issues/4159#issuecomment-337268445
That's because xelatex outputs PDF directly, and modifying the PDF is what takes time. To save on that, latexmk uses xelatex to quickly generate the output of intermediate passes as .xdv files, then converts that to .pdf via xdvipdfmx only once, at the end.
Ref (abridged by me, original at: latexmk-pdf):
-pdfxe Generate pdf version of document using xelatex [and xdvipdfmx via .xdv intermediate files]. Note that production of a .xdv file by xelatex is fast, [but of] a .pdf file can be quite time consuming when document includes large graphics files. So [this approach] can result in substantial gains in processing time, since the .pdf file is produced once rather than on every run of xelatex.
@goyalyashpal
There is in our docs this tip:
Also, if latexmk is at version 4.52b or higher (January 2017) LATEXMKOPTS="-xelatex" speeds up PDF builds via XeLateX in case of numerous graphics inclusions.
This -xelatex option is (with current Latexmk) equivalent to -pdfxe -dvi- -ps-.
It is probably time in 2017 we do this unconditionally.
Subject: I built the cpython documentation in french and japanese, and found it non-trivial to find the right set of options.
Problem
Given that:
Є in https://docs.python.org/3.7/whatsnew/3.7.html#optimizations)
We could expect to find non-ASCII characters everywhere, which are badly supported by pdflatex, even when using utf8x, which comes with another set of issues.
Proposed solution
I finally found that xelatex handles Unicode characters very well, but does not work well with Japanese. And platex works well with Japanese.
platex is already the default for Japanese when latex_engine is not explicitly configured, which is nice, but there is no way to configure xelatex for all languages and platex for Japanese (https://github.com/sphinx-doc/sphinx/issues/4150).
This forces everyone to learn a lot about LaTeX and PDF generation, and finally forces them to use -D with external logic to switch between working engines, like https://github.com/python/docsbuild-scripts/pull/34/files.
Also, the documentation is not very explicit about the usages of those engines (see https://github.com/sphinx-doc/sphinx/issues/4149).
What I propose is to switch the default from
'pdflatex' if language != 'ja' else 'platex'
to
'xelatex' if language != 'ja' else 'platex'
which is a combination that works without any other modification to build cpython documentation in english, french, and japanese.