sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/

[RFC] LaTeX: for Russian and XeLaTeX/LuaLaTeX should polyglossia be replaced by babel? #5254

Open jfbu opened 6 years ago

jfbu commented 6 years ago

I have realized that polyglossia + russian creates very slow PDF builds. However, I have not tested it on a pure Russian project, hence I am asking for advice.

Here is my test:

First, build Sphinx's own English documentation as PDF, pretending the document is in Russian and forcing the use of xelatex (which uses polyglossia):

time make clean latexpdf O="-D latex_engine=xelatex -D latex_elements.fontpkg=\\\\setmainfont{CMU\ Serif}\\\\setsansfont{CMU\ Sans\ Serif}\\\\setmonofont{CMU\ Typewriter\ Text} -D latex_elements.preamble= -D language=ru "

Then repeat, forcing the use of babel in place of polyglossia:

time make clean latexpdf O="-D latex_engine=xelatex -D latex_elements.fontpkg=\\\\setmainfont{CMU\ Serif}\\\\setsansfont{CMU\ Sans\ Serif}\\\\setmonofont{CMU\ Typewriter\ Text} -D latex_elements.preamble= -D language=ru -D latex_elements.babel=\\\\usepackage{babel}"

The former gives me a whopping

polyglossia

real    4m32.324s
user    3m47.644s
sys     0m49.290s

and the latter a more reasonable

babel

real    0m42.242s
user    0m43.583s
sys     0m1.525s

I also ran the first test with lualatex, with the same result. So something in polyglossia + russian considerably slows down the PDF build. But this might be caused by the text being in English.
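To put a number on the slowdown, the two `real` timings above can be converted to seconds and compared. This is only a small arithmetic sketch; the helper name `to_seconds` is illustrative and not part of Sphinx.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a bash `time` value such as '4m32.324s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") figures copied from the two runs above.
polyglossia_real = to_seconds("4m32.324s")
babel_real = to_seconds("0m42.242s")

ratio = polyglossia_real / babel_real
print(f"polyglossia: {polyglossia_real:.1f}s, babel: {babel_real:.1f}s, ratio: {ratio:.1f}x")
# prints: polyglossia: 272.3s, babel: 42.2s, ratio: 6.4x
```

On this (mostly English) document the polyglossia build therefore takes roughly 6.4 times as long as the babel build.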

Any advice is welcome: we could make babel the default for Russian in place of polyglossia + russian (as is already done for French, for other reasons).
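Independently of any change to the defaults, a project can opt in to babel itself. The following is a minimal `conf.py` sketch that mirrors the `-D` overrides of the second command above. It assumes the CMU fonts are installed, and it leaves out the empty `preamble` override, which presumably only serves to cancel a setting in Sphinx's own `conf.py`.

```python
# conf.py fragment (sketch): same settings as the babel test run above,
# written as persistent configuration instead of -D command-line overrides.
language = 'ru'
latex_engine = 'xelatex'

latex_elements = {
    # Font selection, copied from the command line used in the test.
    'fontpkg': r'''
\setmainfont{CMU Serif}
\setsansfont{CMU Sans Serif}
\setmonofont{CMU Typewriter Text}
''',
    # Load babel rather than the polyglossia setup Sphinx would use here.
    'babel': r'\usepackage{babel}',
}
```

Raw strings keep the backslashes readable, which avoids the heavy escaping needed when the same values pass through the shell and make.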

jfbu commented 6 years ago

Incidentally, the Sphinx doc contains the Unicode character ⊞, but the CMU Typewriter Text font does not provide it:

Missing character: There is no ⊞ in font CMU Typewriter Text Regular/OT:script=
latn;language=DFLT;!

Thus the more correct command-line invocation (carrying over to xelatex a conf.py setting that handles ⊞ under pdflatex) is:

time make clean latexpdf O="-D latex_engine=xelatex -D latex_elements.fontpkg=\\\\setmainfont{CMU\ Serif}\\\\setsansfont{CMU\ Sans\ Serif}\\\\setmonofont{CMU\ Typewriter\ Text} -D latex_elements.preamble=\\\\catcode\\\`^^^^229e\\\\active\\\\def^^^^229e{\\\\ensuremath{\\\\boxplus}} -D language=ru -D latex_elements.babel=\\\\usepackage{babel}"

mitya57 commented 6 years ago

Tested on a large (423 pages) Russian-language project:

XeTeX + Babel:

$ time make clean latexpdf O="-D latex_engine=xelatex -D latex_elements.fontpkg=\\\\setmainfont{CMU\ Serif}\\\\setsansfont{CMU\ Sans\ Serif}\\\\setmonofont{CMU\ Typewriter\ Text} -D latex_elements.babel=\\\\usepackage{babel}"
…
real    1m56.941s
user    2m2.426s
sys     0m2.844s

XeTeX + Polyglossia:

$ time make clean latexpdf O="-D latex_engine=xelatex -D latex_elements.fontpkg=\\\\setmainfont{CMU\ Serif}\\\\setsansfont{CMU\ Sans\ Serif}\\\\setmonofont{CMU\ Typewriter\ Text}"
…
real    2m13.890s
user    2m44.510s
sys     0m8.503s

LaTeX (we are using it by default):

$ time make clean latexpdf
…
real    1m46.307s
user    1m44.502s
sys     0m1.774s

I ran each command twice; the times were almost the same on the second run.

Versions are:

jfbu commented 6 years ago

@mitya57 thanks a lot for devoting time to this. It seems to mean that my observation is tied to a preponderantly English document, and hence is basically a false alarm.

There is still a slight speed advantage for XeLaTeX + Babel in the case of your Russian-language project, but this by itself may not be enough to justify making it the default choice.
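For scale, the `real` timings reported for the 423-page Russian project can be expressed relative to the XeTeX + Babel run. This is plain arithmetic on the figures quoted in the thread; the dictionary keys are just labels chosen here.

```python
# Wall-clock ("real") times from the Russian-language project, in seconds.
runs = {
    "xetex+babel": 1 * 60 + 56.941,
    "xetex+polyglossia": 2 * 60 + 13.890,
    "latex (default)": 1 * 60 + 46.307,
}

baseline = runs["xetex+babel"]
relative = {name: seconds / baseline for name, seconds in runs.items()}

for name, factor in relative.items():
    print(f"{name}: {runs[name]:.1f}s ({factor:.2f}x of xetex+babel)")
```

Here polyglossia takes about 14% longer than babel, far from the roughly sixfold gap seen on the English document.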

(keeping this open in case further advice is added; I will be offline for the next few weeks)

tk0miya commented 6 years ago

+0; we already provide default settings for French with xelatex, so I'm okay with adding a new configuration.