sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/

pdflatex index creation fails for index entries in French #13130

Open skuskusku opened 1 day ago

skuskusku commented 1 day ago

Describe the bug

My project creates a PDF file in French (the project language is French). If I add entries to the index like so:

.. index::
   single: Écran de connexion

then index creation fails for the LaTeX build. If I run latexmk -pdf on the generated .tex file, it prints something like this and then just hangs:

[1]
[2]
Chapitre 1.
(test.ind
[3]
[4]
! Argument of \UTFviii@two@octets@combine has an extra }.
<inserted text>
                \par
l.13   \bigletter �

?

If I do not use latexmk -pdf but instead build with TeXnicCenter and pdflatex, then all entries in the index at the end of the resulting PDF are prefixed with "\spxentry", so for the above example it looks like this:

\spxentryÉcran de connexion

How to Reproduce

My conf.py looks like this:

project = 'test'
copyright = '2024, me myself and I'
author = 'me myself and I'
release = '0.1'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = []

templates_path = ['_templates']
exclude_patterns = []

language = 'fr'

# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'alabaster'
html_static_path = ['_static']

My index.rst file looks like this:

.. test documentation master file, created by
   sphinx-quickstart on Wed Nov 13 16:23:59 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

test documentation
==================

.. index::
   single: Connexions multiples au serveur
   single: Client;Connexions multiples au serveur

Add your content using ``reStructuredText`` syntax. See the
`reStructuredText <https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html>`_
documentation for details.

another test documentation
==========================

************************************************
Exécution du client sur l'écran de connexion
************************************************

.. index::
   single: Écran de connexion

This index entry makes a helluva problems

.. toctree::
   :maxdepth: 2
   :caption: Contents:

Other than that I just ran sphinx-quickstart to reproduce this behaviour.

Environment Information

I am running Python 3.12.3 on Windows with MiKTeX 23.4 and sphinx-build 8.1.3.

Sphinx extensions

I do not use any extensions; I could reproduce this with a modified variant of the sphinx-quickstart result.

Additional context

No response

AA-Turner commented 1 day ago

cc @jfbu

jfbu commented 23 hours ago

Either add latex_use_xindy = True or latex_engine = 'xelatex' or latex_engine = 'lualatex' to your conf.py.
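As a minimal sketch of the first suggestion, assuming the reporter's conf.py is otherwise unchanged, the relevant excerpt could look like the following. Only one of the two options is needed; the comments about defaults paraphrase the Sphinx documentation quoted later in this thread.

```python
# conf.py (excerpt) -- a sketch, not a complete configuration
language = 'fr'

# Option A: keep pdflatex, but let Xindy build the index instead of makeindex.
latex_engine = 'pdflatex'
latex_use_xindy = True

# Option B (alternative): switch to a Unicode engine. According to the docs,
# latex_use_xindy then defaults to True, so it need not be set explicitly.
# latex_engine = 'lualatex'  # or 'xelatex'
```

With either option, Xindy must be available in the TeX installation for the PDF build to succeed.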


You should not use latexmk directly; use the Makefile or make.bat instead. (On macOS I launch make latexpdf or sphinx-build -M latexpdf, which takes care of the right things to do with the .tex file after make latex completes.) This matters particularly for Xindy receiving the right options. [On Windows systems, the Makefile in build/latex is antiquated and, at first sight, has not been updated at all for using Xindy for indexing; use make.bat instead.]
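For illustration, the sphinx-build -M latexpdf invocation mentioned above can be wrapped in a small Python helper. This is only a sketch: the function names latexpdf_command and build_pdf and the default directory names are hypothetical, and it assumes sphinx-build is on the PATH.

```python
import subprocess


def latexpdf_command(source_dir, build_dir):
    """Return the argument list for a full LaTeX-to-PDF build via Sphinx."""
    return ["sphinx-build", "-M", "latexpdf", str(source_dir), str(build_dir)]


def build_pdf(source_dir=".", build_dir="_build"):
    """Run the build; raises CalledProcessError if sphinx-build fails.

    The directory defaults are assumptions for this sketch, not values
    taken from the report.
    """
    return subprocess.run(latexpdf_command(source_dir, build_dir), check=True)
```

Calling build_pdf() from the project root would then replace the manual latexmk -pdf step, leaving the index-processing options to the build files that Sphinx generates.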

By the way, comparing (on macOS) the Makefile and make.bat produced in the LaTeX build directory, I notice something fishy about the make.bat: there is a line

set XINDYOPTS=%XINDYOPTS% -I xelatex

which has no equivalent in the Makefile when it is produced with latex_engine left at 'pdflatex'.

I have no access to a Windows system.

We have a texinputs_win/Makefile.jinja which is completely antiquated. As for texinputs/make.bat.jinja, it is the origin of the line above, and that line is probably a bug on our part. The texinputs/make.bat.jinja should be a perfect translation of texinputs/Makefile.jinja but it is not. The latter may have been updated at some point but not the former.

@AA-Turner I should open a separate issue for that (the present one seems to be purely a documentation matter: the docs should stress that makeindex cannot handle non-ASCII characters and that this breaks the build), but I have no access to Windows whatsoever so I am not the best guy for this.

jfbu commented 23 hours ago

Our docs for latex_use_xindy, referred to in the previous comment, say:

Use Xindy to prepare the index of general terms. By default, the LaTeX builder uses makeindex for preparing the index of general terms. Using Xindy means that words with UTF-8 characters will be ordered correctly for the language.

  • This option is ignored if latex_engine is 'platex' (Japanese documents; mendex replaces makeindex then).

  • The default is True for 'xelatex' or 'lualatex' as makeindex creates .ind files containing invalid bytes for the UTF-8 encoding if any indexed term starts with a non-ASCII character. With 'lualatex' this then breaks the PDF build.

  • The default is False for 'pdflatex', but True is recommended for non-English documents as soon as some indexed terms use non-ASCII characters from the language script.

The last item should warn that the breakage which occurs when an indexed term starts with a non-ASCII character is observed not only with lualatex but also with pdflatex. I am not available right now, but I can take care of this docs update in a few weeks.
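To illustrate the "invalid bytes" point, here is a small Python sketch. It assumes that makeindex, which works on bytes, uses only the first byte of a term as the letter-group heading; that reading is consistent with the \bigletter line in the log above, but it is an assumption, not something confirmed in this thread.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


term = "Écran de connexion"
encoded = term.encode("utf-8")

# "É" (U+00C9) takes two bytes in UTF-8: 0xC3 0x89.
first_byte = encoded[:1]   # what a byte-oriented tool would take as the "letter"
first_char = encoded[:2]   # the complete encoded character

print(first_byte, is_valid_utf8(first_byte))
print(first_char, is_valid_utf8(first_char))
```

A lone 0xC3 byte is the start of a two-byte sequence with nothing following it, so an .ind file containing it on its own is not valid UTF-8.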

skuskusku commented 20 hours ago

Thank you very much, everyone. Using Xindy and the make.bat from the build directory solved my problems completely. I didn't even know I could specify lualatex as the preferred LaTeX engine (which is what I wanted to do anyway). Thanks everyone, please close this issue.