sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/

Problems with footnotes in LaTeX #10175

Closed Jellby closed 2 years ago

Jellby commented 2 years ago

Describe the bug

These are a couple of issues with LaTeX footnotes, which may be related to #9529 and possibly partially fixed by #10169.

  1. Footnotes in tables in different files (and maybe other cases) are not distinguished and the final LaTeX document ends up with multiply defined labels and wrong links.
  2. The current page for a footnote mark is not correctly detected, and although the link is correct, the mark itself is not.
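The first problem can be illustrated with a toy model. This is only a sketch: the label format `footnote.N` and both function names are hypothetical and do not reproduce the labels Sphinx actually writes. It assumes that auto-numbered footnotes restart at 1 in each source file, and it shows why that produces duplicate labels in a single merged LaTeX document unless something that changes per file (such as the scope counter discussed later in this thread) is made part of the label.

```python
from collections import Counter


def footnote_labels(footnotes_per_file, use_scope):
    """Return one label per footnote, in document order.

    footnotes_per_file: number of footnotes in each source file
        (numbering restarts at 1 in every file).
    use_scope: if True, prefix each label with a counter that is
        stepped once per file. The label format is illustrative only.
    """
    labels = []
    for scope, count in enumerate(footnotes_per_file, start=1):
        for number in range(1, count + 1):
            if use_scope:
                labels.append(f"footnote.{scope}.{number}")
            else:
                labels.append(f"footnote.{number}")
    return labels


def duplicates(labels):
    """Return the sorted list of labels that occur more than once."""
    return sorted(label for label, n in Counter(labels).items() if n > 1)


# one.rst and two.rst each define a single auto-numbered footnote.
print(duplicates(footnote_labels([1, 1], use_scope=False)))  # ['footnote.1']
print(duplicates(footnote_labels([1, 1], use_scope=True)))   # []
```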

How to Reproduce

Default new project, with latex_elements = {'fontpkg': r'\usepackage[notextcomp]{kpfonts}'} and these two files:

one.rst

One
===

========== ===
Test [#a]_
========== ===
   1       One
========== ===

.. [#a] Footnote one.

two.rst

Two
===

========== ===
Test [#a]_
========== ===
   2       Two
========== ===

.. [#a] Footnote two.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam ac nunc orci. Nulla ut dignissim augue. Etiam bibendum ex arcu, non ornare urna aliquam tincidunt. Quisque vehicula mattis nibh eget vestibulum. Etiam gravida, sapien eget fermentum porttitor, magna eros consectetur dui, ut accumsan mi tellus vel nisi. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Phasellus at diam magna. Vestibulum massa lacus, hendrerit eget ex nec, facilisis rutrum purus. Maecenas non lectus massa. Nulla scelerisque ex eu lorem egestas ultricies quis auctor lorem. Maecenas varius, metus quis dictum eleifend, sapien odio tincidunt lacus, et varius purus ante id arcu. Donec consequat ante sed mauris egestas efficitur. Maecenas eget mi lorem. Proin imperdiet, magna a egestas ultricies, orci enim congue sem, ut lobortis ipsum mauris sed est. Suspendisse potenti. Cras eget velit imperdiet, sollicitudin risus a, venenatis mi.

Nulla sed vulputate mi. Sed ut libero faucibus, viverra enim vel, pellentesque lacus. In lectus arcu, fermentum nec tempus eget, tincidunt euismod mi. Aliquam egestas dui pulvinar, sollicitudin ex imperdiet, varius tortor. Mauris tempus orci et finibus ultricies. In convallis tortor id mauris faucibus dapibus. Mauris molestie, lacus a venenatis consequat, nunc ante dapibus nisl, sed luctus lorem mi a velit. Vivamus vel suscipit ante. Nam ornare molestie efficitur. Nulla sodales nulla a purus faucibus, vel viverra tellus fringilla.

Duis cursus, lorem at pulvinar ornare, mi dui condimentum sem, nec convallis nisl augue id tortor. Nullam rhoncus ullamcorper mauris nec hendrerit. Duis tristique urna elit, at consectetur erat varius a. Integer accumsan a massa id accumsan. Aenean fringilla ullamcorper lorem ut rutrum. Suspendisse cursus odio nec turpis placerat eleifend. Nullam et tortor metus. Cras lacinia arcu accumsan, egestas neque vel, mattis sem. Duis ac nisl eget nulla bibendum pretium nec vel sem. Donec nec sollicitudin urna, non tincidunt ante. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Mauris sit amet dui nec lectus ullamcorper vestibulum in eu lorem. Proin fermentum sollicitudin rhoncus.

Fusce pulvinar lorem libero, vestibulum venenatis mauris elementum nec. Morbi in eros dui. Vestibulum aliquam felis lorem, vel rutrum magna dapibus eu. Suspendisse fermentum tempus condimentum. Morbi eu ex congue, blandit magna sed, sollicitudin sapien. Lorem ipsum dolor sit amet, consectetur adipiscing elit. In hac habitasse platea dictumst. In vel congue magna. Mauris condimentum sit amet mi quis laoreet. Aliquam ultrices suscipit lectus, at dignissim nibh finibus in.

Nullam pharetra scelerisque accumsan. Suspendisse rhoncus pulvinar eros, non pulvinar ante. Nullam auctor lacus a quam placerat, sed hendrerit felis consequat. In porttitor ut justo vulputate ultrices. Nulla et nunc iaculis diam congue luctus. Cras cursus elit hendrerit nunc facilisis, sit amet faucibus metus egestas. Cras malesuada in urna quis malesuada. In facilisis ac diam eu suscipit. Nam ac massa quis elit ullamcorper dignissim. Praesent tellus metus, consectetur vitae nisl in, hendrerit blandit purus. Mauris sollicitudin tortor ut nisl dapibus facilisis. Suspendisse potenti. Praesent scelerisque mattis blandit. Sed varius feugiat est id pulvinar. Aenean scelerisque porta neque id sollicitudin. Sed condimentum purus quam, ac tristique elit molestie ut. [#a]_

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam ac nunc orci. Nulla ut dignissim augue. Etiam bibendum ex arcu, non ornare urna aliquam tincidunt. Quisque vehicula mattis nibh eget vestibulum. Etiam gravida, sapien eget fermentum porttitor, magna eros consectetur dui, ut accumsan mi tellus vel nisi. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Phasellus at diam magna. Vestibulum massa lacus, hendrerit eget ex nec, facilisis rutrum purus. Maecenas non lectus massa. Nulla scelerisque ex eu lorem egestas ultricies quis auctor lorem. Maecenas varius, metus quis dictum eleifend, sapien odio tincidunt lacus, et varius purus ante id arcu. Donec consequat ante sed mauris egestas efficitur. Maecenas eget mi lorem. Proin imperdiet, magna a egestas ultricies, orci enim congue sem, ut lobortis ipsum mauris sed est. Suspendisse potenti. Cras eget velit imperdiet, sollicitudin risus a, venenatis mi. [#a]_

Expected behavior

Your project

See "How to Reproduce" above

Screenshots

No response

OS

Linux Ubuntu 20.04

Python version

3.6.9

Sphinx version

4.4.0

Sphinx extensions

No response

Extra tools

No response

Additional context

No response

jfbu commented 2 years ago

Thanks for the report.

The singlehtml target also reveals a problem with your example: the second footnote mark hyperlinks to the first target.

With latexpdf (and using PR #10169), the mark of the first footnote refers to the footnote on page 3 (at least the mark is in sync with the hypertarget location).

So both targets have problems, but of opposite nature...

jfbu commented 2 years ago

This patch appears to fix it:

$ git diff
diff --git a/sphinx/writers/latex.py b/sphinx/writers/latex.py
index a2905e807..84986af71 100644
--- a/sphinx/writers/latex.py
+++ b/sphinx/writers/latex.py
@@ -653,6 +653,8 @@ class LaTeXTranslator(SphinxTranslator):
                 short = ''
                 if any(node.findall(nodes.image)):
                     short = ('[%s]' % self.escape(' '.join(clean_astext(node).split())))
+                if self.sectionlevel == self.top_sectionlevel:
+                    self.body.append(r'\sphinxstepscope' + CR)

                 try:
                     self.body.append(r'\%s%s{' % (self.sectionnames[self.sectionlevel], short))

@tk0miya As you see above, the fix can be obtained by inserting \sphinxstepscope at a suitable place. Here I do it at top-level sectioning; I am not sure whether that is the same as inserting it for each document included in a toctree. Maybe \sphinxstepscope should be inserted per input file, but I currently don't know how to do that: I have only a partial understanding of our codebase as a whole, acting only as a LaTeX maintenance technician. What do you think?

I tested using this one.rst in place of the OP's, and a similar two.rst:

One
----

========== ===
Test [#a]_
========== ===
   1       One
========== ===

.. [#a] Footnote one.

sous-section
^^^^^^^^^^^^

========== ===
Test [#b]_
========== ===
   1b       Oneb
========== ===

.. [#b] Footnote oneb.

Jellby commented 2 years ago

I fixed it myself by monkey-patching visit_start_of_file to issue \sphinxstepscope.

See: https://gitlab.com/Molcas/OpenMolcas/-/blob/db84fd20654312ff9d224697916b0b6ab526bafc/doc/extensions/patch.py

Jellby commented 2 years ago

And for the page detection, see https://gitlab.com/Molcas/OpenMolcas/-/blob/db84fd20654312ff9d224697916b0b6ab526bafc/doc/source/conf.py#L454 (\thepage cannot be trusted).

jfbu commented 2 years ago

> I fixed it myself by monkey-patching visit_start_of_file to issue \sphinxstepscope.

@Jellby Thanks!

Let's wait for @tk0miya's opinion on whether issuing \sphinxstepscope at visit_start_of_file is indeed the better way (I had overlooked it in latex.py...). The two approaches can be merged; it does no harm to issue \sphinxstepscope.

> And for the page detection, see https://gitlab.com/Molcas/OpenMolcas/-/blob/db84fd20654312ff9d224697916b0b6ab526bafc/doc/source/conf.py#L454 (\thepage cannot be trusted).

I will look into it. The LaTeX shipout routines assign a correct value to \thepage; however, I guess interaction with inserts can result in problems when dealing with footnotes. Did you encounter a specific problematic instance?

Jellby commented 2 years ago

> I will look into it. The LaTeX shipout routines assign a correct value to \thepage; however, I guess interaction with inserts can result in problems when dealing with footnotes. Did you encounter a specific problematic instance?

The sample above is a problematic case. For me, there are two footnote marks on page 4: one says 1 and the other says Page 3, 1, but they are both links to the same footnote on page 3. The first one is wrong because it "believes" it is appearing on page 3 (the value of \thepage). See https://texfaq.org/FAQ-wrongpn for why \thepage cannot be trusted.
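The effect can be sketched with a deliberately simplified model of TeX's page builder. This is an illustration only, not how Sphinx or LaTeX is implemented, and the function name `place_marks` and its parameters are hypothetical. The assumption is that a paragraph is expanded as a whole before any of its lines are placed on a page, so a page counter read during expansion reflects the page on which the paragraph starts, not the page on which a given line ends up.

```python
def place_marks(paragraphs, lines_per_page):
    """Toy model of an asynchronous page builder.

    paragraphs: list of (n_lines, mark_line) tuples, where mark_line is the
        1-based line inside the paragraph that carries a footnote mark,
        or None if the paragraph has no mark.
    Returns a list of (believed_page, actual_page) pairs, one per mark.
    """
    results = []
    lines_placed = 0
    for n_lines, mark_line in paragraphs:
        # Page counter as seen while the paragraph is being expanded:
        # it corresponds to the page on which the paragraph starts.
        believed_page = lines_placed // lines_per_page + 1
        if mark_line is not None:
            # Page on which the line carrying the mark really lands.
            actual_page = (lines_placed + mark_line - 1) // lines_per_page + 1
            results.append((believed_page, actual_page))
        lines_placed += n_lines
    return results


# An 8-line paragraph, then a 5-line paragraph with a mark on its 4th line,
# on pages that hold 10 lines each.
print(place_marks([(8, None), (5, 4)], lines_per_page=10))  # [(1, 2)]
```

In this model the mark "believes" it is on page 1 although its line is placed on page 2. The usual remedy in LaTeX is to record the position with a label written at shipout time and read it back on the next run, which is the \label and \pageref mechanism.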

jfbu commented 2 years ago

@Jellby Thanks. Can you raise a separate issue about \thepage? I will make a PR. The extra structure needed to fix this could then serve for adding backreferences from footnotes in page footers to footnote marks, but currently only for those footnotes using the \sphinxfootnotemark mechanism.

Jellby commented 2 years ago

I can raise a new issue, but note that the problem with \thepage is already in the OP's 2nd point.

jfbu commented 2 years ago

All right, let's keep one issue. But there is one aspect whose resolution I would prefer to postpone: colliding footnote marks originating from named or explicit mark-up in sources spread across files, a problem which is made worse by adding latex_theme = "howto" to your example. Achieving correct hyperlinks appears to me a worthy enough first goal.

Also, as I pointed out, your example has a sub-optimal outcome with the singlehtml builder as well. So perhaps let's first work out a LaTeX patch addressing at least some of the implied issues, then examine what is left and possibly open new issues.

jfbu commented 2 years ago

The whole \sphinxstepexplicit from #8832 has some core flaw anyhow. A fix to this issue will have to be incorporated as one aspect of the renewed thinking needed to fix #10188. This will probably entail changes both in latex.py and in sphinxpackagefootnote.sty latex code.

tk0miya commented 2 years ago

> I fixed it myself by monkey-patching visit_start_of_file to issue \sphinxstepscope.

Looks good. The start_of_file nodes are visited before each item in the toctree, so it's a good time to increase the scope counter.

jfbu commented 2 years ago

>> I fixed it myself by monkey-patching visit_start_of_file to issue \sphinxstepscope.

> Looks good. The start_of_file nodes are visited before each item in toctree. So it's a good timing to increase the scope counter.

@tk0miya OK, I will incorporate this into the fix of #10188. I expect to drop the business with sphinxexplicit entirely. This will go with dropping the recently added referred attribute, which will no longer be needed.

But there will be a change in our Sphinx LaTeX mark-up for \sphinxfootnotemark, i.e. sphinxpackagefootnote.sty will be modified to diverge from the current behaviour inherited from the original LaTeX \footnotemark. Ours will take the refid as an additional argument. This is needed to construct working hyperlinks robustly. In doing so, I will add structure to the sphinxpackagefootnote.sty footnotetext (and footnote) environments that will make it possible to know reliably whether a footnote mark and its footnote are on the same page or on different pages.

I started looking into this and I observe that footnotetext nodes have lost the backrefs attribute. Can you give me a pointer to where I should modify the footnote transform so that footnotetext nodes inherit a backrefs attribute pointing to the various footnotemarks referring to them? I don't think I need these backrefs immediately, but it would be nice to implement them in the future: what I need to add to fix the current issues makes it possible, with little extra effort, to support back references as is currently done for HTML output.

The sphinxscope, stepped at each input file, is a simple numerical alternative to working with docname attributes, which require processing to provide safe LaTeX labels; besides, I see that footnotemark and footnotetext nodes have lost docname. With sphinxscope we can do without it.

tk0miya commented 2 years ago

I just posted #10190 to fix this. Please reject it if you'll post another work.

> I started looking into this and I observe that footnotetext nodes have lost the backrefs attribute. Can you give me a pointer to where I should modify the footnote transform so that footnotetext nodes inherit a backrefs attribute pointing to the various footnotemarks referring to them?

You can do it by modifying FootnoteCollector in sphinx/builders/latex/transforms.py. It generates footnotetext nodes from the original footnote_definition nodes (see unrestrict() and depart_table()).

> The sphinxscope, stepped at each input file, is a simple numerical alternative to working with docname attributes, which require processing to provide safe LaTeX labels; besides, I see that footnotemark and footnotetext nodes have lost docname. With sphinxscope we can do without it.

I think putting \sphinxscope at each input file is the first step. It would be better to be able to change the location of the scope to each file, each part, or any arbitrary sectioning unit (via some new configuration). I'm not sure the future work is related to the docname, but it's better to consider the plan, if possible.